GraphX question about graph traversal
Hi All: I have a question about how to do the following operation in GraphX. Suppose I have a graph with the following vertices and scores on the edges:

(V1 {type:B}) -[100]- (V2 {type:A}) -[10]- (V3 {type:A}) -[100]- (V4 {type:B})

I would like to get the type B vertices that are connected through type A vertices where the edges have a score greater than 5. So, from the example above I would like to get V1 and V4. I am looking for ideas, not necessarily a solution. I was thinking of using the Pregel API, so I will continue looking into that. Anyway, I look forward to a response. Best, -- Cesar Arevalo, Zephyr Health
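For concreteness, here is one way the example graph could be built in the spark-shell, assuming vertex attributes carry the type as a String and edge attributes carry the Int score (this encoding is my assumption, not from the original post; the edge scores follow my reading of the example above):

import org.apache.spark.graphx._

// The four example vertices, keyed by VertexId, with their type as the attribute.
val vertices = sc.parallelize(Seq(
  (1L, "B"),  // V1
  (2L, "A"),  // V2
  (3L, "A"),  // V3
  (4L, "B"))) // V4

// The three scored edges: V1-V2 (100), V2-V3 (10), V3-V4 (100).
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 100),
  Edge(2L, 3L, 10),
  Edge(3L, 4L, 100)))

val graph: Graph[String, Int] = Graph(vertices, edges)

Note that GraphX edges are directed, so an undirected reading of the problem has to be handled explicitly when traversing (e.g., by sending messages in both directions).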
Re: GraphX question about graph traversal
Hey, thanks for your response. And I had seen the triplets, but I'm not quite sure how the triplets would get me that V1 is connected to V4. Maybe I need to spend more time understanding it, I guess. -Cesar

On Wed, Aug 20, 2014 at 10:56 AM, glxc r.ryan.mcc...@gmail.com wrote: I don't know if Pregel would be necessary, since it's not iterative. You could filter the graph by looking at edge triplets, and testing if source == B, dest == A, and edge value > 5. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-question-about-graph-traversal-tp12491p12494.html
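A minimal sketch of the triplet-filtering idea, combined with connected components to answer the "is V1 connected to V4" part. This is my own sketch, not glxc's code; it assumes the Graph[String, Int] encoding above, and note one caveat: a B vertex can still appear as an intermediate hop, so chains like B-A-B-A-B also land in one component.

import org.apache.spark.SparkContext._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

def bVerticesLinkedThroughA(graph: Graph[String, Int]): RDD[(VertexId, Iterable[VertexId])] = {
  // Keep only high-scoring edges that touch at least one type-A vertex,
  // dropping direct B-B links and anything scored <= 5.
  val sub = graph.subgraph(
    epred = t => t.attr > 5 && (t.srcAttr == "A" || t.dstAttr == "A"))

  // Two vertices share a component id iff a chain of qualifying edges links them
  // (connectedComponents treats the graph as undirected).
  val cc: RDD[(VertexId, VertexId)] = sub.connectedComponents().vertices

  // Group the type-B vertices by component: V1 and V4 end up in the same group.
  graph.vertices
    .filter { case (_, tpe) => tpe == "B" }
    .join(cc)
    .map { case (vid, (_, comp)) => (comp, vid) }
    .groupByKey()
}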
Re: GraphX question about graph traversal
Hi Ankur, thank you for your response. I already looked at the sample code you sent. And I think the modification you are referring to is on the tryMatch function of the PartialMatch class. I noticed you have a case in there that checks for a pattern match, and I think that's the code I need to modify. I'll let you know how it goes. -Cesar

On Wed, Aug 20, 2014 at 2:14 PM, Ankur Dave ankurd...@gmail.com wrote: At 2014-08-20 10:34:50 -0700, Cesar Arevalo ce...@zephyrhealthinc.com wrote: I would like to get the type B vertices that are connected through type A vertices where the edges have a score greater than 5. So, from the example above I would like to get V1 and V4. It sounds like you're trying to find paths in the graph that match a particular pattern. I wrote a prototype for doing that using the Pregel API [1, 2] in response to an earlier question on the mailing list [3]. That code won't solve your problem immediately, since it requires exact vertex and edge attribute matches rather than predicates like "greater than 5", but it should be easy to modify it appropriately. Ankur [1] https://github.com/ankurdave/spark/blob/PatternMatching/graphx/src/main/scala/org/apache/spark/graphx/lib/PatternMatching.scala [2] https://github.com/ankurdave/spark/blob/PatternMatching/graphx/src/test/scala/org/apache/spark/graphx/lib/PatternMatchingSuite.scala [3] http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-traversal-and-merge-interesting-edges-td8788.html
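For reference, here is a Pregel-style sketch of the predicate-based variant. This is my own sketch under the same Graph[String, Int] assumption, not Ankur's prototype, and maxHops is a made-up safety cap. B vertices announce their own id, only A vertices relay what they hear, and messages travel only across edges scoring above 5, so a B vertex ends up knowing exactly the B vertices reachable from it through all-A chains:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

def linkedBs(graph: Graph[String, Int], maxHops: Int = 10): RDD[(VertexId, Set[VertexId])] = {
  // Apply the score predicate up front.
  val sub = graph.subgraph(epred = _.attr > 5)

  // Vertex state: (type, set of B-origin ids that have reached this vertex).
  val init = sub.mapVertices { (id, tpe) =>
    (tpe, if (tpe == "B") Set(id) else Set.empty[VertexId])
  }

  // What a vertex may pass on: a B announces only itself, an A relays everything it has heard.
  def relayable(id: VertexId, st: (String, Set[VertexId])): Set[VertexId] =
    if (st._1 == "B") Set(id) else st._2

  val result = init.pregel(Set.empty[VertexId], maxHops, EdgeDirection.Either)(
    // vprog: absorb newly arrived origin ids.
    (id, st, msg) => (st._1, st._2 ++ msg),
    // sendMsg: treat edges as undirected; skip direct B-B links, and only
    // send ids the receiving side has not seen yet (this drives termination).
    triplet =>
      if (triplet.srcAttr._1 == "B" && triplet.dstAttr._1 == "B") Iterator.empty
      else {
        val toDst = relayable(triplet.srcId, triplet.srcAttr) -- triplet.dstAttr._2
        val toSrc = relayable(triplet.dstId, triplet.dstAttr) -- triplet.srcAttr._2
        Iterator((triplet.dstId, toDst), (triplet.srcId, toSrc)).filter(_._2.nonEmpty)
      },
    // mergeMsg: union of incoming id sets.
    _ ++ _)

  // Report, for each B vertex, the other B vertices it is linked to.
  result.vertices
    .filter { case (_, (tpe, _)) => tpe == "B" }
    .map { case (id, (_, reached)) => (id, reached - id) }
}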
Re: NullPointerException when connecting from Spark to a Hive table backed by HBase
Thanks! Yeah, it may be related to that. I'll check out the pull request that was sent and hopefully that fixes the issue. I'll let you know; after fighting with this issue yesterday I decided to set it aside and return to it later, so it may take me a while to get back to you. -Cesar

On Tue, Aug 19, 2014 at 2:04 PM, Yin Huai huaiyin@gmail.com wrote: Seems https://issues.apache.org/jira/browse/SPARK-2846 is the JIRA tracking this issue.

On Mon, Aug 18, 2014 at 6:26 PM, cesararevalo ce...@zephyrhealthinc.com wrote: Thanks, Zhan, for the follow-up. But do you know how I am supposed to set that table name on the jobConf? I don't have access to that object from my client driver. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-connecting-from-Spark-to-a-Hive-table-backed-by-HBase-tp12284p12331.html
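Since the stack trace below points at Bytes.toBytes apparently receiving a null table name inside HiveHBaseTableInputFormat.getSplits, one speculative experiment is to hand the hbase.table.name property to the configuration up front. This is only my sketch of a workaround, not a confirmed fix; the JIRA above concerns exactly this configuration plumbing, so these settings may well never reach the Hive input format. The HBase-side table name is a placeholder, use whatever your CREATE TABLE mapped it to:

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Hypothetical: set the property on the SparkContext's Hadoop configuration...
sc.hadoopConfiguration.set("hbase.table.name", "dataset_records")

// ...or through Hive's own SET mechanism inside the session.
hiveContext.sql("SET hbase.table.name=dataset_records")

val hiveCount = hiveContext
  .sql("select count(*) from dataset_records")
  .collect().head.getLong(0)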
NullPointerException when connecting from Spark to a Hive table backed by HBase
Hello: I am trying to setup Spark to connect to a Hive table which is backed by HBase, but I am running into the following NullPointerException:

scala> val hiveCount = hiveContext.sql("select count(*) from dataset_records").collect().head.getLong(0)
14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*) from dataset_records
14/08/18 06:34:29 INFO ParseDriver: Parse Completed
14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default tbl=dataset_records
14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=dataset_records
14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with curMem=0, maxMem=280248975
14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 156.5 KB, free 267.1 MB)
14/08/18 06:34:30 INFO SparkContext: Starting job: collect at SparkPlan.scala:85
14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to exception - job: 0
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
at org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)

This is happening from the master on Spark; I am running HBase version hbase-0.98.4-hadoop1 and Hive version 0.13.1.
And here is how I am running the spark shell:

bin/spark-shell --driver-class-path /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive-hbase-handler/jars/hive-hbase-handler-0.13.1.jar:/opt/spark-poc/lib_managed/jars/com.typesafe/scalalogging-slf4j_2.10/scalalogging-slf4j_2.10-1.0.1.jar:/opt/spark-poc/lib_managed/bundles/com.jolbox/bonecp/bonecp-0.7.1.RELEASE.jar:/opt/spark-poc/sbt/ivy/cache/com.datastax.cassandra/cassandra-driver-core/bundles/cassandra-driver-core-2.0.4.jar:/opt/spark-poc/lib_managed/jars/org.json/json/json-20090211.jar

Can anybody help me? Best, -- Cesar
Re: NullPointerException when connecting from Spark to a Hive table backed by HBase
Nope, it is NOT null. Check this out:

scala> hiveContext == null
res2: Boolean = false

And thanks for sending that link, but I had already looked at it. Any other ideas? I looked through some of the relevant Spark Hive code and I'm starting to think this may be a bug. -Cesar

On Mon, Aug 18, 2014 at 12:00 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Looks like your hiveContext is null. Have a look at this documentation: https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables Thanks Best Regards

On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo ce...@zephyrhealthinc.com wrote: Hello: I am trying to setup Spark to connect to a Hive table which is backed by HBase, but I am running into the following NullPointerException: ...
Re: NullPointerException when connecting from Spark to a Hive table backed by HBase
I removed the JAR that you suggested but now I get another error when I try to create the HiveContext. Here is the error:

scala> val hiveContext = new HiveContext(sc)
error: bad symbolic reference. A signature in HiveContext.class refers to term ql in package org.apache.hadoop.hive which is not available. It may be completely missing from the current classpath, ...omitted more stack trace for readability...

Best, -Cesar

On Mon, Aug 18, 2014 at 12:47 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Then definitely it's a jar conflict. Can you try removing this jar from the class path: /opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar Thanks Best Regards

On Mon, Aug 18, 2014 at 12:45 PM, Cesar Arevalo ce...@zephyrhealthinc.com wrote: Nope, it is NOT null. Check this out: ...
Re: Broadcast variable in Spark Java application
Hi Praveen: It may be easier for other people to help you if you provide more details about what you are doing. It may be worthwhile to also mention which Spark version you are using. And if you can share the code which doesn't work for you, that may also give others more clues as to what you are doing wrong. I've found that following the Spark programming guide online usually gives me enough information, but I guess you've already tried that. Best, -Cesar

On Jul 7, 2014, at 12:41 AM, Praveen R prav...@sigmoidanalytics.com wrote: I need a variable to be broadcast from the driver to the executor processes in my Spark Java application. I tried using Spark's broadcast mechanism to achieve this, but no luck there. Could someone help me do this, and maybe share some code? Thanks, Praveen R
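For what it's worth, here is a minimal, self-contained sketch of the broadcast mechanism, in Scala for brevity; the Java API has the same shape, with JavaSparkContext.broadcast(...) on the driver and broadcastVar.value() inside the function. The app name and the lookup map are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-example"))

    // Create the broadcast variable once, on the driver...
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // ...and read it inside closures running on the executors via .value
    val total = sc.parallelize(Seq("a", "b", "a"))
      .map(k => lookup.value.getOrElse(k, 0))
      .reduce(_ + _)

    println(total) // 4
    sc.stop()
  }
}

The key point is that only the Broadcast handle is captured by the closure and its .value is read on the executors, so Spark ships the data to each executor once instead of serializing it with every task.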
Re: Spark 1.0 failed on HDP 2.0 with absurd exception
From looking at the exception message that was returned, the ApplicationMaster on your cluster only understands the old --num-workers/--worker-memory style of options, which suggests the YARN side is running an older Spark build than the 1.0 you built on the edge node. I would try the following command for running the application:

./bin/spark-submit --class test.etl.RunETL --master yarn-cluster --num-workers 14 --driver-memory 3200m --worker-memory 3g --worker-cores 2 --jar my-etl-1.0-SNAPSHOT-hadoop2.2.0.jar

I didn't try this, so it may not work. Best, -Cesar

On Sat, Jul 5, 2014 at 2:48 AM, Konstantin Kudryavtsev kudryavtsev.konstan...@gmail.com wrote: Hi all, I have a cluster with HDP 2.0. I built Spark 1.0 on the edge node and am trying to run it with the command

./bin/spark-submit --class test.etl.RunETL --master yarn-cluster --num-executors 14 --driver-memory 3200m --executor-memory 3g --executor-cores 2 my-etl-1.0-SNAPSHOT-hadoop2.2.0.jar

As a result I got a failed YARN application with the following stack trace:

Application application_1404481778533_0068 failed 3 times due to AM Container for appattempt_1404481778533_0068_03 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application

Log Type: stderr
Log Length: 686
Unknown/unsupported param List(--executor-memory, 3072, --executor-cores, 2, --num-executors, 14)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
--jar JAR_PATH       Path to your application's JAR file (required)
--class CLASS_NAME   Name of your application's main class (required)
--args ARGS          Arguments to be passed to your application's main class. Multiple invocations are possible, each will be passed in order.
--num-workers NUM    Number of workers to start (Default: 2)
--worker-cores NUM   Number of cores for the workers (Default: 1)
--worker-memory MEM  Memory per Worker (e.g. 1000M, 2G) (Default: 1G)

Seems like the old Spark notation. Any ideas? Thank you, Konstantin Kudryavtsev
Re: Spark Streaming on top of Cassandra?
Hi Zarzyk: If I were you, just to start, I would look at the following:

https://groups.google.com/forum/#!topic/spark-users/htQQA3KidEQ
http://www.slideshare.net/planetcassandra/south-bay-cassandrealtime-analytics-using-cassandra-spark-and-shark-at-ooyala
http://spark-summit.org/2014/talk/using-spark-streaming-for-high-velocity-analytics-on-cassandra

Best, -Cesar

On Jul 4, 2014, at 12:33 AM, zarzyk k.zarzy...@gmail.com wrote: Hi, I bump this thread as I'm also interested in the answer. Can anyone help or point to information on how to do Spark Streaming from/to Cassandra? Thanks! Zarzyk -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-on-top-of-Cassandra-tp1283p8778.html
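None of those links carry code inline, so here is a rough sketch of the write path using the DataStax spark-cassandra-connector. This is my sketch under several assumptions: the connector jar is on the classpath, and the host, keyspace, and table names are placeholders (the table must already exist):

import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("streaming-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // your Cassandra node

    val ssc = new StreamingContext(conf, Seconds(5))

    // Count words arriving on a socket and write each batch to Cassandra.
    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Assumed schema: CREATE TABLE demo.word_counts (word text PRIMARY KEY, count int);
    counts.saveToCassandra("demo", "word_counts", SomeColumns("word", "count"))

    ssc.start()
    ssc.awaitTermination()
  }
}

For the read side, the connector exposes batch-style cassandraTable RDDs that can be joined against each micro-batch; the talks linked above cover those patterns.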
Anybody changed their mind about going to the Spark Summit 2014
Hi All: I was wondering if anybody had bought a ticket for the upcoming Spark Summit 2014 this coming week and had changed their mind about going. The summit has sold out and I can't buy a ticket anymore, so I would be interested in buying yours. Let me know. Best, -- Cesar Arevalo, Zephyr Health