GraphX question about graph traversal

2014-08-20 Thread Cesar Arevalo
Hi All:

I have a question about how to do the following operation in GraphX.
Suppose I have a graph with the following vertices and scores on the edges:


(V1 {type:B}) --100-- (V2 {type:A}) --10-- (V3 {type:A}) --100-- (V4 {type:B})


I would like to get the type B vertices that are connected through type A
vertices where the edges have a score greater than 5. So, from the example
above I would like to get V1 and V4.
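
For concreteness, a rough sketch of this example graph in the spark-shell
(vertex attributes hold the type label and edge attributes hold the score; the
IDs and values below are just the ones from the picture):

import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Vertex attribute = type label, edge attribute = score.
val vertices = sc.parallelize(Seq[(VertexId, String)](
  (1L, "B"), (2L, "A"), (3L, "A"), (4L, "B")))
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 100), Edge(2L, 3L, 10), Edge(3L, 4L, 100)))
val graph = Graph(vertices, edges)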

I am looking for ideas, not necessarily a solution. I was thinking of using
the Pregel API, so I will continue looking into that. Anyway, I look forward
to a response.

Best,
-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com


Re: GraphX question about graph traversal

2014-08-20 Thread Cesar Arevalo
Hey, thanks for your response.

And I had seen the triplets, but I'm not quite sure how the triplets would
tell me that V1 is connected to V4. Maybe I need to spend more time
understanding them, I guess.

-Cesar



On Wed, Aug 20, 2014 at 10:56 AM, glxc r.ryan.mcc...@gmail.com wrote:

 I don't know if Pregel would be necessary since it's not iterative

 You could filter the graph by looking at edge triplets, and testing if
 source = B, dest = A, and edge value > 5



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-question-about-graph-traversal-tp12491p12494.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
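
A rough sketch of that triplet-filter idea against the example graph from my
first message (assuming vertex attributes are the type labels and edge
attributes are Int scores). Note that it only looks at one hop at a time,
which is the part I still need to figure out for connecting V1 to V4:

// Keep triplets whose source is type B, destination is type A, and score > 5.
val bToA = graph.triplets.filter { t =>
  t.srcAttr == "B" && t.dstAttr == "A" && t.attr > 5
}
bToA.map(t => (t.srcId, t.dstId)).collect()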




-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com


Re: GraphX question about graph traversal

2014-08-20 Thread Cesar Arevalo
Hi Ankur, thank you for your response. I already looked at the sample code
you sent. And I think the modification you are referring to is in the
tryMatch function of the PartialMatch class. I noticed you have a case in
there that checks for a pattern match, and I think that's the code I need
to modify.

I'll let you know how it goes.

-Cesar


On Wed, Aug 20, 2014 at 2:14 PM, Ankur Dave ankurd...@gmail.com wrote:

 At 2014-08-20 10:34:50 -0700, Cesar Arevalo ce...@zephyrhealthinc.com
 wrote:
  I would like to get the type B vertices that are connected through type A
  vertices where the edges have a score greater than 5. So, from the
 example
  above I would like to get V1 and V4.

 It sounds like you're trying to find paths in the graph that match a
 particular pattern. I wrote a prototype for doing that using the Pregel API
 [1, 2] in response to an earlier question on the mailing list [3]. That
 code won't solve your problem immediately, since it requires exact vertex
 and edge attribute matches rather than predicates like "greater than 5",
 but it should be easy to modify it appropriately.

 Ankur

 [1]
 https://github.com/ankurdave/spark/blob/PatternMatching/graphx/src/main/scala/org/apache/spark/graphx/lib/PatternMatching.scala
 [2]
 https://github.com/ankurdave/spark/blob/PatternMatching/graphx/src/test/scala/org/apache/spark/graphx/lib/PatternMatchingSuite.scala
 [3]
 http://apache-spark-user-list.1001560.n3.nabble.com/Graphx-traversal-and-merge-interesting-edges-td8788.html
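
Purely to illustrate the modification described above, and not taken from the
linked prototype, the exact attribute matches could become predicates over the
vertex and edge attributes, something like:

// Hypothetical predicate form of the pattern: boolean tests over attributes
// instead of exact values (all names here are illustrative only).
val isTypeA: String => Boolean = _ == "A"
val isTypeB: String => Boolean = _ == "B"
val scoreAboveFive: Int => Boolean = _ > 5

// A B -(>5)- A -(>5)- A -(>5)- B path pattern.
val vertexPattern = Seq(isTypeB, isTypeA, isTypeA, isTypeB)
val edgePattern = Seq(scoreAboveFive, scoreAboveFive, scoreAboveFive)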




-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com


Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-19 Thread Cesar Arevalo
Thanks! Yeah, it may be related to that. I'll check out the pull request that
was sent and hopefully that fixes the issue. I'll let you know; after fighting
with this issue yesterday I decided to set it aside for now, so it may take me
a while to get back to you.

-Cesar


On Tue, Aug 19, 2014 at 2:04 PM, Yin Huai huaiyin@gmail.com wrote:

 Seems https://issues.apache.org/jira/browse/SPARK-2846 is the jira
 tracking this issue.


 On Mon, Aug 18, 2014 at 6:26 PM, cesararevalo ce...@zephyrhealthinc.com
 wrote:

 Thanks, Zhan, for the follow-up.

 But do you know how I am supposed to set that table name on the jobConf? I
 don't have access to that object from my client driver.



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-when-connecting-from-Spark-to-a-Hive-table-backed-by-HBase-tp12284p12331.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com


NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Cesar Arevalo
Hello:

I am trying to set up Spark to connect to a Hive table which is backed by
HBase, but I am running into the following NullPointerException:

scala> val hiveCount = hiveContext.sql("select count(*) from
dataset_records").collect().head.getLong(0)
14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*) from
dataset_records
14/08/18 06:34:29 INFO ParseDriver: Parse Completed
14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default
tbl=dataset_records
14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table :
db=default tbl=dataset_records
14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with
curMem=0, maxMem=280248975
14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 156.5 KB, free 267.1 MB)
14/08/18 06:34:30 INFO SparkContext: Starting job: collect at
SparkPlan.scala:85
14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to
exception - job: 0
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
at
org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)




This is happening from the Spark master. I am running HBase version
hbase-0.98.4-hadoop1 and Hive version 0.13.1, and here is how I am running
the spark shell:

bin/spark-shell --driver-class-path
/opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive-hbase-handler/jars/hive-hbase-handler-0.13.1.jar:/opt/spark-poc/lib_managed/jars/com.typesafe/scalalogging-slf4j_2.10/scalalogging-slf4j_2.10-1.0.1.jar:/opt/spark-poc/lib_managed/bundles/com.jolbox/bonecp/bonecp-0.7.1.RELEASE.jar:/opt/spark-poc/sbt/ivy/cache/com.datastax.cassandra/cassandra-driver-core/bundles/cassandra-driver-core-2.0.4.jar:/opt/spark-poc/lib_managed/jars/org.json/json/json-20090211.jar



Can anybody help me?

Best,
-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415

Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Cesar Arevalo
Nope, it is NOT null. Check this out:

scala> hiveContext == null
res2: Boolean = false


And thanks for sending that link, but I had already looked at it. Any other
ideas?

I looked through some of the relevant Spark Hive code and I'm starting to
think this may be a bug.

-Cesar



On Mon, Aug 18, 2014 at 12:00 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Looks like your hiveContext is null. Have a look at this documentation.
 https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

 Thanks
 Best Regards


 On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo ce...@zephyrhealthinc.com
  wrote:

 Hello:

 I am trying to set up Spark to connect to a Hive table which is backed by
 HBase, but I am running into the following NullPointerException:

 scala> val hiveCount = hiveContext.sql("select count(*) from
 dataset_records").collect().head.getLong(0)
 14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*) from
 dataset_records
 14/08/18 06:34:29 INFO ParseDriver: Parse Completed
 14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default
 tbl=dataset_records
 14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table
 : db=default tbl=dataset_records
 14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with
 curMem=0, maxMem=280248975
 14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values in
 memory (estimated size 156.5 KB, free 267.1 MB)
 14/08/18 06:34:30 INFO SparkContext: Starting job: collect at
 SparkPlan.scala:85
 14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to
 exception - job: 0
 java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
  at
 org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
  at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
  at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)




 This is happening from the Spark master. I am running HBase version
 hbase-0.98.4-hadoop1 and Hive version 0.13.1, and here is how I am running
 the spark shell:

 bin/spark-shell --driver-class-path
 /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-metastore/hive-metastore-0.12.0.jar:/opt/spark/sql/catalyst/target/spark-catalyst_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.antlr/antlr-runtime/antlr-runtime-3.4.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libfb303/libfb303-0.9.0.jar:/opt/spark-poc/lib_managed/jars/javax.jdo/jdo-api/jdo-api-3.0.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-api-jdo/datanucleus-api-jdo-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-core/datanucleus-core-3.2.2.jar:/opt/spark-poc/lib_managed/jars/org.datanucleus/datanucleus-rdbms/datanucleus-rdbms-3.2.1.jar:/opt/spark-poc/lib_managed/jars/org.apache.derby/derby/derby-10.4.2.0.jar:/opt/spark-poc/sbt/ivy/cache/org.apache.hive/hive

Re: NullPointerException when connecting from Spark to a Hive table backed by HBase

2014-08-18 Thread Cesar Arevalo
I removed the JAR that you suggested but now I get another error when I try
to create the HiveContext. Here is the error:

scala> val hiveContext = new HiveContext(sc)
error: bad symbolic reference. A signature in HiveContext.class refers to
term ql
in package org.apache.hadoop.hive which is not available.
It may be completely missing from the current classpath,
omitted more of the stack trace for readability...


Best,
-Cesar


On Mon, Aug 18, 2014 at 12:47 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Then it's definitely a jar conflict. Can you try removing this jar from the
 classpath:
 /opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar

 Thanks
 Best Regards


 On Mon, Aug 18, 2014 at 12:45 PM, Cesar Arevalo ce...@zephyrhealthinc.com
  wrote:

 Nope, it is NOT null. Check this out:

 scala> hiveContext == null
 res2: Boolean = false


 And thanks for sending that link, but I had already looked at it. Any
 other ideas?

 I looked through some of the relevant Spark Hive code and I'm starting to
 think this may be a bug.

 -Cesar



 On Mon, Aug 18, 2014 at 12:00 AM, Akhil Das ak...@sigmoidanalytics.com
 wrote:

 Looks like your hiveContext is null. Have a look at this documentation.
 https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables

 Thanks
 Best Regards


 On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo 
 ce...@zephyrhealthinc.com wrote:

 Hello:

 I am trying to set up Spark to connect to a Hive table which is backed
 by HBase, but I am running into the following NullPointerException:

 scala> val hiveCount = hiveContext.sql("select count(*) from
 dataset_records").collect().head.getLong(0)
 14/08/18 06:34:29 INFO ParseDriver: Parsing command: select count(*)
 from dataset_records
 14/08/18 06:34:29 INFO ParseDriver: Parse Completed
 14/08/18 06:34:29 INFO HiveMetaStore: 0: get_table : db=default
 tbl=dataset_records
 14/08/18 06:34:29 INFO audit: ugi=root ip=unknown-ip-addr cmd=get_table
 : db=default tbl=dataset_records
 14/08/18 06:34:30 INFO MemoryStore: ensureFreeSpace(160296) called with
 curMem=0, maxMem=280248975
 14/08/18 06:34:30 INFO MemoryStore: Block broadcast_0 stored as values
 in memory (estimated size 156.5 KB, free 267.1 MB)
 14/08/18 06:34:30 INFO SparkContext: Starting job: collect at
 SparkPlan.scala:85
 14/08/18 06:34:31 WARN DAGScheduler: Creating new stage failed due to
 exception - job: 0
 java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:502)
  at
 org.apache.hadoop.hive.hbase.HiveHBaseTableInputFormat.getSplits(HiveHBaseTableInputFormat.java:418)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
  at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
  at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)




 This is happening from the Spark master. I am running HBase version
 hbase-0.98.4-hadoop1 and Hive version 0.13.1, and here is how I am running
 the spark shell:

 bin/spark-shell --driver-class-path
 /opt/hive/latest/lib/hive-hbase-handler-0.13.1.jar:/opt/hive/latest/lib/zookeeper-3.4.5.jar:/opt/spark-poc/lib_managed/jars/com.google.guava/guava/guava-14.0.1.jar:/opt/hbase/latest/lib/hbase-common-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-server-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-client-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/hbase-protocol-0.98.4-hadoop1.jar:/opt/hbase/latest/lib/htrace-core-2.04.jar:/opt/hbase/latest/lib/netty-3.6.6.Final.jar:/opt/hbase/latest/lib/hbase-hadoop-compat-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-client/hbase-client-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-common/hbase-common-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-server/hbase-server-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-prefix-tree/hbase-prefix-tree-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/jars/org.apache.hbase/hbase-protocol/hbase-protocol-0.98.4-hadoop1.jar:/opt/spark-poc/lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar:/opt/spark-poc/lib_managed/jars/org.cloudera.htrace/htrace-core/htrace-core-2.04.jar:/opt/spark/sql/hive/target/spark-hive_2.10-1.1.0-SNAPSHOT.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-common/hive-common-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/hive-exec-0.12.0.jar:/opt/spark-poc/lib_managed/jars/org.apache.thrift/libthrift/libthrift-0.9.0.jar:/opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-shims/hive-shims-0.12.0.jar

Re: Broadcast variable in Spark Java application

2014-07-07 Thread Cesar Arevalo
Hi Praveen:

It may be easier for other people to help you if you provide more details about 
what you are doing. It may be worthwhile to also mention which spark version 
you are using. And if you can share the code which doesn't work for you, that 
may also give others more clues as to what you are doing wrong.

I've found that following the spark programming guide online usually gives me 
enough information, but I guess you've already tried that.
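
In the meantime, here is a minimal broadcast sketch. It's Scala rather than
Java, but the Java API is analogous (JavaSparkContext.broadcast and
Broadcast.value), and all the names and values below are just placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))

    // Ship a read-only lookup table to the executors once.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))

    // Tasks read it through .value instead of closing over a driver-side copy.
    val total = sc.parallelize(Seq("a", "b", "a"))
      .map(word => lookup.value.getOrElse(word, 0))
      .reduce(_ + _)

    println(total)
    sc.stop()
  }
}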

Best,
-Cesar

 On Jul 7, 2014, at 12:41 AM, Praveen R prav...@sigmoidanalytics.com wrote:
 
 I need a variable to be broadcast from the driver to the executor processes in my 
 spark java application. I tried using the spark broadcast mechanism to achieve 
 this, but no luck there. 
 
 Could someone help me do this, and perhaps share some code?
 
 Thanks,
 Praveen R


Re: Spark 1.0 failed on HDP 2.0 with absurd exception

2014-07-05 Thread Cesar Arevalo
From looking at the exception message that was returned, I would try the
following command for running the application:

./bin/spark-submit --class test.etl.RunETL --master yarn-cluster
--num-workers 14 --driver-memory 3200m --worker-memory 3g --worker-cores 2
--jar my-etl-1.0-SNAPSHOT-hadoop2.2.0.jar


I didn't try this, so it may not work.

Best,
-Cesar



On Sat, Jul 5, 2014 at 2:48 AM, Konstantin Kudryavtsev 
kudryavtsev.konstan...@gmail.com wrote:

 Hi all,

 I have a cluster with HDP 2.0. I built Spark 1.0 on the edge node and am
 trying to run it with this command:
 ./bin/spark-submit --class test.etl.RunETL --master yarn-cluster
 --num-executors 14 --driver-memory 3200m --executor-memory 3g
 --executor-cores 2 my-etl-1.0-SNAPSHOT-hadoop2.2.0.jar

 As a result, I got a failed YARN application with the following stack trace:

 Application application_1404481778533_0068 failed 3 times due to AM
 Container for appattempt_1404481778533_0068_03 exited with exitCode: 1
 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at
 org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
 at
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
  .Failing this attempt.. Failing the application

 Log Type: stderr

 Log Length: 686

 Unknown/unsupported param List(--executor-memory, 3072, --executor-cores, 2, 
 --num-executors, 14)
 Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
 Options:
   --jar JAR_PATH   Path to your application's JAR file (required)
   --class CLASS_NAME   Name of your application's main class (required)
   --args ARGS  Arguments to be passed to your application's main 
 class.
Mutliple invocations are possible, each will be passed 
 in order.
   --num-workers NUMNumber of workers to start (Default: 2)
   --worker-cores NUM   Number of cores for the workers (Default: 1)
   --worker-memory MEM  Memory per Worker (e.g. 1000M, 2G) (Default: 1G)


 Seems like the old Spark notation. Any ideas?

 Thank you,
 Konstantin Kudryavtsev




-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com


Re: Spark Streaming on top of Cassandra?

2014-07-04 Thread Cesar Arevalo
Hi Zarzyk:

If I were you, just to start, I would look at the following:

https://groups.google.com/forum/#!topic/spark-users/htQQA3KidEQ
http://www.slideshare.net/planetcassandra/south-bay-cassandrealtime-analytics-using-cassandra-spark-and-shark-at-ooyala
http://spark-summit.org/2014/talk/using-spark-streaming-for-high-velocity-analytics-on-cassandra
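
And purely as a sketch of the shape this usually takes with the DataStax
spark-cassandra-connector (saveToCassandra, SomeColumns, and
spark.cassandra.connection.host are that connector's API, not Spark core, so
double-check against its docs; the keyspace, table, host, and port below are
made up):

import com.datastax.spark.connector.SomeColumns
import com.datastax.spark.connector.streaming._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

val conf = new SparkConf()
  .setAppName("cassandra-streaming-sketch")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val ssc = new StreamingContext(conf, Seconds(10))

// Count words arriving on a socket and write the counts to Cassandra.
ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .saveToCassandra("my_keyspace", "word_counts", SomeColumns("word", "count"))

ssc.start()
ssc.awaitTermination()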

Best,
-Cesar


On Jul 4, 2014, at 12:33 AM, zarzyk k.zarzy...@gmail.com wrote:

 Hi,
 I'm bumping this thread as I'm also interested in the answer. Can anyone help
 or point me to information on how to do Spark Streaming from/to Cassandra?
 
 Thanks!
 Zarzyk
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-on-top-of-Cassandra-tp1283p8778.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Anybody changed their mind about going to the Spark Summit 2014

2014-06-27 Thread Cesar Arevalo
Hi All:

I was wondering if anybody who bought a ticket for the Spark Summit 2014
coming up this week has changed their mind about going.

Let me know if so; since it has sold out and I can't buy a ticket anymore, I
would be interested in buying it.

Best,
-- 
Cesar Arevalo
Software Engineer ❘ Zephyr Health
450 Mission Street, Suite #201 ❘ San Francisco, CA 94105
m: +1 415-571-7687 ❘ s: arevalocesar | t: @zephyrhealth
https://twitter.com/zephyrhealth
o: +1 415-529-7649 ❘ f: +1 415-520-9288
http://www.zephyrhealth.com