Re: calling persist would cause java.util.NoSuchElementException: key not found:

2015-10-02 Thread Shixiong Zhu
Do you have the full stack trace? Could you check if it's the same as https://issues.apache.org/jira/browse/SPARK-10422 Best Regards, Shixiong Zhu 2015-10-01 17:05 GMT+08:00 Eyad Sibai <eyad.alsi...@gmail.com>: > Hi > > I am trying to call .persist() on a dataframe but once I e
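For reference, a minimal sketch of the call under discussion (the source path and DataFrame are hypothetical):

    import org.apache.spark.storage.StorageLevel

    val df = sqlContext.read.json("hdfs:///path/to/events") // hypothetical source
    df.persist(StorageLevel.MEMORY_AND_DISK)                // the call that triggered the exception
    df.count()                                              // forces materialization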

Re: Monitoring tools for spark streaming

2015-09-28 Thread Shixiong Zhu
Which version are you using? Could you take a look at the new Streaming UI in 1.4.0? Best Regards, Shixiong Zhu 2015-09-29 7:52 GMT+08:00 Siva <sbhavan...@gmail.com>: > Hi, > > Could someone recommend the monitoring tools for spark streaming? > > By extending Streamin
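The truncated question refers to extending the streaming listener API; a minimal sketch of that approach:

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    // Register with ssc.addStreamingListener(new BatchStatsListener)
    class BatchStatsListener extends StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        val info = batch.batchInfo
        println(s"batch ${info.batchTime}: scheduling delay = ${info.schedulingDelay}, " +
          s"processing delay = ${info.processingDelay}")
      }
    }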

Re: Spark streaming job filling a lot of data in local spark nodes

2015-09-28 Thread Shixiong Zhu
enough space. Best Regards, Shixiong Zhu 2015-09-29 1:04 GMT+08:00 swetha <swethakasire...@gmail.com>: > > Hi, > > I see a lot of data getting filled locally as shown below from my streaming > job. I have my checkpoint set to hdfs. But, I still see the following data > fi
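For context, checkpoint data and local shuffle/spill data go to different places; a hedged sketch (paths hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("streaming-job")
      .set("spark.local.dir", "/mnt/big-disk/spark-tmp") // where shuffle and spill files land
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///user/app/checkpoints")       // only checkpoint data goes to HDFS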

Re: spark.streaming.concurrentJobs

2015-09-28 Thread Shixiong Zhu
"count" Spark jobs will run in parallel. Moreover, "spark.streaming.concurrentJobs" is an internal configuration and it may be changed in future. Best Regards, Shixiong Zhu 2015-09-26 3:34 GMT+08:00 Atul Kulkarni <atulskulka...@gmail.com>: > Can someone please he

Re: Join two dataframe - Timeout after 5 minutes

2015-09-24 Thread Shixiong Zhu
You can change "spark.sql.broadcastTimeout" to increase the timeout. The default value is 300 seconds. Best Regards, Shixiong Zhu 2015-09-24 15:16 GMT+08:00 Eyad Sibai <eyad.alsi...@gmail.com>: > I am trying to join two tables using dataframes using python 3.4 and I am >
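In Scala, assuming a spark-shell sqlContext, this is a one-liner:

    sqlContext.setConf("spark.sql.broadcastTimeout", "1200") // seconds; the default is 300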

Re: Hbase Spark streaming issue.

2015-09-24 Thread Shixiong Zhu
Looks like you have an incompatible hbase-default.xml somewhere. You can use the following code to find the location of "hbase-default.xml": println(Thread.currentThread().getContextClassLoader().getResource("hbase-default.xml")) Best Regards, Shixiong Zhu 2015-09-21
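To list every copy on the classpath, not just the first one found, a small extension of that snippet:

    import scala.collection.JavaConverters._

    Thread.currentThread().getContextClassLoader
      .getResources("hbase-default.xml").asScala
      .foreach(println) // one line per copy of hbase-default.xml on the classpath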

Fwd: Spark streaming DStream state on worker

2015-09-24 Thread Shixiong Zhu
RDD.compute: this will run in the executor and the location is not guaranteed. E.g., DStream.foreachRDD(rdd => rdd.foreach { v => println(v) }) "println(v)" is called in the executor. Best Regards, Shixiong Zhu 2015-09-17 3:47 GMT+08:00 Renyi Xiong <renyixio...@gmail.com>:
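A sketch making the driver/executor split explicit (dstream is a hypothetical DStream):

    dstream.foreachRDD { rdd =>
      println("runs on the driver, once per batch interval") // driver side
      rdd.foreach { v =>
        println(v) // runs on an executor; which one is not guaranteed
      }
    }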

Re: JobScheduler: Error generating jobs for time for custom InputDStream

2015-09-24 Thread Shixiong Zhu
Looks like you return a "Some(null)" in "compute". If you don't want to create an RDD, it should return None. If you want to return an empty RDD, it should return "Some(sc.emptyRDD)". Best Regards, Shixiong Zhu 2015-09-15 2:51 GMT+08:00 Juan Rodríguez Hortalá <
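A sketch of the valid shapes for compute in a hypothetical custom InputDStream (fetchBatch is a made-up helper):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{StreamingContext, Time}
    import org.apache.spark.streaming.dstream.InputDStream

    class MyInputDStream(_ssc: StreamingContext) extends InputDStream[String](_ssc) {
      override def start(): Unit = {}
      override def stop(): Unit = {}

      override def compute(validTime: Time): Option[RDD[String]] = {
        val batch = fetchBatch(validTime)                // hypothetical helper
        if (batch.isEmpty) None                          // fine: skip this interval
        else Some(_ssc.sparkContext.parallelize(batch))
        // also fine: Some(_ssc.sparkContext.emptyRDD[String])
        // never: Some(null)
      }

      private def fetchBatch(t: Time): Seq[String] = Seq.empty
    }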

Re: Local Mode: Executor thread leak?

2015-12-08 Thread Shixiong Zhu
Could you send a PR to fix it? Thanks! Best Regards, Shixiong Zhu 2015-12-08 13:31 GMT-08:00 Richard Marscher <rmarsc...@localytics.com>: > Alright I was able to work through the problem. > > So the owning thread was one from the executor task launch worker, which > at least

Re: Local Mode: Executor thread leak?

2015-12-07 Thread Shixiong Zhu
Which version are you using? Could you post these thread names here? Best Regards, Shixiong Zhu 2015-12-07 14:30 GMT-08:00 Richard Marscher <rmarsc...@localytics.com>: > Hi, > > I've been running benchmarks against Spark in local mode in a long running > process. I'm seeing th
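One way to capture those thread names from inside the process, using the standard JVM API:

    import scala.collection.JavaConverters._

    Thread.getAllStackTraces.keySet.asScala
      .map(_.getName)
      .toSeq.sorted
      .foreach(println) // every live JVM thread name, e.g. to spot leaked workers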

Re: Spark streaming: java.lang.ClassCastException: org.apache.spark.util.SerializableConfiguration ... on restart from checkpoint

2015-12-17 Thread Shixiong Zhu
Best Regards, Shixiong Zhu 2015-12-17 4:39 GMT-08:00 Bartłomiej Alberski <albers...@gmail.com>: > I prepared simple example helping in reproducing problem: > > https://github.com/alberskib/spark-streaming-broadcast-issue > > I think that in that way it will be easier for you
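The linked repro involves a broadcast variable across a checkpoint restart; if that is the cause, the Spark Streaming programming guide's suggested workaround is a lazily instantiated singleton that re-creates the broadcast after recovery, roughly:

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    object BroadcastHolder {
      @volatile private var instance: Broadcast[Seq[String]] = null

      def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
        if (instance == null) {
          synchronized {
            if (instance == null) {
              instance = sc.broadcast(Seq("a", "b")) // hypothetical payload
            }
          }
        }
        instance
      }
    }

Inside foreachRDD, obtain it via BroadcastHolder.getInstance(rdd.sparkContext) so a fresh broadcast is created after a restart instead of deserializing a stale reference from the checkpoint.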

Re: Use of rdd.zipWithUniqueId() in DStream

2015-12-14 Thread Shixiong Zhu
It doesn't guarantee that. E.g., scala> sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), 2).filter(_ > 2.0).zipWithUniqueId().collect().foreach(println) (3.0,1) (4.0,3) It only guarantees "unique". Best Regards, Shixiong Zhu 2015-12-13 10:18 GMT-08:00 Sourav Mazumder <sourav.m
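If consecutive ids are what you need, zipWithIndex assigns 0-based consecutive indices instead (at the cost of an extra Spark job to count partition sizes when there is more than one partition):

    sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), 2)
      .filter(_ > 2.0)
      .zipWithIndex() // consecutive ids: (3.0,0), (4.0,1)
      .collect()
      .foreach(println)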

Re: Spark Streaming Application is Stuck Under Heavy Load Due to DeadLock

2016-01-04 Thread Shixiong Zhu
Hey Rachana, could you provide the full jstack outputs? Maybe it's the same as https://issues.apache.org/jira/browse/SPARK-11104 Best Regards, Shixiong Zhu 2016-01-04 12:56 GMT-08:00 Rachana Srivastava < rachana.srivast...@markmonitor.com>: > Hello All, > > > > I am running my

Re: spark-submit for dependent jars

2015-12-21 Thread Shixiong Zhu
Looks like you need to add a "driver" option to your code, such as sqlContext.read.format("jdbc").options( Map("url" -> "jdbc:oracle:thin:@:1521:xxx", "driver" -> "oracle.jdbc.driver.OracleDriver", "dbtable"
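A cleaned-up sketch of the full call (host, SID, and table are hypothetical):

    val df = sqlContext.read.format("jdbc").options(Map(
      "url"     -> "jdbc:oracle:thin:@dbhost:1521:orcl", // hypothetical host/SID
      "driver"  -> "oracle.jdbc.driver.OracleDriver",    // forces loading the right driver class
      "dbtable" -> "myschema.mytable"                    // hypothetical table
    )).load()

Note the driver jar itself must still be on the classpath, e.g. via spark-submit --jars.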

Re: val listRDD =ssc.socketTextStream(localhost,9999) on Yarn

2015-12-22 Thread Shixiong Zhu
Just replace `localhost` with a host name that can be accessed by Yarn containers. Best Regards, Shixiong Zhu 2015-12-22 0:11 GMT-08:00 prasadreddy <alle.re...@gmail.com>: > How do we achieve this on yarn-cluster mode > > Please advise. > > Thanks > Prasad >
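That is, a sketch (host name hypothetical):

    // The host must be resolvable and reachable from the YARN containers,
    // not just from the machine that launched the job
    val listRDD = ssc.socketTextStream("stream-host.example.com", 9999)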

Re: seriazable error in apache spark job

2015-12-18 Thread Shixiong Zhu
Looks like you have a reference to some Akka class. Could you post your code? Best Regards, Shixiong Zhu 2015-12-17 23:43 GMT-08:00 Pankaj Narang <pankajnaran...@gmail.com>: > I am encountering below error. Can somebody guide? > > Something similar is on this link > https://
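For illustration, a common shape of this error is a closure that captures an enclosing object holding a non-serializable Akka reference; copying the needed value into a local val keeps `this` out of the closure (names hypothetical):

    import org.apache.spark.rdd.RDD

    // `system` stands in for a non-serializable Akka object
    class Processor(system: AnyRef) {
      val factor = 2

      def run(rdd: RDD[Int]): RDD[Int] = {
        val localFactor = factor  // copy the value into a local val
        rdd.map(_ * localFactor)  // closure captures localFactor, not `this` (and its Akka field)
      }
    }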

Re: Question about Spark Streaming checkpoint interval

2015-12-18 Thread Shixiong Zhu
You are right. "checkpointInterval" is only for data checkpointing. Metadata checkpointing is done for each batch. Feel free to send a PR to add the missing doc. Best Regards, Shixiong Zhu 2015-12-18 8:26 GMT-08:00 Lan Jiang <ljia...@gmail.com>: > Need some clarific
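A sketch distinguishing the two (ssc and stateStream are hypothetical, e.g. the output of updateStateByKey):

    import org.apache.spark.streaming.Seconds

    ssc.checkpoint("hdfs:///user/app/checkpoints") // metadata checkpointing: every batch
    stateStream.checkpoint(Seconds(50))            // data checkpointing: every 50s for this DStream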

Re: pyspark + kafka + streaming = NoSuchMethodError

2015-12-17 Thread Shixiong Zhu
What's the Scala version of your Spark? Is it 2.10? Best Regards, Shixiong Zhu 2015-12-17 10:10 GMT-08:00 Christos Mantas <cman...@cslab.ece.ntua.gr>: > Hello, > > I am trying to set up a simple example with Spark Streaming (Python) and > Kafka on a single machine deployment.
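A quick way to check from spark-shell:

    println(sc.version)                          // Spark version, e.g. 1.5.2
    println(scala.util.Properties.versionString) // Scala version, e.g. "version 2.10.4"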

Re: Help with Couchbase connector error

2015-11-26 Thread Shixiong Zhu
Hey Eyal, I just checked the Couchbase Spark connector jar. The target version of some of the classes is Java 8 (class file version 52.0). You can create a ticket at https://issues.couchbase.com/projects/SPARKC Best Regards, Shixiong Zhu 2015-11-26 9:03 GMT-08:00 Ted Yu <yuzhih...@gmail.com>: > StoreMod
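To verify a class file's target yourself: the major version sits in bytes 6-7 of the file (52 = Java 8, 51 = Java 7); a sketch, with the class name inside the connector jar hypothetical:

    import java.io.DataInputStream

    val in = new DataInputStream(
      getClass.getClassLoader.getResourceAsStream("com/couchbase/spark/RDDFunctions.class"))
    try {
      in.readInt()                       // magic number 0xCAFEBABE
      in.readUnsignedShort()             // minor version
      val major = in.readUnsignedShort() // 52 = Java 8, 51 = Java 7, 50 = Java 6
      println(s"class file major version: $major")
    } finally in.close()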
