Re: executor failures w/ scala 2.10

2013-11-13 Thread Prashant Sharma
We may no longer need to track disassociation; IMHO we should instead use the *improved* feature in Akka 2.2.x called remote death watch, which lets us acknowledge a remote death in both cases: a natural demise and an accidental death. This was not the case with remote death watch in previous Akka releases.

Re: write data into HBase via spark

2013-11-13 Thread Hao REN
Ok, I worked it out. The following thread helps a lot. http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C7B4868A9-B83E-4507-BB2A-2721FCE8E738%40gmail.com%3E Hao 2013/11/12 Hao REN julien19890...@gmail.com Could someone show me a simple example about how to write

Re: write data into HBase via spark

2013-11-13 Thread Philip Ogren
Hao, If you have worked out the code and can turn it into an example that you can share, then please do! This task is in my queue of things to do, so any helpful details that you uncovered would be most appreciated. Thanks, Philip On 11/13/2013 5:30 AM, Hao REN wrote: Ok, I worked it out.

Re: executor failures w/ scala 2.10

2013-11-13 Thread Matei Zaharia
Hey Prashant, do messages still get lost while we’re disassociated? Or can you set the timeouts high enough to prevent that? Matei On Nov 13, 2013, at 12:39 AM, Prashant Sharma scrapco...@gmail.com wrote: We may no longer need to track disassociation and IMHO use the *improved* feature in akka

Spark Streaming Job Fails to Run Under Mesos-0.14

2013-11-13 Thread Craig Vanderborgh
Hi all, We have a Spark Streaming job that's been working great under Mesos for many months with local as the master. We just tried to run it on our new Mesos cluster. This cluster has been set up properly, and the Spark examples (e.g. SparkPi) run distributed, correctly, under Mesos. But our

code review - splitting columns

2013-11-13 Thread Philip Ogren
Hi Spark community, I learned a lot the last time I posted some elementary Spark code here, so I thought I would do it again. Someone politely tell me offline if this is noise or unfair use of the list! I acknowledge that this borders on asking Scala 101 questions. I have an
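
The code under review is not shown in this archive snippet, so as a hedged illustration of the thread's topic only, here is one plain-Scala way to split a delimited line into columns, as one might do per record inside an RDD's `map()`. The helper name and the tab delimiter are assumptions, not taken from the thread:

```scala
// Hypothetical sketch: split a delimited text line into columns.
// In Spark this would typically run per record, e.g. lines.map(splitColumns(_)).
object SplitColumns {
  // The -1 limit keeps trailing empty columns, which String.split drops by default.
  def splitColumns(line: String, delim: String = "\t"): Array[String] =
    line.split(delim, -1)
}
```

Keeping trailing empty columns matters for fixed-width tabular data: a row such as `"a\tb\t"` should still yield three columns.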

RE: Spark Streaming Job Fails to Run Under Mesos-0.14

2013-11-13 Thread Hussam_Jarada
Not sure if this would help, but I was running into similar failures (my main Java driver kept firing jobs which failed with error code 1 but no exception trace, until the main driver gave up and caused the same exception in spark.scheduler.DAGScheduler) when using

[ANNOUNCE] Welcoming two new Spark committers: Tom Graves and Prashant Sharma

2013-11-13 Thread Matei Zaharia
Hi folks, The Apache Spark PPMC is happy to welcome two new PPMC members and committers: Tom Graves and Prashant Sharma. Tom has been maintaining and expanding the YARN support in Spark over the past few months, including adding big features such as support for YARN security, and recently

Re: executor failures w/ scala 2.10

2013-11-13 Thread Prashant Sharma
We can set the timeouts high enough! Same as the connection timeout that we already set. On Wed, Nov 13, 2013 at 11:37 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hey Prashant, do messages still get lost while we’re disassociated? Or can you set the timeouts high enough to prevent that? Matei

Removing RDDs' data from BlockManager

2013-11-13 Thread Meisam Fathi
Hi Community, When an RDD in the application becomes unreachable and gets garbage collected, how does Spark remove RDD's data from BlockManagers on the worker nodes? Thanks, Meisam

Re: Removing RDDs' data from BlockManager

2013-11-13 Thread Matei Zaharia
Hi Meisam, Each block manager removes data from the cache in a least-recently-used fashion as space fills up. If you’d like to remove an RDD manually before that, you can call rdd.unpersist(). Matei On Nov 13, 2013, at 8:15 PM, Meisam Fathi meisam.fa...@gmail.com wrote: Hi Community,
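
The least-recently-used policy Matei describes can be sketched with a plain JVM structure. This is an illustrative sketch only, not Spark's BlockManager code: `java.util.LinkedHashMap` with `accessOrder = true` keeps entries ordered from least- to most-recently used, and overriding `removeEldestEntry` drops the stalest entry once capacity is exceeded.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Illustrative LRU cache (hypothetical sketch, not Spark's implementation):
// when size exceeds capacity, the least-recently-used entry is evicted,
// mirroring how cached RDD blocks are dropped as space fills up.
class LruCache[K, V](capacity: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, /* accessOrder = */ true) {
  override def removeEldestEntry(eldest: java.util.Map.Entry[K, V]): Boolean =
    size() > capacity
}
```

For example, with capacity 2, inserting blocks for rdd1 and rdd2, touching rdd1, then inserting rdd3 evicts rdd2, since rdd1 was accessed more recently.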

Re: Removing RDDs' data from BlockManager

2013-11-13 Thread Meisam Fathi
Hi Matei, Thank you for the clarification. I agree that users can always call rdd.unpersist() to evict an RDD's data from the BlockManager. But if the RDD object becomes unreachable, then its data in the BlockManager is unusable. Ideally, the BlockManager should evict unusable data before any other data. It is like
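
One way to detect the situation Meisam describes, sketched here as a hypothetical mechanism rather than anything Spark did at the time, is to pair each tracked object with a `java.lang.ref.WeakReference` and a `ReferenceQueue`: once the object has no strong references left, the GC enqueues the weak reference, and a cleaner could then evict the corresponding blocks eagerly instead of waiting for LRU pressure. The `Tracked` class and `rddId` field are made up for the illustration.

```scala
import java.lang.ref.{Reference, ReferenceQueue, WeakReference}

// Hypothetical sketch (not Spark's implementation): register a weak reference
// for a cached object so the GC reports when it becomes unreachable.
object WeakTracker {
  final class Tracked(obj: AnyRef, val rddId: Int, queue: ReferenceQueue[AnyRef])
      extends WeakReference[AnyRef](obj, queue)

  // Returns the id of the tracked object once the GC reports it unreachable,
  // or -1 if that did not happen within the retry budget.
  def trackAndCollect(): Int = {
    val queue = new ReferenceQueue[AnyRef]
    var data: AnyRef = new Array[Byte](1 << 20)
    val tracked = new Tracked(data, 42, queue)
    data = null                          // drop the only strong reference
    var cleared: Reference[_ <: AnyRef] = null
    var tries = 0
    while (cleared == null && tries < 100) {
      System.gc()                        // hint; usually clears weak refs promptly
      cleared = queue.remove(100)        // wait up to 100 ms for the enqueue
      tries += 1
    }
    if (cleared eq tracked) tracked.rddId else -1
  }
}
```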

Re: interesting finding per using union

2013-11-13 Thread Matei Zaharia
Union just puts the data in two RDDs together, so you get an RDD containing the elements of both, and with the partitions that would’ve been in both. It’s not a unique set union (that would be union() then distinct()). Here you’ve unioned four RDDs of 32 partitions each to get 128. If you want
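
The semantics Matei describes can be mirrored with plain Scala collections (a hypothetical helper, not actual RDD code): union simply concatenates, keeping duplicates, while calling `distinct` afterwards gives the unique set union.

```scala
// Plain-Scala sketch of the two union semantics discussed above.
object UnionSemantics {
  // Like rdd1.union(rdd2): concatenation, duplicates kept.
  def union[T](a: Seq[T], b: Seq[T]): Seq[T] = a ++ b

  // Like rdd1.union(rdd2).distinct(): unique set union.
  def setUnion[T](a: Seq[T], b: Seq[T]): Seq[T] = (a ++ b).distinct
}
```

The same additivity applies to partitions: unioning four RDDs of 32 partitions each yields 128 partitions, and `coalesce()` (where available) can reduce that count if fewer are wanted.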

RE: interesting finding per using union

2013-11-13 Thread Hussam_Jarada
Yes, I unioned four RDDs of 32 partitions each. Thank you, Hussam From: Matei Zaharia [mailto:matei.zaha...@gmail.com] Sent: Wednesday, November 13, 2013 10:37 PM To: user@spark.incubator.apache.org Subject: Re: interesting finding per using union Union just