Re: NoSuchMethodError in KafkaReciever

2014-07-08 Thread Michael Chang
To be honest I'm a Scala newbie too. I just copied it from createStream; I assume it's the canonical way to convert a Java map (JMap) to a Scala map (Map). On Mon, Jul 7, 2014 at 1:40 PM, mcampbell wrote: > xtrahotsauce wrote > > I had this same problem as well. I ended up just adding the nec
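For reference, a minimal sketch of that conversion using scala.collection.JavaConverters (one idiomatic way to do it; the helper name toScalaMap is just for illustration):

    import java.util.{Map => JMap}
    import scala.collection.JavaConverters._

    // Wrap the Java map in a mutable Scala view, then copy it into an immutable Scala Map.
    def toScalaMap(kafkaParams: JMap[String, String]): Map[String, String] =
      kafkaParams.asScala.toMap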

Re: How to achieve reasonable performance on Spark Streaming?

2014-06-13 Thread Michael Chang
I'm interested in this issue as well. I have Spark Streaming jobs that seem to run well for a while, but slowly degrade and don't recover. On Wed, Jun 11, 2014 at 11:08 PM, Boduo Li wrote: > It seems that the slow "reduce" tasks are caused by slow shuffling. Here are > the logs regarding one s

Re: Spilled shuffle files not being cleared

2014-06-13 Thread Michael Chang
will clean old unused shuffle data when it times out. > > > > For Spark 1.0 another way is to clean shuffle data using weak references > (reference tracking based; the configuration is > spark.cleaner.referenceTracking), and it is enabled by default. > > > > Than
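A minimal sketch of how those two cleaner settings might be applied (the TTL value is an arbitrary example, not a recommendation):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // 0.9.x-style time-based cleanup: forget metadata/shuffle data older than this many seconds
      .set("spark.cleaner.ttl", "3600")
      // 1.0-style weak-reference-based cleanup (enabled by default in 1.0)
      .set("spark.cleaner.referenceTracking", "true")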

Re: Spilled shuffle files not being cleared

2014-06-12 Thread Michael Chang
Bump On Mon, Jun 9, 2014 at 3:22 PM, Michael Chang wrote: > Hi all, > > I'm seeing exceptions that look like the one below in Spark 0.9.1. It looks > like I'm running out of inodes on my machines (I have around 300k each in a > 12-machine cluster). I took a quick look an

Re: NoSuchMethodError in KafkaReciever

2014-06-10 Thread Michael Chang
I had this same problem as well. I ended up just adding the necessary code in KafkaUtils and compiling my own Spark jar. Something like this for the "raw" stream: def createRawStream( jssc: JavaStreamingContext, kafkaParams: JMap[String, String], topics: JMap[String, JInt]
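The snippet is cut off above; here is a self-contained sketch of what such a helper might look like against the 0.9.x/1.0 Scala API (the storage level and the wrapping into a JavaPairDStream are assumptions, not the poster's exact code):

    import java.lang.{Integer => JInt}
    import java.util.{Map => JMap}
    import scala.collection.JavaConverters._
    import kafka.serializer.DefaultDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.api.java.{JavaPairDStream, JavaStreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Hypothetical reconstruction: a "raw" byte-array stream built on the Scala
    // KafkaUtils.createStream, callable from a JavaStreamingContext.
    def createRawStream(
        jssc: JavaStreamingContext,
        kafkaParams: JMap[String, String],
        topics: JMap[String, JInt]): JavaPairDStream[Array[Byte], Array[Byte]] = {
      val stream = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](
        jssc.ssc,
        kafkaParams.asScala.toMap,
        topics.asScala.mapValues(_.intValue()).toMap,
        StorageLevel.MEMORY_AND_DISK_SER_2)
      new JavaPairDStream(stream)
    }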

Spilled shuffle files not being cleared

2014-06-09 Thread Michael Chang
Hi all, I'm seeing exceptions that look like the one below in Spark 0.9.1. It looks like I'm running out of inodes on my machines (I have around 300k each in a 12-machine cluster). I took a quick look and I'm seeing some shuffle spill files that are still around even 12 minutes after they are creat

Re: Failed to remove RDD error

2014-06-09 Thread Michael Chang
blems. > We need to investigate the cause of this. Can you give us the logs showing > this error so that we can analyze it? > > TD > > > On Tue, Jun 3, 2014 at 10:08 AM, Michael Chang wrote: > >> Thanks Tathagata, >> >> Thanks for all your hard work! I

Using log4j.xml

2014-06-04 Thread Michael Chang
Has anyone tried to use a log4j.xml instead of a log4j.properties with Spark 0.9.1? I'm trying to run Spark Streaming on YARN and I've set the environment variable SPARK_LOG4J_CONF to a log4j.xml file instead of a log4j.properties file, but Spark seems to be using the default log4j.properties SLF
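For anyone attempting the same, a minimal log4j.xml sketch in the standard log4j 1.2 XML format (the appender name is arbitrary; the pattern mirrors Spark's default log4j.properties layout):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
      <!-- Console appender with Spark's usual timestamp/level/logger pattern -->
      <appender name="console" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"/>
        </layout>
      </appender>
      <root>
        <priority value="INFO"/>
        <appender-ref ref="console"/>
      </root>
    </log4j:configuration>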

Re: NoSuchElementException: key not found

2014-06-03 Thread Michael Chang
RA for this. The fix is not trivial though. > > https://issues.apache.org/jira/browse/SPARK-2002 > > A "not-so-good" workaround for now would be to not use coalesced RDDs, which > avoids the race condition. > > TD > > > On Tue, Jun 3, 2014 at 10:09 AM, Michael C

Re: NoSuchElementException: key not found

2014-06-03 Thread Michael Chang
he value > "32855" to find any references to it? Also what version of Spark are > you using (so that I can match the stack trace; it does not seem to match > Spark 1.0)? > > TD > > > On Mon, Jun 2, 2014 at 3:27 PM, Michael Chang wrote: > >> Hi all, >>

Re: Failed to remove RDD error

2014-06-03 Thread Michael Chang
> > > On Mon, Jun 2, 2014 at 9:42 AM, Michael Chang wrote: > >> Hey Mayur, >> >> Thanks for the suggestion, I didn't realize that was configurable. I >> don't think I'm running out of memory, though it does seem like these >> errors go away

NoSuchElementException: key not found

2014-06-02 Thread Michael Chang
Hi all, Seeing a random exception kill my Spark Streaming job. Here's a stack trace: java.util.NoSuchElementException: key not found: 32855 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collectio
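As a generic illustration of that exception type (plain Scala, not the Spark-internal map from the trace): scala.collection.Map throws through MapLike.default when a key is absent, which a defensive lookup avoids:

    // The key 32855 is taken from the log above; the map contents are made up.
    val m = Map(1 -> "a", 2 -> "b")
    // m(32855) would throw java.util.NoSuchElementException: key not found: 32855
    val v = m.getOrElse(32855, "missing") // returns the fallback instead of throwing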

Re: Failed to remove RDD error

2014-06-02 Thread Michael Chang
> http://www.sigmoidanalytics.com > @mayur_rustagi <https://twitter.com/mayur_rustagi> > > > > On Sat, May 31, 2014 at 6:52 AM, Michael Chang wrote: > >> I'm running some Kafka streaming Spark contexts (on 0.9.1), and they >> seem to be dying after 10 or so minutes wit

Failed to remove RDD error

2014-05-30 Thread Michael Chang
I'm running some Kafka streaming Spark contexts (on 0.9.1), and they seem to be dying after 10 or so minutes with a lot of these errors. I can't really tell what's going on here, except that maybe the driver is unresponsive somehow? Has anyone seen this before? 14/05/31 01:13:30 ERROR BlockMan