I am very curious though. Can you post a concise code example which we can run to reproduce this problem?

TD
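A minimal, self-contained reproduction might look like the sketch below. This is a reconstruction from Diana's report further down, not code actually posted in the thread: it mimics what the spark-shell does by putting the helper as a def on a Serializable class that also holds the StreamingContext, so the closure built from the def captures `this` and drags the non-serializable context into the task. The names LogProcessor and Repro are made up, and it assumes Spark Streaming 1.0.0 with something feeding lines to localhost:4444 (e.g. `nc -lk 4444`).

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Serializable stand-in for the REPL's line wrapper: it holds the
    // StreamingContext as a field, just as a spark-shell session does.
    class LogProcessor(ssc: StreamingContext) extends Serializable {

      // A method (def), not a function value: passing it to map() below
      // eta-expands it into a closure that keeps a reference to `this`.
      def getRequestDoc(s: String): String =
        "KBDOC-[0-9]*".r.findFirstIn(s).orNull

      def run(): Unit = {
        val logs = ssc.socketTextStream("localhost", 4444)
        // Serializing this task should fail once the first batch runs,
        // because closure -> this -> ssc, giving
        // java.io.NotSerializableException: StreamingContext.
        logs.map(getRequestDoc).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

    object Repro {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("repro")
        new LogProcessor(new StreamingContext(conf, Seconds(1))).run()
      }
    }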
On Tue, Jul 15, 2014 at 2:54 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> I am not entirely sure off the top of my head, but a possible (usually
> works) workaround is to define the function as a val instead of a def.
> For example,
>
>     def func(i: Int): Boolean = { true }
>
> can be written as
>
>     val func = (i: Int) => { true }
>
> Hope this helps for now.
>
> TD
>
> On Tue, Jul 15, 2014 at 9:21 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> Hey Diana,
>>
>> Did you ever figure this out?
>>
>> I'm running into the same exception, except in my case the function I'm
>> calling is a KMeans model.predict(). In regular Spark it works, and Spark
>> Streaming without the call to model.predict() also works, but with the
>> two put together I get this serialization exception. I'm on 1.0.0.
>>
>> Nick
>>
>> On Thu, May 8, 2014 at 6:37 AM, Diana Carroll <dcarr...@cloudera.com> wrote:
>>
>>> Hey all, I'm trying to set up a pretty simple streaming app and I'm
>>> getting some weird behavior.
>>>
>>> First, a non-streaming job that works fine: I'm trying to pull out lines
>>> of a log file that match a regex, for which I've set up a function:
>>>
>>>     def getRequestDoc(s: String): String = {
>>>       "KBDOC-[0-9]*".r.findFirstIn(s).orNull
>>>     }
>>>     val logs = sc.textFile(logfiles)
>>>     logs.map(getRequestDoc).take(10)
>>>
>>> That works, but I want to run it on the same data as a stream, so I
>>> tried this:
>>>
>>>     val logs = ssc.socketTextStream("localhost", 4444)
>>>     logs.map(getRequestDoc).print()
>>>     ssc.start()
>>>
>>> From this code, I get:
>>>
>>>     14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming
>>>     job 1399545128000 ms.0
>>>     org.apache.spark.SparkException: Job aborted: Task not serializable:
>>>     java.io.NotSerializableException:
>>>     org.apache.spark.streaming.StreamingContext
>>>
>>> But if I write the map function inline instead of calling a separate
>>> function, it works:
>>>
>>>     logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()
>>>
>>> So why is Spark able to serialize my little function in regular Spark,
>>> but not in Spark Streaming?
>>>
>>> Thanks,
>>> Diana
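For the archive, TD's val workaround applied to Diana's snippet would look something like this (an untested sketch): the val holds a plain Function1 object that captures nothing from the enclosing scope, so map() serializes only that small object rather than the scope containing the StreamingContext.

    // Function value instead of a method: nothing from the enclosing
    // REPL line (including ssc) gets captured.
    val getRequestDoc = (s: String) => "KBDOC-[0-9]*".r.findFirstIn(s).orNull

    val logs = ssc.socketTextStream("localhost", 4444)
    logs.map(getRequestDoc).print()
    ssc.start()

As to Diana's closing question: passing a def to map() makes Scala eta-expand it into an anonymous function that holds a reference to the enclosing instance, and in the spark-shell that instance is the line wrapper containing ssc, hence the NotSerializableException on StreamingContext. The inline regex version works for the same reason the val does: the lambda captures nothing from the enclosing scope. Why the identical def survives in a plain non-streaming shell session presumably comes down to what happens to sit in the captured wrapper there, so treat this explanation as a best guess rather than gospel.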