I am very curious though. Can you post a concise code example which we can
run to reproduce this problem?

TD


On Tue, Jul 15, 2014 at 2:54 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> I am not entirely sure off the top of my head, but a possible workaround
> (it usually works) is to define the function as a val instead of a def. For
> example,
>
> def func(i: Int): Boolean = { true }
>
> can be written as
>
> val func = (i: Int) => { true }
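>
> Applied to the helper from the original message, that might look like this
> (a sketch reusing Diana's regex; the surrounding streaming setup is assumed
> to be unchanged):
>
> val getRequestDoc = (s: String) => "KBDOC-[0-9]*".r.findFirstIn(s).orNull
> logs.map(getRequestDoc).print()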
>
> Hope this helps for now.
>
> TD
>
>
> On Tue, Jul 15, 2014 at 9:21 AM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Hey Diana,
>>
>> Did you ever figure this out?
>>
>> I’m running into the same exception, except in my case the function I’m
>> calling is KMeansModel.predict().
>>
>> In regular Spark it works, and the Spark Streaming version without the call
>> to model.predict() also works, but when I put the two together I get this
>> serialization exception. I’m on 1.0.0.
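>>
>> For reference, a minimal sketch of the pattern involved (the names, the
>> comma-separated input format, and the local-val workaround are assumptions,
>> not my exact code):
>>
>> import org.apache.spark.mllib.clustering.KMeansModel
>> import org.apache.spark.mllib.linalg.Vectors
>> import org.apache.spark.streaming.dstream.DStream
>>
>> // Copying the model into a local val keeps the enclosing object (which
>> // may hold the non-serializable StreamingContext) out of the closure.
>> def predictions(model: KMeansModel, lines: DStream[String]): DStream[Int] = {
>>   val localModel = model
>>   lines.map(line => localModel.predict(Vectors.dense(line.split(',').map(_.toDouble))))
>> }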
>>
>> Nick
>>
>>
>> On Thu, May 8, 2014 at 6:37 AM, Diana Carroll <dcarr...@cloudera.com>
>> wrote:
>>
>>> Hey all, I'm trying to set up a pretty simple streaming app and I'm seeing
>>> some weird behavior.
>>>
>>> First, a non-streaming job that works fine: I'm trying to pull out lines
>>> of a log file that match a regex, for which I've set up a function:
>>>
>>> def getRequestDoc(s: String): String = { "KBDOC-[0-9]*".r.findFirstIn(s).orNull }
>>> val logs = sc.textFile(logfiles)
>>> logs.map(getRequestDoc).take(10)
>>>
>>> That works, but I want to run it on the same data as a streaming job, so I
>>> tried this:
>>>
>>> val logs = ssc.socketTextStream("localhost", 4444)
>>> logs.map(getRequestDoc).print()
>>> ssc.start()
>>>
>>> From this code, I get:
>>> 14/05/08 03:32:08 ERROR JobScheduler: Error running job streaming job
>>> 1399545128000 ms.0
>>> org.apache.spark.SparkException: Job aborted: Task not serializable:
>>> java.io.NotSerializableException:
>>> org.apache.spark.streaming.StreamingContext
>>>
>>>
>>> But if I do the map function inline instead of calling a separate
>>> function, it works:
>>>
>>> logs.map("KBDOC-[0-9]*".r.findFirstIn(_).orNull).print()
>>>
>>> So why is Spark able to serialize my little function in a regular job, but
>>> not in streaming?
>>>
>>> Thanks,
>>> Diana
>>>
>>>
>>>
>>
>
