Re: Broadcast Torrent fail - then the job dies

2014-10-08 Thread Steve Lewis
Switching to HttpBroadcastFactory changes the error to the following:

14/10/08 13:27:40 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
14/10/08 13:27:40 INFO broadcast.HttpBroadcast: Started reading broadcast variable 0
14/10/08 13:27:40 ERROR executor.Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.io.FileNotFoundException: http://192.168.1.4:54221/broadcast_0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1610)
    at org.apache.spark.broadcast.HttpBroadcast$.org$apache$spark$broadcast$HttpBroadcast$$read(HttpBroadcast.scala:197)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:991)

Curiously, the error is very repeatable in a relatively large and complex
program I am running, but the same Spark steps work well when the objects
are Strings and Integers, as in word count. My objects are complex but
serialize well, and the job runs when I drop a combineByKey step.
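
One way to back up the "serialize well" claim is a round trip through plain
Java serialization, outside Spark. The following is only a minimal sketch,
assuming the job uses Spark's default Java serializer; the class name
SerializationCheck and the helper roundTrip are hypothetical and not part of
the program discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper: not taken from the program in this thread.
class SerializationCheck {

    // Write the value to an in-memory buffer with Java serialization,
    // then read it back and return the deserialized copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for one of the complex objects used as keys or values.
        ArrayList<String> sample = new ArrayList<>(Arrays.asList("a", "b"));
        ArrayList<String> copy = roundTrip(sample);
        // Reports whether the copy equals the original.
        System.out.println(copy.equals(sample));
    }
}
```

Passing this check only shows that the objects survive Java serialization;
it does not exercise the broadcast fetch that fails in the trace above.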


On Wed, Oct 8, 2014 at 12:00 PM, Liquan Pei wrote:

> Hi Lewis,
>
> For debugging purposes, can you try using HttpBroadcast to see if the error
> remains? You can enable HttpBroadcast by setting spark.broadcast.factory
> to org.apache.spark.broadcast.HttpBroadcastFactory in the Spark conf.
>
> Thanks,
> Liquan
>
> On Wed, Oct 8, 2014 at 11:21 AM, Steve Lewis wrote:
>
>> I am running on Windows 8 using Spark 1.1.0 in local mode with Hadoop 2.2,
>> and I repeatedly see the following in my logs.
>>
>> I believe this happens in combineByKey
>>
>>
>> 14/10/08 09:36:30 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
>> 14/10/08 09:36:30 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
>> 14/10/08 09:36:35 ERROR broadcast.TorrentBroadcast: Reading broadcast variable 0 failed
>> 14/10/08 09:36:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 5.006378813 s
>> 14/10/08 09:36:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
>> 14/10/08 09:36:35 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.lang.NullPointerException
>>     at java.nio.ByteBuffer.wrap(ByteBuffer.java:392)
>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>
> --
> Liquan Pei
> Department of Physics
> University of Massachusetts Amherst
>



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


Re: Broadcast Torrent fail - then the job dies

2014-10-08 Thread Liquan Pei
Hi Lewis,

For debugging purposes, can you try using HttpBroadcast to see if the error
remains? You can enable HttpBroadcast by setting spark.broadcast.factory
to org.apache.spark.broadcast.HttpBroadcastFactory in the Spark conf.
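
For reference, a minimal sketch of that setting as a line in
conf/spark-defaults.conf. This assumes the job is launched through
spark-submit so that the file is read; the property name and class name are
the ones given above, and nothing else is changed.

```
spark.broadcast.factory   org.apache.spark.broadcast.HttpBroadcastFactory
```

In a program that builds its own SparkConf, the same key and value could
instead be passed to SparkConf.set before the context is created.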

Thanks,
Liquan

On Wed, Oct 8, 2014 at 11:21 AM, Steve Lewis wrote:

> I am running on Windows 8 using Spark 1.1.0 in local mode with Hadoop 2.2,
> and I repeatedly see the following in my logs.
>
> I believe this happens in combineByKey
>
>
> 14/10/08 09:36:30 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
> 14/10/08 09:36:30 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
> 14/10/08 09:36:35 ERROR broadcast.TorrentBroadcast: Reading broadcast variable 0 failed
> 14/10/08 09:36:35 INFO broadcast.TorrentBroadcast: Reading broadcast variable 0 took 5.006378813 s
> 14/10/08 09:36:35 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 0
> 14/10/08 09:36:35 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.NullPointerException
>     at java.nio.ByteBuffer.wrap(ByteBuffer.java:392)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
>     at org.apache.spark.scheduler.Task.run(Task.scala:54)



-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst