Unfortunately, I haven't seen this error directly, but here are some things I typically check when debugging.
Do you see a corresponding error on the other side of that connection (alpinenode7.alpinenow.local)? Or is that the same machine?

Also, do the driver logs show a longer stack trace? And have you enabled the history server, so you can see more details about execution? That helps me tremendously.

-Suren

On Wed, Jun 25, 2014 at 11:08 PM, Sung Hwan Chung <coded...@cs.stanford.edu> wrote:

> I'm seeing the following message in the log of an executor. Anyone seen this
> error? After this, the executor seems to lose the cache, and besides that
> the whole thing slows down drastically, i.e. it gets stuck in a reduce phase
> for 40+ minutes, whereas before it was finishing reduces in 2~3 seconds.
>
> 14/06/25 19:22:31 WARN SendingConnection: Error writing in connection to ConnectionManagerId(alpinenode7.alpinenow.local,46251)
> java.lang.NullPointerException
>     at org.apache.spark.network.MessageChunkHeader.buffer$lzycompute(MessageChunkHeader.scala:35)
>     at org.apache.spark.network.MessageChunkHeader.buffer(MessageChunkHeader.scala:32)
>     at org.apache.spark.network.MessageChunk.buffers$lzycompute(MessageChunk.scala:31)
>     at org.apache.spark.network.MessageChunk.buffers(MessageChunk.scala:29)
>     at org.apache.spark.network.SendingConnection.write(Connection.scala:349)
>     at org.apache.spark.network.ConnectionManager$$anon$5.run(ConnectionManager.scala:142)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:724)

--
SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io <suren.hira...@sociocast.com>
W: www.velos.io
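To check for a corresponding error on the other side of the connection, it can help to pull the same warning out of each node's executor log and compare timestamps and hosts. The following is only a rough sketch in Python, not part of Spark: the function names are made up for illustration, and the pattern assumes the log line looks exactly like the one quoted above.

```python
import re
from collections import Counter

# Assumed log format, taken from the warning quoted in this thread:
#   14/06/25 19:22:31 WARN SendingConnection: Error writing in connection to
#   ConnectionManagerId(alpinenode7.alpinenow.local,46251)
# \s+ also matches a line break, in case the log wraps before the id.
PATTERN = re.compile(
    r"(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) WARN SendingConnection: "
    r"Error writing in connection to\s+"
    r"ConnectionManagerId\((?P<host>[^,]+),(?P<port>\d+)\)"
)


def find_connection_errors(log_text):
    """Return (timestamp, host, port) for each SendingConnection write error."""
    return [
        (m.group("ts"), m.group("host"), int(m.group("port")))
        for m in PATTERN.finditer(log_text)
    ]


def errors_per_host(log_text):
    """Count write errors per remote host (hypothetical helper)."""
    return Counter(host for _, host, _ in find_connection_errors(log_text))
```

Reading an executor's log file into a string and passing it to errors_per_host shows which remote hosts the failures point at; running the same scan on those hosts' own logs around the same timestamps is one way to see whether the other side reported anything.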