This is fixed in 1.2.1; could you upgrade to 1.2.1?
On Thu, Feb 12, 2015 at 4:55 AM, Rok Roskar wrote:
Hi again,
I narrowed down the issue a bit more -- it seems to have to do with the
Kryo serializer. When I use it, then this results in a Null Pointer:
rdd = sc.parallelize(range(10))
d = {}
from random import random
for i in range(10):
    d[i] = random()
rdd.map(lambda x: d[x]).collect()
I think the problem was related to the broadcasts being too large -- I've
now split it up into many smaller operations but it's still not quite there
-- see
http://apache-spark-user-list.1001560.n3.nabble.com/iteratively-modifying-an-RDD-td21606.html
Thanks,
Rok
On Wed, Feb 11, 2015, 19:59 Davie
Could you share a short script to reproduce this problem?
On Tue, Feb 10, 2015 at 8:55 PM, Rok Roskar wrote:
I didn't notice other errors -- I also thought such a large broadcast is a
bad idea but I tried something similar with a much smaller dictionary and
encountered the same problem. I'm not familiar enough with Spark internals
to know whether the trace indicates an issue with the broadcast variables
o
It's brave to broadcast 8G of pickled data -- it will take more than 15G of
memory in each Python worker. How much memory do you have in the executor
and driver?
Do you see any other exceptions in the driver and executors? Something
related to serialization in the JVM.
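One way to sanity-check that memory math before broadcasting is to measure the pickled size of the dictionary on the driver. A small pure-Python sketch (the dict here is just a stand-in for the real data; scale n up to estimate yours):

```python
import pickle
from random import random

# Stand-in for the real broadcast dict.
n = 1000
d = {i: random() for i in range(n)}

# Measure how large the payload is once serialized; the in-memory
# footprint after unpickling on each worker will be larger still.
blob = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)
print(f"pickled size: {len(blob) / 1e6:.2f} MB for {len(d)} entries")
```

If the pickled payload is already in the gigabytes, splitting the data or joining against an RDD is usually safer than one giant broadcast.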
On Tue, Feb 10, 2015 at 2:16 PM, Rok Roskar wrote:
I get this in the driver log:
java.lang.NullPointerException
    at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:590)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(PythonRDD.scala:233)
    at org.apache.spark
Could you paste the NPE stack trace here? It would be better to create a
JIRA for it, thanks!
On Tue, Feb 10, 2015 at 10:42 AM, rok wrote:
> I'm trying to use a broadcasted dictionary inside a map function and am
> consistently getting Java null pointer exceptions. This is inside an IPython
> session