Re: RDD collect help

2014-04-18 Thread Eugen Cepoi
Indeed, serialization is always tricky when you work with objects that are more sophisticated than simple POJOs, and the deserialized objects can sometimes behave unexpectedly. In my case I had trouble serializing/deserializing Avro specific records containing lists. The impleme

Re: RDD collect help

2014-04-18 Thread Flavio Pompermaier
OK, thanks. However, it turns out that there's a problem with that, and it's not so safe to use Kryo serialization with Spark: Exception in thread "Executor task launch worker-0" java.lang.NullPointerException at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1$$anonfun$6.apply(Executor.

Re: RDD collect help

2014-04-18 Thread Eugen Cepoi
Because the closure often happens to reference something outside its scope, which in turn references other objects (that you don't need), and so on, so a lot of things you don't want end up serialized with your task. But sure, it is debatable, and it's more my personal opinion. 2014-04-17 23:28
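To make the point about closures dragging in unrelated state concrete, here is a minimal plain-Java sketch (no Spark involved). It assumes a Serializable lambda as a stand-in for a Spark function; `SerFunction`, `Pipeline` and its `unrelatedPayload` field are hypothetical. A lambda that reads an instance field captures `this`, so the whole enclosing object is serialized with it; copying the field into a local variable first captures only the value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class CaptureSize {
    // Hypothetical stand-in for a Spark function interface: a Function that is also Serializable.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical enclosing object: Serializable, but carrying a large payload the task never uses.
    static class Pipeline implements Serializable {
        private final int offset;
        private final byte[] unrelatedPayload = new byte[1_000_000];

        Pipeline(int offset) {
            this.offset = offset;
        }

        // Reads the field directly, so the lambda captures 'this' (payload included).
        SerFunction<Integer, Integer> capturingThis() {
            return x -> x + offset;
        }

        // Copies the field to a local first, so only the int value is captured.
        SerFunction<Integer, Integer> capturingLocal() {
            int localOffset = offset;
            return x -> x + localOffset;
        }
    }

    // Size in bytes of the Java-serialized form of obj.
    static int serializedSize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline(10);
        System.out.println("captures this:  " + serializedSize(pipeline.capturingThis()) + " bytes");
        System.out.println("captures local: " + serializedSize(pipeline.capturingLocal()) + " bytes");
    }
}
```

Both functions compute the same result; only the first one ships the megabyte payload along with the task.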

Re: RDD collect help

2014-04-17 Thread Flavio Pompermaier
Thanks again Eugen! I don't get the point... why do you prefer to avoid Kryo serialization for closures? Is there any problem with that? On Apr 17, 2014 11:10 PM, "Eugen Cepoi" wrote: > You have two kinds of ser: data and closures. They both use java ser. This > means that in your function you reference an objec

Re: RDD collect help

2014-04-17 Thread Eugen Cepoi
You have two kinds of serialization: data and closures. They both use Java serialization. Here the exception means that, in your function, you reference an object defined outside of it, and it is getting serialized with your task. To enable Kryo serialization for closures, set the spark.closure.serializer property. But usually I don't, as it allows me to detect such
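The idea that plain Java serialization of closures helps detect such accidental references can be illustrated with a small stand-alone sketch. It serializes a function up front and fails fast if anything it captures is not Serializable, loosely mirroring the "Task not serializable" error Spark reports. `SerFunction`, `Formatter` and `ensureSerializable` are hypothetical names, and no Spark API is used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class ClosureCheck {
    // Hypothetical stand-in for a Spark function interface.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical helper from a library you cannot modify: NOT Serializable.
    static class Formatter {
        String format(int x) {
            return "#" + x;
        }
    }

    // Fails fast if fn (and everything it captures) cannot be Java-serialized.
    static void ensureSerializable(Object fn) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(fn);
        } catch (NotSerializableException e) {
            throw new IllegalArgumentException("Task not serializable: " + e.getMessage(), e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Formatter formatter = new Formatter();
        SerFunction<Integer, String> good = x -> "#" + x;            // captures nothing
        SerFunction<Integer, String> bad = x -> formatter.format(x); // captures formatter
        ensureSerializable(good);
        try {
            ensureSerializable(bad);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // names the Formatter class
        }
    }
}
```

Here `bad` fails only because it captures the local `formatter`; creating the helper inside the function body avoids the capture.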

Re: RDD collect help

2014-04-17 Thread Flavio Pompermaier
Now I have another problem... I have to pass one of these non-serializable objects to a PairFunction, and I received another "not serializable" exception... it seems that Kryo doesn't work within Functions. Am I wrong, or is this a limitation of Spark? On Apr 15, 2014 1:36 PM, "Flavio Pompermaier" wrote: > Ok th
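A common way around passing a non-serializable object into a function is not to ship it at all: keep it in a `transient` field and rebuild it lazily where the function runs. The sketch below assumes the object can be recreated from serializable inputs (here, trivially, by its no-arg constructor). `Parser`, `ParseFunction` and `SerFunction` are hypothetical stand-ins for the real class and for Spark's `PairFunction`, and the serialize/deserialize round trip only simulates shipping the function to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class TransientHelper {
    // Hypothetical stand-in for a Spark function interface.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Hypothetical third-party class: NOT Serializable and not modifiable.
    static class Parser {
        int parse(String s) {
            return Integer.parseInt(s.trim());
        }
    }

    // Ships without the Parser; each deserialized copy rebuilds it on first use.
    static class ParseFunction implements SerFunction<String, Integer> {
        private static final long serialVersionUID = 1L;
        private transient Parser parser; // skipped by Java serialization

        @Override
        public Integer apply(String s) {
            if (parser == null) {
                parser = new Parser(); // created lazily where the function runs
            }
            return parser.parse(s);
        }
    }

    // Serialize then deserialize, simulating shipping the function to an executor.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ParseFunction fn = new ParseFunction();
        fn.apply("1");                             // parser initialised on the "driver"
        ParseFunction shipped = roundTrip(fn);     // transient field arrives as null
        System.out.println(shipped.apply(" 42 ")); // 42, parser rebuilt lazily
    }
}
```

The design choice is that the helper never crosses the wire: serialization succeeds regardless of whether the parser was already initialised.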

Re: RDD collect help

2014-04-15 Thread Flavio Pompermaier
Ok thanks for the help! Best, Flavio On Tue, Apr 15, 2014 at 12:43 AM, Eugen Cepoi wrote: > Nope, those operations are lazy, meaning it will create the RDDs but won't > trigger any "action". The computation is launched by operations such as > collect, count, save to HDFS etc. And even if they

Re: RDD collect help

2014-04-14 Thread Eugen Cepoi
Nope, those operations are lazy, meaning they will create the RDDs but won't trigger any "action". The computation is launched by operations such as collect, count, save to HDFS, etc. And even if they were not lazy, no serialization would happen. Serialization occurs only when data is transferred
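The transformation/action split behaves much like intermediate and terminal operations on a `java.util.stream.Stream`: describing the pipeline runs nothing, and only the terminal operation pulls data through it. This is an analogy in plain Java, not Spark code; the `Outcome` class and the call counter are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipeline {
    // What the demo observed: map() calls before and after the terminal operation, plus the result.
    static final class Outcome {
        final int callsBeforeAction;
        final int callsAfterAction;
        final List<Integer> result;

        Outcome(int callsBeforeAction, int callsAfterAction, List<Integer> result) {
            this.callsBeforeAction = callsBeforeAction;
            this.callsAfterAction = callsAfterAction;
            this.result = result;
        }
    }

    static Outcome demo(List<Integer> input) {
        AtomicInteger mapCalls = new AtomicInteger();
        // "Transformations": they only describe the work, nothing executes yet.
        Stream<Integer> pipeline = input.stream()
                .map(x -> {
                    mapCalls.incrementAndGet();
                    return x * 2;
                })
                .filter(x -> x > 2);
        int before = mapCalls.get();
        // "Action": the terminal operation actually runs map and filter.
        List<Integer> result = pipeline.collect(Collectors.toList());
        return new Outcome(before, mapCalls.get(), result);
    }

    public static void main(String[] args) {
        Outcome o = demo(Arrays.asList(1, 2, 3));
        System.out.println(o.callsBeforeAction); // 0
        System.out.println(o.callsAfterAction);  // 3
        System.out.println(o.result);            // [4, 6]
    }
}
```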

Re: RDD collect help

2014-04-14 Thread Flavio Pompermaier
OK, that's fair enough. But why do things work up to the collect? Aren't objects serialized during map and filter? On Apr 15, 2014 12:31 AM, "Eugen Cepoi" wrote: > Sure. As you have pointed out, those classes don't implement Serializable and > Spark uses by default java serialization (when you do collec

Re: RDD collect help

2014-04-14 Thread Eugen Cepoi
Sure. As you have pointed out, those classes don't implement Serializable, and Spark uses Java serialization by default (when you call collect, the data on the workers is serialized, "collected" by the driver, and then deserialized on the driver side). Kryo (like most other decent serialization libs) d
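What collect does with the data can be approximated in plain Java: serialize the results on the "worker", then deserialize them on the "driver". With default Java serialization, a class that does not implement Serializable fails at the first step with `java.io.NotSerializableException`. A minimal sketch; `Record`, `SerializableRecord` and `shipToDriver` are hypothetical, and a real Spark job also involves network transfer and Spark's own serializer configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectRoundTrip {
    // Hypothetical library class that does NOT implement Serializable.
    static class Record {
        final String name;

        Record(String name) {
            this.name = name;
        }
    }

    // Same data, but Serializable.
    static class SerializableRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        SerializableRecord(String name) {
            this.name = name;
        }
    }

    // Mimics collect(): serialize on the "worker", deserialize on the "driver".
    static Object shipToDriver(Object workerResults) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(workerResults); // worker side
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject(); // driver side
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // NotSerializableException is an IOException
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<?> back = (List<?>) shipToDriver(new ArrayList<>(Arrays.asList(new SerializableRecord("a"))));
        System.out.println(((SerializableRecord) back.get(0)).name); // a
        try {
            shipToDriver(new ArrayList<>(Arrays.asList(new Record("b"))));
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause()); // NotSerializableException naming the Record class
        }
    }
}
```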

Re: RDD collect help

2014-04-14 Thread Flavio Pompermaier
Thanks Eugen for the reply. Could you explain to me why I have this problem? Why doesn't my serialization work? On Apr 14, 2014 6:40 PM, "Eugen Cepoi" wrote: > Hi, > > as an easy workaround you can enable Kryo serialization > http://spark.apache.org/docs/latest/configuration.html > > Eugen > > > 2014-

Re: RDD collect help

2014-04-14 Thread Eugen Cepoi
Hi, as an easy workaround you can enable Kryo serialization http://spark.apache.org/docs/latest/configuration.html Eugen 2014-04-14 18:21 GMT+02:00 Flavio Pompermaier : > Hi to all, > > in my application I read objects that are not serializable because I > cannot modify the sources. > So I trie

RDD collect help

2014-04-14 Thread Flavio Pompermaier
Hi all, in my application I read objects that are not serializable, and I cannot modify their sources. So I tried a workaround: creating a dummy class that extends the unmodifiable one but implements Serializable. All attributes of the parent class are Lists of objects (some of them are s
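The dummy-subclass workaround compiles, but Java serialization only writes fields declared in Serializable classes. Fields of the non-serializable parent are skipped and, on deserialization, re-initialized by the parent's no-arg constructor, so the Lists come back empty (or null, if the constructor does not set them). The sketch below, with a hypothetical `ThirdParty` class, shows that loss and one possible fix: custom `writeObject`/`readObject` methods that copy the parent's state by hand. It assumes the parent fields are accessible from the subclass and that the list elements are themselves Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SubclassSerialization {
    // Hypothetical unmodifiable third-party class: NOT Serializable.
    // Java serialization requires a no-arg constructor accessible to the subclass.
    static class ThirdParty {
        protected List<String> items = new ArrayList<>();

        public ThirdParty() {}
    }

    // The workaround from the thread: parent fields are silently skipped.
    static class NaiveWrapper extends ThirdParty implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Possible fix: write and read the parent's state explicitly.
    static class FixedWrapper extends ThirdParty implements Serializable {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            out.writeObject(new ArrayList<>(items)); // elements must be Serializable
        }

        @SuppressWarnings("unchecked")
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            items = (List<String>) in.readObject(); // restore the parent's state
        }
    }

    // Serialize then deserialize, as happens when data moves between JVMs.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NaiveWrapper naive = new NaiveWrapper();
        naive.items.add("x");
        System.out.println(roundTrip(naive).items); // []
        FixedWrapper fixed = new FixedWrapper();
        fixed.items.add("x");
        System.out.println(roundTrip(fixed).items); // [x]
    }
}
```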