Indeed, serialization is always tricky when you work with objects that are
more sophisticated than simple POJOs, and you can sometimes get unexpected
behaviour when using the deserialized objects. In my case I had trouble
serializing/deserializing Avro specific records with lists. The impleme…
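(The message is cut off above, but a usual culprit with Avro specific records is that their list fields are Avro's own GenericData.Array, which Kryo's default field-by-field serialization handles poorly. A common workaround, sketched below under the assumption of the Kryo 2.x and Avro APIs of that era, is a custom Kryo serializer that delegates to Avro's own binary encoding. AvroSpecificSerializer is a made-up name.)

import java.io.IOException;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoException;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.apache.avro.specific.SpecificRecordBase;

/** Delegates (de)serialization of an Avro specific record to Avro itself. */
public class AvroSpecificSerializer<T extends SpecificRecordBase> extends Serializer<T> {

  private final Class<T> type;

  public AvroSpecificSerializer(Class<T> type) {
    this.type = type;
  }

  @Override
  public void write(Kryo kryo, Output output, T record) {
    SpecificDatumWriter<T> writer = new SpecificDatumWriter<>(type);
    // Kryo's Output is an OutputStream, so Avro can encode straight into it.
    BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(output, null);
    try {
      writer.write(record, encoder);
      encoder.flush();
    } catch (IOException e) {
      throw new KryoException(e);
    }
  }

  @Override
  public T read(Kryo kryo, Input input, Class<T> type) {
    SpecificDatumReader<T> reader = new SpecificDatumReader<>(this.type);
    BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(input, null);
    try {
      return reader.read(null, decoder);
    } catch (IOException e) {
      throw new KryoException(e);
    }
  }
}

// Registered in a KryoRegistrator, e.g.:
// kryo.register(MyAvroRecord.class, new AvroSpecificSerializer<>(MyAvroRecord.class));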
Ok thanks. However it turns out that there's a problem with that, and it's
not so safe to use Kryo serialization with Spark:
Exception in thread "Executor task launch worker-0"
java.lang.NullPointerException
at
org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1$$anonfun$6.apply(Executor.
Because it can happen that the closure references something outside its
scope, which in turn references other objects (that you don't need), and so
on, resulting in a lot of things you don't want being serialized along with
your task. But sure, it's debatable and more my personal opinion.
2014-04-17 23:28
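(To illustrate the capture problem Eugen describes, a hypothetical sketch: class and field names below are made up, and it assumes the Spark 0.9-era Java API, where Function is an abstract serializable class. An anonymous inner class declared in an instance method always keeps a reference to its enclosing instance, so the whole object rides along with the task closure.)

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;

public class Pipeline {                       // note: not Serializable
  private final HeavyService service = new HeavyService();  // unrelated state
  private final int threshold = 10;

  // BAD: the anonymous inner class holds a reference to the enclosing
  // Pipeline instance, so the whole object (HeavyService included) must be
  // serialized with the task -- or it fails with NotSerializableException.
  public JavaRDD<Integer> bad(JavaRDD<Integer> input) {
    return input.filter(new Function<Integer, Boolean>() {
      public Boolean call(Integer x) { return x > threshold; }
    });
  }

  // BETTER: a static nested class has no hidden reference to Pipeline;
  // only the one int it was given is shipped with the task.
  static class GreaterThan extends Function<Integer, Boolean> {
    private final int threshold;
    GreaterThan(int threshold) { this.threshold = threshold; }
    public Boolean call(Integer x) { return x > threshold; }
  }

  public JavaRDD<Integer> good(JavaRDD<Integer> input) {
    return input.filter(new GreaterThan(threshold));
  }
}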
Thanks again Eugen! I don't get the point… why do you prefer to avoid Kryo
ser for closures? Is there any problem with that?
On Apr 17, 2014 11:10 PM, "Eugen Cepoi" wrote:
You have two kinds of ser: data and closures. By default they both use Java
ser. This means that if in your function you reference an object outside of
it, it gets serialized with your task. To enable Kryo ser for closures, set
the spark.closure.serializer property. But usually I don't, as Java ser
allows me to detect such problems early.
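(For reference, a minimal sketch of what Eugen means. spark.closure.serializer existed in the Spark of that era (0.9/1.x) and was later removed in Spark 2.0, which always uses Java ser for closures; and, as the newest message in this thread shows, switching it was not a well-trodden path.)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class KryoClosureConf {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("kryo-closures")
        // data serializer (the workaround already discussed below)
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        // closure serializer -- default is Java ser, which Eugen prefers to
        // keep so that accidental captures surface as clear errors
        .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // ... job code ...
    sc.stop();
  }
}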
Now I have another problem… I have to pass one of these non-serializable
objects to a PairFunction and I received another non-serializable
exception… it seems that Kryo doesn't work within Functions. Am I wrong, or
is this a limit of Spark?
On Apr 15, 2014 1:36 PM, "Flavio Pompermaier" wrote:
Ok thanks for the help!
Best,
Flavio
On Tue, Apr 15, 2014 at 12:43 AM, Eugen Cepoi wrote:
Nope, those operations are lazy, meaning they create the RDDs but won't
trigger any "action". The computation is launched by operations such as
collect, count, save to HDFS etc. And even if they were not lazy, no
serialization would happen. Serialization occurs only when data is
transferred.
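(A small self-contained sketch of the laziness Eugen describes, using made-up data; the Function is declared in a static context so nothing extra is captured.)

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LazyDemo {
  public static void main(String[] args) {
    JavaSparkContext sc = new JavaSparkContext(
        new SparkConf().setMaster("local").setAppName("lazy-demo"));

    JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

    // Transformation: only the lineage is recorded; nothing runs yet and
    // no data is serialized.
    JavaRDD<Integer> big = numbers.filter(new Function<Integer, Boolean>() {
      public Boolean call(Integer x) { return x > 2; }
    });

    // Action: the job actually executes; results are serialized on the
    // workers, shipped to the driver, and deserialized there.
    List<Integer> result = big.collect();
    System.out.println(result);  // [3, 4, 5]
    sc.stop();
  }
}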
Ok, that's fair enough. But why do things work up to the collect? During map
and filter, aren't objects serialized?
On Apr 15, 2014 12:31 AM, "Eugen Cepoi" wrote:
Sure. As you have pointed out, those classes don't implement Serializable,
and Spark uses Java serialization by default (when you do collect, the data
from the workers is serialized, "collected" by the driver and then
deserialized on the driver side). Kryo (as most other decent serialization
libs) doesn't require classes to implement Serializable.
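(Concretely, a sketch following the configuration page Eugen links below; ThirdPartyRecord and MyRegistrator are hypothetical names standing in for Flavio's unmodifiable class and his own registrator.)

import com.esotericsoftware.kryo.Kryo;
import org.apache.spark.SparkConf;
import org.apache.spark.serializer.KryoRegistrator;

// Tells Kryo about the classes that will cross the wire.
public class MyRegistrator implements KryoRegistrator {
  @Override
  public void registerClasses(Kryo kryo) {
    kryo.register(ThirdPartyRecord.class);
  }
}

// ...and when building the context:
// SparkConf conf = new SparkConf()
//     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
//     .set("spark.kryo.registrator", "com.example.MyRegistrator");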
Thanks Eugen for the reply. Could you explain why I have the problem? Why
doesn't my serialization work?
On Apr 14, 2014 6:40 PM, "Eugen Cepoi" wrote:
Hi,
as an easy workaround you can enable Kryo serialization
http://spark.apache.org/docs/latest/configuration.html
Eugen
2014-04-14 18:21 GMT+02:00 Flavio Pompermaier :
Hi to all,
in my application I read objects that are not serializable, and I cannot
modify their sources.
So I tried a workaround: creating a dummy class that extends the
unmodifiable one but implements Serializable.
All attributes of the parent class are Lists of objects (some of them are s…
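(A sketch of that workaround with hypothetical names, plus the catch that likely explains the trouble that follows in this thread.)

import java.io.Serializable;

// ThirdPartyRecord stands in for the unmodifiable class.
public class SerializableRecord extends ThirdPartyRecord implements Serializable {
  // Caveat: Java serialization only writes the fields of Serializable
  // classes in the hierarchy. ThirdPartyRecord's own fields (the Lists)
  // are NOT written; on deserialization they are re-initialized by calling
  // ThirdPartyRecord's no-arg constructor, which must be accessible.
  // So this silences the NotSerializableException but silently drops the
  // parent's state -- one reason to reach for Kryo instead.
}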