Thanks Alok, Sean. As Sean suggested, I tried a sample program: I wrote a function that referenced a class from a third-party library that is not serializable, and passed that function to my map call. On executing it, I got the same exception.
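The failure described above can be reproduced without Spark at all, since Spark ships tasks to workers using ordinary JVM serialization. The sketch below (plain Java, with hypothetical names `ThirdPartyHelper` and `CapturingTask`; it is not Spark API) shows a serializable task object that captures a non-serializable helper as a field, just as a closure captures a driver-side variable, and therefore fails to serialize:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureSerializationDemo {
    // Stand-in for a third-party class that does not implement Serializable.
    static class ThirdPartyHelper {
        String transform(String s) { return s.toUpperCase(); }
    }

    // Stand-in for a task: a serializable function object that captures the
    // helper as a field, just as a closure captures a driver-side variable.
    static class CapturingTask implements Serializable {
        private final ThirdPartyHelper helper = new ThirdPartyHelper();
        String apply(String s) { return helper.transform(s); }
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);  // what Spark does before shipping a task
            return true;
        } catch (IOException e) {
            // NotSerializableException names the offending captured class.
            return false;
        }
    }

    public static void main(String[] args) {
        // false: the captured ThirdPartyHelper poisons the whole object graph.
        System.out.println(serializes(new CapturingTask()));
    }
}
```

Even though `CapturingTask` itself implements `Serializable`, serialization walks the whole object graph and fails on the captured helper; that is the same mechanism behind Spark's "Task not serializable".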
Then I modified the program: I removed the function and wrote its contents as an anonymous function inside the map call. This time the execution succeeded. I understand Sean's explanation, but I would appreciate references to a more detailed explanation, and examples of how to write efficient Spark programs that avoid such pitfalls.

~Sarath

On 06-Sep-2014 4:32 pm, "Sean Owen" <so...@cloudera.com> wrote:
> I disagree that the generally right change is to try to make the
> classes serializable. Usually, classes that are not serializable are
> not supposed to be serialized. You're using them in a way that's
> causing them to be serialized, and that's probably not desired.
>
> For example, this is wrong:
>
> val foo: SomeUnserializableManagerClass = ...
> rdd.map(d => foo.bar(d))
>
> This is right:
>
> rdd.map { d =>
>   val foo: SomeUnserializableManagerClass = ...
>   foo.bar(d)
> }
>
> In the first instance, you create the object on the driver and try to
> serialize and copy it to workers. In the second, you're creating
> SomeUnserializableManagerClass in the function and therefore on the
> worker.
>
> mapPartitions is better if this creation is expensive.
>
> On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
> <sarathchandra.jos...@algofusiontech.com> wrote:
> > Hi,
> >
> > I'm trying to migrate a map-reduce program to work with Spark. I migrated
> > the program from Java to Scala. The map-reduce program basically loads an
> > HDFS file, and for each line in the file it applies several transformation
> > functions available in various external libraries.
> >
> > When I execute this over Spark, it throws "Task not serializable"
> > exceptions for each and every class being used from these external
> > libraries. I added serialization to the few classes that are in my scope,
> > but there are several other classes which are out of my scope, like
> > org.apache.hadoop.io.Text.
> >
> > How do I overcome these exceptions?
> >
> > ~Sarath.
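Sean's "right" pattern can also be demonstrated outside Spark. In the plain-Java sketch below (hypothetical names `ThirdPartyHelper`, `SerializableFn`, and `buildTask`; not Spark API), the non-serializable helper is constructed inside the function body, so the closure captures nothing and serializes cleanly; in a Spark job that construction would happen on the worker, and with mapPartitions it could happen once per partition instead of once per record:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class LocalConstructionDemo {
    // Stand-in for a third-party class that does not implement Serializable.
    static class ThirdPartyHelper {
        String transform(String s) { return s.toUpperCase(); }
    }

    // A serializable function type, standing in for a Spark closure.
    interface SerializableFn extends Function<String, String>, Serializable {}

    // The "right" pattern: build the helper inside the function body, so the
    // closure captures nothing non-serializable. The helper is created at
    // call time, i.e. on the worker, not on the driver.
    static SerializableFn buildTask() {
        return s -> {
            ThirdPartyHelper helper = new ThirdPartyHelper();
            return helper.transform(s);
        };
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SerializableFn task = buildTask();
        System.out.println(serializes(task));    // the task ships fine
        System.out.println(task.apply("spark")); // and still works when invoked
    }
}
```

The key difference from the failing version is what the function *captures*: here only the class name of `ThirdPartyHelper` is referenced, not an instance, so there is nothing non-serializable in the shipped closure.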