Re: org.apache.spark.SparkException: Task not serializable

2017-03-13 Thread Yong Zhang
In fact, I would suggest a different way to handle the original problem.


The example as originally posted uses a Java Function that doesn't touch any 
instance fields or methods, so serializing the whole class is an overkill solution.


Instead, you can/should make the Function static. That still does everything the 
function's logic needs, and it is a better solution than marking the whole class 
serializable.


The whole issue is that the function is not static, even though it doesn't use any 
instance fields or other methods. When Spark ships a non-static function, it has to 
wrap the whole class that contains the function into the closure and send it over 
the network, and in that case it requires the whole class to be serializable.


Yong



From: 颜发才(Yan Facai) 
Sent: Saturday, March 11, 2017 6:48 AM
To: 萝卜丝炒饭
Cc: Mina Aslani; Ankur Srivastava; user@spark.apache.org
Subject: Re: org.apache.spark.SparkException: Task not serializable

For scala,
make your class Serializable, like this
```
class YourClass extends Serializable {
}
```



On Sat, Mar 11, 2017 at 3:51 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:
hi mina,

can you paste your new code here please?
I meet this issue too but do not get Ankur's idea.

thanks
Robin

---Original---
From: "Mina Aslani"mailto:aslanim...@gmail.com>>
Date: 2017/3/7 05:32:10
To: "Ankur 
Srivastava"mailto:ankur.srivast...@gmail.com>>;
Cc: 
"user@spark.apache.org<mailto:user@spark.apache.org>"mailto:user@spark.apache.org>>;
Subject: Re: org.apache.spark.SparkException: Task not serializable

Thank you Ankur for the quick response, really appreciate it! Making the class 
serializable resolved the exception!

Best regards,
Mina

On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:
The fix for this is to make your class Serializable. The reason is that the closures 
you have defined in the class need to be serialized and copied over to all 
executor nodes.

Hope this helps.

Thanks
Ankur

On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani <aslanim...@gmail.com> wrote:

Hi,

I am trying to start with Spark and get the number of lines of a text file on my 
Mac, however I get

org.apache.spark.SparkException: Task not serializable error on

JavaRDD<String> logData = javaCtx.textFile(file);

Please see below for the sample of code and the stackTrace.

Any idea why this error is thrown?

Best regards,

Mina

System.out.println("Creating Spark Configuration");
SparkConf javaConf = new SparkConf();
javaConf.setAppName("My First Spark Java Application");
javaConf.setMaster("PATH to my spark");
System.out.println("Creating Spark Context");
JavaSparkContext javaCtx = new JavaSparkContext(javaConf);
System.out.println("Loading the Dataset and will further process it");
String file = "file:///file.txt";
JavaRDD<String> logData = javaCtx.textFile(file);

long numLines = logData.filter(new Function<String, Boolean>() {
   public Boolean call(String s) {
  return true;
   }
}).count();

System.out.println("Number of Lines in the Dataset "+numLines);

javaCtx.close();


Exception in thread "main" org.apache.spark.SparkException: Task not 
serializable
at 
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at 
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)





Re: org.apache.spark.SparkException: Task not serializable

2017-03-11 Thread Yan Facai
For scala,
make your class Serializable, like this
```
class YourClass extends Serializable {
}
```

On Sat, Mar 11, 2017 at 3:51 PM, 萝卜丝炒饭 <1427357...@qq.com> wrote:

> hi mina,
>
> can you paste your new code here please?
> I meet this issue too but do not get Ankur's idea.
>
> thanks
> Robin
>
> ---Original---
> *From:* "Mina Aslani"
> *Date:* 2017/3/7 05:32:10
> *To:* "Ankur Srivastava";
> *Cc:* "user@spark.apache.org";
> *Subject:* Re: org.apache.spark.SparkException: Task not serializable
>
> Thank you Ankur for the quick response, really appreciate it! Making the
> class serializable resolved the exception!
>
> Best regards,
> Mina
>
> On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava <
> ankur.srivast...@gmail.com> wrote:
>
>> The fix for this is to make your class Serializable. The reason is that the
>> closures you have defined in the class need to be serialized and copied
>> over to all executor nodes.
>>
>> Hope this helps.
>>
>> Thanks
>> Ankur
>>
>> On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani  wrote:
>>
>>> Hi,
>>>
>>> I am trying to start with Spark and get the number of lines of a text file on 
>>> my Mac, however I get
>>>
>>> org.apache.spark.SparkException: Task not serializable error on
>>>
>>> JavaRDD<String> logData = javaCtx.textFile(file);
>>>
>>> Please see below for the sample of code and the stackTrace.
>>>
>>> Any idea why this error is thrown?
>>>
>>> Best regards,
>>>
>>> Mina
>>>
>>> System.out.println("Creating Spark Configuration");
>>> SparkConf javaConf = new SparkConf();
>>> javaConf.setAppName("My First Spark Java Application");
>>> javaConf.setMaster("PATH to my spark");
>>> System.out.println("Creating Spark Context");
>>> JavaSparkContext javaCtx = new JavaSparkContext(javaConf);
>>> System.out.println("Loading the Dataset and will further process it");
>>> String file = "file:///file.txt";
>>> JavaRDD<String> logData = javaCtx.textFile(file);
>>>
>>> long numLines = logData.filter(new Function<String, Boolean>() {
>>>public Boolean call(String s) {
>>>   return true;
>>>}
>>> }).count();
>>>
>>> System.out.println("Number of Lines in the Dataset "+numLines);
>>>
>>> javaCtx.close();
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Task not 
>>> serializable
>>> at 
>>> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
>>> at 
>>> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
>>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
>>> at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
>>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
>>> at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
>>> at 
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>> at 
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>>> at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
>>> at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>>>
>>>
>>
>


Re: org.apache.spark.SparkException: Task not serializable

2017-03-10 Thread 萝卜丝炒饭
hi mina,

can you paste your new code here please?
I meet this issue too but do not get Ankur's idea.

thanks 
Robin

---Original---
From: "Mina Aslani"
Date: 2017/3/7 05:32:10
To: "Ankur Srivastava";
Cc: "user@spark.apache.org";
Subject: Re: org.apache.spark.SparkException: Task not serializable


Thank you Ankur for the quick response, really appreciate it! Making the class 
serializable resolved the exception!

Best regards,
Mina


On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava  
wrote:
The fix for this is to make your class Serializable. The reason is that the closures 
you have defined in the class need to be serialized and copied over to all 
executor nodes.

Hope this helps.


Thanks
Ankur


On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani  wrote:
Hi,

I am trying to start with Spark and get the number of lines of a text file on my 
Mac, however I get

org.apache.spark.SparkException: Task not serializable error on

JavaRDD<String> logData = javaCtx.textFile(file);

Please see below for the sample of code and the stackTrace.

Any idea why this error is thrown?

Best regards,
Mina

System.out.println("Creating Spark Configuration");
SparkConf javaConf = new SparkConf();
javaConf.setAppName("My First Spark Java Application");
javaConf.setMaster("PATH to my spark");
System.out.println("Creating Spark Context");
JavaSparkContext javaCtx = new JavaSparkContext(javaConf);
System.out.println("Loading the Dataset and will further process it");
String file = "file:///file.txt";
JavaRDD<String> logData = javaCtx.textFile(file);
long numLines = logData.filter(new Function<String, Boolean>() {
   public Boolean call(String s) {
  return true;
   }
}).count();

System.out.println("Number of Lines in the Dataset "+numLines);

javaCtx.close();

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)

Re: org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
Thank you Ankur for the quick response, really appreciate it! Making the
class serializable resolved the exception!

Best regards,
Mina

On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava  wrote:

> The fix for this is to make your class Serializable. The reason is that the
> closures you have defined in the class need to be serialized and copied
> over to all executor nodes.
>
> Hope this helps.
>
> Thanks
> Ankur
>
> On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani  wrote:
>
>> Hi,
>>
>> I am trying to start with Spark and get the number of lines of a text file on my 
>> Mac, however I get
>>
>> org.apache.spark.SparkException: Task not serializable error on
>>
>> JavaRDD<String> logData = javaCtx.textFile(file);
>>
>> Please see below for the sample of code and the stackTrace.
>>
>> Any idea why this error is thrown?
>>
>> Best regards,
>>
>> Mina
>>
>> System.out.println("Creating Spark Configuration");
>> SparkConf javaConf = new SparkConf();
>> javaConf.setAppName("My First Spark Java Application");
>> javaConf.setMaster("PATH to my spark");
>> System.out.println("Creating Spark Context");
>> JavaSparkContext javaCtx = new JavaSparkContext(javaConf);
>> System.out.println("Loading the Dataset and will further process it");
>> String file = "file:///file.txt";
>> JavaRDD<String> logData = javaCtx.textFile(file);
>>
>> long numLines = logData.filter(new Function<String, Boolean>() {
>>public Boolean call(String s) {
>>   return true;
>>}
>> }).count();
>>
>> System.out.println("Number of Lines in the Dataset "+numLines);
>>
>> javaCtx.close();
>>
>> Exception in thread "main" org.apache.spark.SparkException: Task not 
>> serializable
>>  at 
>> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
>>  at 
>> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
>>  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
>>  at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
>>  at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
>>  at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
>>  at 
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>  at 
>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>>  at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
>>  at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>>
>>
>


Re: org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Ankur Srivastava
The fix for this is to make your class Serializable. The reason is that the
closures you have defined in the class need to be serialized and copied
over to all executor nodes.
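
For example, a minimal sketch of what that fix can look like (the class name 
LogProcessor and its method are just for illustration, not your actual code):

```
import java.io.Serializable;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

// The anonymous Function below implicitly references "this", so the whole
// enclosing class becomes part of the closure; implementing Serializable
// lets Spark serialize it and copy it to the executors.
public class LogProcessor implements Serializable {

    public long countLines(JavaSparkContext sc, String file) {
        JavaRDD<String> logData = sc.textFile(file);
        return logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return true;   // keep every line, just count them
            }
        }).count();
    }
}
```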

Hope this helps.

Thanks
Ankur

On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani  wrote:

> Hi,
>
> I am trying to start with Spark and get the number of lines of a text file on my 
> Mac, however I get
>
> org.apache.spark.SparkException: Task not serializable error on
>
> JavaRDD<String> logData = javaCtx.textFile(file);
>
> Please see below for the sample of code and the stackTrace.
>
> Any idea why this error is thrown?
>
> Best regards,
>
> Mina
>
> System.out.println("Creating Spark Configuration");
> SparkConf javaConf = new SparkConf();
> javaConf.setAppName("My First Spark Java Application");
> javaConf.setMaster("PATH to my spark");
> System.out.println("Creating Spark Context");
> JavaSparkContext javaCtx = new JavaSparkContext(javaConf);
> System.out.println("Loading the Dataset and will further process it");
> String file = "file:///file.txt";
> JavaRDD<String> logData = javaCtx.textFile(file);
>
> long numLines = logData.filter(new Function<String, Boolean>() {
>public Boolean call(String s) {
>   return true;
>}
> }).count();
>
> System.out.println("Number of Lines in the Dataset "+numLines);
>
> javaCtx.close();
>
> Exception in thread "main" org.apache.spark.SparkException: Task not 
> serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
>   at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
>   at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:387)
>   at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:386)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
>   at org.apache.spark.rdd.RDD.filter(RDD.scala:386)
>   at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>
>