[ 
https://issues.apache.org/jira/browse/SPARK-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12750.
-------------------------------
    Resolution: Not A Problem

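The problem is exactly what the exception says. Your anonymous Function is 
created inside an instance method of myrddd, so it retains an implicit 
reference (this$0 in the serialization stack) to the enclosing myrddd object, 
and myrddd is not serializable.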
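
As a minimal sketch of the same effect without a Spark dependency, the 
following uses only java.io. All names here (Fn, ColumnSelector, SelectFn, 
isSerializable) are hypothetical and chosen for illustration; Fn stands in for 
Spark's Function interface, which is likewise Serializable. It is not the 
reporter's code, only an assumed reduction of it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ClosureCaptureDemo {

    // Hypothetical stand-in for Spark's Function: a serializable callback.
    interface Fn<T, R> extends Serializable {
        R call(T input);
    }

    // Plays the role of myrddd: an ordinary class that is NOT Serializable.
    static class ColumnSelector {
        // The anonymous class is created in an instance method, so it carries
        // a hidden reference to the enclosing ColumnSelector (this$0).
        Fn<double[], double[]> capturing(final int[] indices) {
            return new Fn<double[], double[]>() {
                public double[] call(double[] row) {
                    return select(row, indices);
                }
            };
        }

        // One possible fix: build the function from a static nested class,
        // which has no enclosing instance to drag along.
        static Fn<double[], double[]> nonCapturing(int[] indices) {
            return new SelectFn(indices);
        }
    }

    // Holds only the data it needs (an int[], which is serializable).
    static class SelectFn implements Fn<double[], double[]> {
        private final int[] indices;

        SelectFn(int[] indices) {
            this.indices = indices;
        }

        public double[] call(double[] row) {
            return select(row, indices);
        }
    }

    // Copies the requested (zero-based) positions out of a row.
    static double[] select(double[] row, int[] indices) {
        double[] out = new double[indices.length];
        for (int i = 0; i < indices.length; i++) {
            out[i] = row[indices[i]];
        }
        return out;
    }

    // Attempts Java serialization and reports whether it succeeded.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] indices = {0, 1, 2};
        // prints "false": the hidden this$0 field points at a ColumnSelector
        System.out.println(isSerializable(new ColumnSelector().capturing(indices)));
        // prints "true": only the int[] travels with the function
        System.out.println(isSerializable(ColumnSelector.nonCapturing(indices)));
    }
}
```

In the reporter's case the analogous change would be to move the Function 
into a static nested or top-level class that receives asdf and b through its 
constructor. Making myrddd implement Serializable should also remove the 
error, but then the whole object is shipped along with each task.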

> Java class method don't work properly
> -------------------------------------
>
>                 Key: SPARK-12750
>                 URL: https://issues.apache.org/jira/browse/SPARK-12750
>             Project: Spark
>          Issue Type: Question
>            Reporter: Gramce
>
> I use the Spark Java API to transform LabeledPoint data.
> I want to select several columns from a JavaRDD<LabeledPoint>, for example 
> the first three columns.
> So I wrote this:
> int[] ad={1,2,3};
> int b=ad.length;
> JavaRDD<LabeledPoint> ggd=parsedData.map(
>         new Function<LabeledPoint, LabeledPoint>(){
>             public LabeledPoint call(LabeledPoint a){
>                 double[] v=new double[b];
>                 for(int i=0;i<b;i++){
>                     v[i]=a.features().toArray()[ad[i]];
>                 }
>                 return new LabeledPoint(a.label(),Vectors.dense(v));
>             }
>         });
> where parsedData is a JavaRDD<LabeledPoint>.
> Now I want to convert this into a method, so the code looks like this:
> class myrddd{
>     public JavaRDD<LabeledPoint> abcd;
>     public myrddd(JavaRDD<LabeledPoint> deff){
>         abcd=deff;
>     }
>     public JavaRDD<LabeledPoint> abcdf(int[] asdf,int b){
>         JavaRDD<LabeledPoint> bcd=abcd;
>         JavaRDD<LabeledPoint> mms=bcd.map(
>                 new Function<LabeledPoint, LabeledPoint>(){
>                     public LabeledPoint call(LabeledPoint a){
>                         double[] v=new double[b];
>                         for(int i=0;i<b;i++){
>                             v[i]=a.features().toArray()[asdf[i]];
>                         }
>                         return new LabeledPoint(a.label(),Vectors.dense(v));
>                     }
>                 });
>         return mms;
>     }
> }
> And I call it like this:
> myrddd ndfs=new myrddd(parsedData);
> JavaRDD<LabeledPoint> ggdf=ndfs.abcdf(ad, b);
> But this doesn't work. The following is the error:
> Exception in thread "main" org.apache.spark.SparkException: Task not serializable
>       at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
>       at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>       at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>       at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
>       at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:318)
>       at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:317)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>       at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
>       at org.apache.spark.rdd.RDD.map(RDD.scala:317)
>       at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:93)
>       at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
>       at anbv.qwe.myrddd.abcdf(dfa.java:53)
>       at anbv.qwe.dfa.main(dfa.java:42)
> Caused by: java.io.NotSerializableException: anbv.qwe.myrddd
> Serialization stack:
>       - object not serializable (class: anbv.qwe.myrddd, value: anbv.qwe.myrddd@310aee0b)
>       - field (class: anbv.qwe.myrddd$1, name: this$0, type: class anbv.qwe.myrddd)
>       - object (class anbv.qwe.myrddd$1, anbv.qwe.myrddd$1@4b76aa5a)
>       - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
>       - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
>       at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>       at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>       at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
>       at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
>       ... 13 more
> but this



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
