[ https://issues.apache.org/jira/browse/SPARK-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025363#comment-14025363 ]

Kan Zhang commented on SPARK-2079:
----------------------------------

PR: https://github.com/apache/spark/pull/1023

> Skip unnecessary wrapping in List when serializing SchemaRDD to Python
> ----------------------------------------------------------------------
>
>                 Key: SPARK-2079
>                 URL: https://issues.apache.org/jira/browse/SPARK-2079
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 1.0.0
>            Reporter: Kan Zhang
>            Assignee: Kan Zhang
>
> Finishing the TODO:
> {code}
>   private[sql] def javaToPython: JavaRDD[Array[Byte]] = {
>     val fieldNames: Seq[String] = this.queryExecution.analyzed.output.map(_.name)
>     this.mapPartitions { iter =>
>       val pickle = new Pickler
>       iter.map { row =>
>         val map: JMap[String, Any] = new java.util.HashMap
>         // TODO: We place the map in an ArrayList so that the object is pickled to a List[Dict].
>         // Ideally we should be able to pickle an object directly into a Python collection so we
>         // don't have to create an ArrayList every time.
>         val arr: java.util.ArrayList[Any] = new java.util.ArrayList
>         row.zip(fieldNames).foreach { case (obj, name) =>
>           map.put(name, obj)
>         }
>         arr.add(map)
>         pickle.dumps(arr)
>       }
>     }
>   }
> {code}
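The per-row overhead the TODO describes can be illustrated on the Python receiving side with the standard pickle module. This is only a sketch: on the JVM side Spark uses Pyrolite's Pickler rather than CPython's pickle, and the field names and values below are made up, but the List[Dict]-vs-Dict payload shape is the point.

```python
import pickle

# A hypothetical row already converted to a field-name -> value dict.
row = {"name": "Alice", "age": 1}

# Current approach: wrap each row's dict in a single-element list
# (the ArrayList on the JVM side) before pickling.
wrapped = pickle.dumps([row])

# Proposed approach: pickle the dict directly, skipping the wrapper list.
direct = pickle.dumps(row)

# Both carry the same row data; the wrapped form just adds list opcodes.
assert pickle.loads(wrapped)[0] == pickle.loads(direct)
assert len(direct) < len(wrapped)
```

The extra bytes per row are small, but the ArrayList allocation happens once per row on the JVM side, which is what the linked PR aims to skip.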



--
This message was sent by Atlassian JIRA
(v6.2#6252)