Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1338#discussion_r15386068
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala ---
    @@ -92,6 +104,46 @@ private[python] class DefaultConverter extends Converter[Any, Any] {
       }
     }
     
    +/**
    + * A converter that converts common types to [[org.apache.hadoop.io.Writable]]. Note that array
    + * types are not supported since the user needs to subclass [[org.apache.hadoop.io.ArrayWritable]]
    + * to set the type properly. See [[org.apache.spark.api.python.DoubleArrayWritable]] and
    + * [[org.apache.spark.api.python.DoubleArrayToWritableConverter]] for an example. They are used in
    + * PySpark RDD `saveAsNewAPIHadoopFile` doctest.
    + */
    +private[python] class JavaToWritableConverter extends Converter[Any, Writable] {
    +
    +  /**
    +   * Converts common data types to [[org.apache.hadoop.io.Writable]]. Note that array types are not
    +   * supported out-of-the-box.
    +   */
    +  private def convertToWritable(obj: Any): Writable = {
    +    import collection.JavaConversions._
    +    obj match {
    +      case i: java.lang.Integer => new IntWritable(i)
    +      case d: java.lang.Double => new DoubleWritable(d)
    +      case l: java.lang.Long => new LongWritable(l)
    +      case f: java.lang.Float => new FloatWritable(f)
    +      case s: java.lang.String => new Text(s)
    +      case b: java.lang.Boolean => new BooleanWritable(b)
    +      case aob: Array[Byte] => new BytesWritable(aob)
    +      case null => NullWritable.get()
    +      case map: java.util.Map[_, _] =>
    +        val mapWritable = new MapWritable()
    +        map.foreach { case (k, v) =>
    +          mapWritable.put(convertToWritable(k), convertToWritable(v))
    +        }
    +        mapWritable
    +      case other => throw new SparkException(s"Data of type $other cannot be used")
    --- End diff --
    
    This comment also applies to the other unsupported type messages added in this PR.


