[GitHub] spark pull request #19792: [SPARK-22566][PYTHON] Better error message for `_...

HyukjinKwon Mon, 04 Dec 2017 16:53:44 -0800

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19792#discussion_r154820118
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -324,11 +324,12 @@ def range(self, start, end=None, step=1, numPartitions=None):
     
             return DataFrame(jdf, self._wrapped)
     
    -    def _inferSchemaFromList(self, data):
    +    def _inferSchemaFromList(self, data, names=None):
    --- End diff --
    
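    Can we also pass `names` through the `_createFromRDD` -> `_inferSchema` path? The error there still reports the positional field name (`_1`) instead of the given column name: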
    
    ```python
    >>> spark.createDataFrame(spark.sparkContext.parallelize([[None, 1], ["a", None], [1, 1]]), schema=["a", "b"], samplingRatio=0.99)
    ```
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/session.py", line 644, in createDataFrame
        rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
      File "/.../spark/python/pyspark/sql/session.py", line 383, in _createFromRDD
        struct = self._inferSchema(rdd, samplingRatio)
      File "/.../spark/python/pyspark/sql/session.py", line 375, in _inferSchema
        schema = rdd.map(_infer_schema).reduce(_merge_type)
      File "/.../spark/python/pyspark/rdd.py", line 852, in reduce
        return reduce(f, vals)
      File "/.../spark/python/pyspark/sql/types.py", line 1133, in _merge_type
        for f in a.fields]
      File "/.../spark/python/pyspark/sql/types.py", line 1126, in _merge_type
        raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
    TypeError: field _1: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.LongType'>
    ```
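
To make the request concrete, here is a minimal pure-Python sketch of the idea, not the actual PySpark implementation. The helpers `infer_row_types`, `merge_row_types`, and `error_message` are hypothetical stand-ins for `_infer_schema` and `_merge_type`, and Python type names stand in for Spark data types. The only point is that once `names` reaches the per-row inference step, the merge error can report `a` rather than `_1`.

```python
from functools import reduce


def infer_row_types(row, names=None):
    """Hypothetical stand-in for per-row schema inference.

    Returns a list of (field name, type name) pairs; None means "unknown".
    Without `names`, fields fall back to positional names _1, _2, ...
    """
    if names is None:
        names = ["_%d" % (i + 1) for i in range(len(row))]
    return [(name, None if value is None else type(value).__name__)
            for name, value in zip(names, row)]


def merge_row_types(a, b):
    """Hypothetical stand-in for merging two inferred row schemas."""
    merged = []
    for (name, type_a), (_, type_b) in zip(a, b):
        if type_a is None:
            merged.append((name, type_b))
        elif type_b is None or type_a == type_b:
            merged.append((name, type_a))
        else:
            # The field name in the message comes from whatever inference used.
            raise TypeError("field %s: Can not merge type %s and %s"
                            % (name, type_a, type_b))
    return merged


def error_message(rows, names=None):
    """Return the merge error text for `rows`, or None if they merge cleanly."""
    try:
        reduce(merge_row_types, (infer_row_types(r, names) for r in rows))
    except TypeError as e:
        return str(e)
    return None


rows = [[None, 1], ["a", None], [1, 1]]
print(error_message(rows))              # positional field name
print(error_message(rows, ["a", "b"]))  # user-supplied column name
```

With the same rows as in the snippet above, the first call reports the conflict on `_1` and the second on `a`, which is the behavior being asked for on the RDD path.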



