GitHub user gberger commented on the issue:
https://github.com/apache/spark/pull/19792
@HyukjinKwon done, with a test added.
```
>>> spark.createDataFrame(spark.sparkContext.parallelize([[None, 1], ["a", None], [1, 1]]), schema=["a", "b"], samplingRatio=0.99)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 644, in createDataFrame
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 383, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio, names=schema)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/session.py", line 375, in _inferSchema
    schema = rdd.map(lambda row: _infer_schema(row, names)).reduce(_merge_type)
  File "/Users/gberger/Projects/spark/python/pyspark/rdd.py", line 852, in reduce
    return reduce(f, vals)
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1133, in _merge_type
    for f in a.fields]
  File "/Users/gberger/Projects/spark/python/pyspark/sql/types.py", line 1126, in _merge_type
    raise TypeError(new_msg("Can not merge type %s and %s" % (type(a), type(b))))
TypeError: field a: Can not merge type <class 'pyspark.sql.types.StringType'> and <class 'pyspark.sql.types.LongType'>
```
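For contrast, here is a minimal sketch (illustrative only, not the added test itself) of a merge that still succeeds: a column whose values only ever conflict with `None` unifies cleanly, because `_merge_type` treats `NullType` as compatible with any type.
```
# Illustrative sketch: column "a" is inferred as StringType in one row and
# NullType in the other; NullType unifies with any type, so schema
# inference succeeds instead of raising the TypeError above.
df = spark.createDataFrame(
    spark.sparkContext.parallelize([[None, 1], ["a", None]]),
    schema=["a", "b"], samplingRatio=0.99)
df.printSchema()  # expected: a: string (nullable), b: long (nullable)
```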
Also, with this last change, I was able to simplify the code in `_createFromRDD`. Since I pass the field names down to `_inferSchema` (and from there to `_infer_schema`), the inferred schema already comes back with the field names set, so there is no need to set them again in `_createFromRDD`. Tests for this still pass. Let me know if you can think of any edge case, not covered by the tests, that would break.
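As a concrete illustration of why the renaming step becomes redundant, here is a hedged sketch against the private helper shown in the traceback above (`_infer_schema` and its `names` argument are internal API and may change):
```
from pyspark.sql.types import _infer_schema

# With names passed in, the inferred StructType already carries the
# caller's field names instead of the positional defaults _1, _2, ...
struct = _infer_schema(("a", 1), names=["a", "b"])
print(struct.names)                         # expected: ['a', 'b']
print([f.dataType for f in struct.fields])  # expected: [StringType, LongType]
```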