Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188966754
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,24 @@ def test_csv_sampling_ratio(self):
             .csv(rdd, samplingRatio=0.5).schema
         self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)]))
+    def test_checking_csv_header(self):
+        tmpPath = tempfile.mkdtemp()
+        shutil.rmtree(tmpPath)
+        try:
+            self.spark.createDataFrame([[1, 1000], [2000, 2]])\
+                .toDF('f1', 'f2').write.option("header", "true").csv(tmpPath)
+            schema = StructType([
+                StructField('f2', IntegerType(), nullable=True),
+                StructField('f1', IntegerType(), nullable=True)])
+            df = self.spark.read.option('header', 'true').schema(schema)\
+                .csv(tmpPath, enforceSchema=False)
+            self.assertRaisesRegexp(
+                Exception,
+                "CSV file header does not contain the expected fields",
--- End diff --
eh, I have already changed the error message as @hvanhovell suggested. What
about:
```
CSV header does not conform to the schema
```
So the error message would look like:
```
java.lang.IllegalArgumentException: CSV header does not conform to the schema
Header: depth, temperature
Schema: temperature, depth
CSV file: marina.csv
```
Is that OK with you?
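For illustration, here is a minimal pure-Python sketch of the check being discussed, producing a message in the proposed shape. The function name and exception type are hypothetical; this is not the actual Spark implementation, which raises `java.lang.IllegalArgumentException` on the JVM side.

```python
def check_csv_header(header, schema_fields, file_name):
    """Raise if the CSV header fields do not match the expected schema.

    Hypothetical helper mirroring the enforceSchema=False check
    discussed in this PR; field order matters.
    """
    if header != schema_fields:
        raise ValueError(
            "CSV header does not conform to the schema\n"
            "Header: %s\n"
            "Schema: %s\n"
            "CSV file: %s"
            % (", ".join(header), ", ".join(schema_fields), file_name))


# Example: header order swapped relative to the schema, as in the
# message shown above.
try:
    check_csv_header(["depth", "temperature"],
                     ["temperature", "depth"], "marina.csv")
except ValueError as e:
    print(e)
```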
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]