Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188966754
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,24 @@ def test_csv_sampling_ratio(self):
             .csv(rdd, samplingRatio=0.5).schema
         self.assertEquals(schema, StructType([StructField("_c0", IntegerType(), True)]))
+    def test_checking_csv_header(self):
+        tmpPath = tempfile.mkdtemp()
+        shutil.rmtree(tmpPath)
+        try:
+            self.spark.createDataFrame([[1, 1000], [2000, 2]])\
+                .toDF('f1', 'f2').write.option("header", "true").csv(tmpPath)
+            schema = StructType([
+                StructField('f2', IntegerType(), nullable=True),
+                StructField('f1', IntegerType(), nullable=True)])
+            df = self.spark.read.option('header', 'true').schema(schema)\
+                .csv(tmpPath, enforceSchema=False)
+            self.assertRaisesRegexp(
+                Exception,
+                "CSV file header does not contain the expected fields",
--- End diff --
eh, I have already changed the error message as @hvanhovell suggested. What
about:
```
CSV header does not conform to the schema
```
So the error message would look like:
```
java.lang.IllegalArgumentException: CSV header does not conform to the schema
Header: depth, temperature
Schema: temperature, depth
CSV file: marina.csv
```
Is that OK with you?
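For illustration, here is a minimal pure-Python sketch of the check being discussed, producing a message in the proposed shape. The function name and exception type are hypothetical; this is not the actual Spark implementation, which raises `java.lang.IllegalArgumentException` on the JVM side.

```python
def check_csv_header(header, schema_fields, file_name):
    """Raise if the CSV header fields do not match the expected schema.

    Hypothetical helper mirroring the enforceSchema=False check
    discussed in this PR; field order matters.
    """
    if header != schema_fields:
        raise ValueError(
            "CSV header does not conform to the schema\n"
            "Header: %s\n"
            "Schema: %s\n"
            "CSV file: %s"
            % (", ".join(header), ", ".join(schema_fields), file_name))


# Example: header order swapped relative to the schema, as in the
# message shown above.
try:
    check_csv_header(["depth", "temperature"],
                     ["temperature", "depth"], "marina.csv")
except ValueError as e:
    print(e)
```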
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]