Other options might be:
- the "spark.sql.files.ignoreCorruptFiles" option
- DataFrameReader.csv(csvDataset: Dataset[String]) with a custom InputFormat
(this is available from Spark 2.2.0).
For example,
val rdd = spark.sparkContext.newAPIHadoopFile("/tmp/abcd",
  classOf[org.apache.hadoop.mapreduce.lib.input.TextInputFormat], // or your custom InputFormat
  classOf[org.apache.hadoop.io.LongWritable],
  classOf[org.apache.hadoop.io.Text])
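A short sketch of the second option, assuming Spark 2.2.0+ (the sample rows and app name are invented for illustration): read the lines yourself, then hand them to DataFrameReader.csv as a Dataset[String]:

```scala
import org.apache.spark.sql.SparkSession

object CsvFromDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-from-dataset")
      .getOrCreate()
    import spark.implicits._

    // Skip files that cannot be read at all (e.g. truncated gzip archives)
    spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

    // Stand-in for lines produced by a custom InputFormat or textFile read
    val lines = Seq("alice,1", "bob,2").toDS()

    // Available since Spark 2.2.0: parse an in-memory Dataset[String] as CSV
    val df = spark.read.csv(lines)
    df.show()

    spark.stop()
  }
}
```

This keeps the file-level reading (and its error handling) under your control while still getting a DataFrame out of the CSV parser.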
Hi,
The Spark CSV parser has three parsing modes:
* PERMISSIVE (default) - tries to read everything; missing tokens are
interpreted as null and extra tokens are ignored
* DROPMALFORMED - drops lines that have more or fewer tokens than expected
* FAILFAST - throws a RuntimeException if there is a malformed line
Obviou
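A minimal sketch of the three modes, assuming Spark 2.x and an explicit schema (the sample rows are invented; the second row deliberately has an extra token):

```scala
import org.apache.spark.sql.SparkSession

object CsvModes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-modes")
      .getOrCreate()
    import spark.implicits._

    val lines = Seq("alice,1", "bob,2,EXTRA").toDS()
    val schema = "name STRING, value INT"

    // PERMISSIVE: keeps every line (exact handling of the extra
    // token varies slightly between Spark versions)
    spark.read.schema(schema).option("mode", "PERMISSIVE").csv(lines).show()

    // DROPMALFORMED: only "alice,1" survives
    spark.read.schema(schema).option("mode", "DROPMALFORMED").csv(lines).show()

    // FAILFAST: throws as soon as the malformed row is parsed
    // spark.read.schema(schema).option("mode", "FAILFAST").csv(lines).show()

    spark.stop()
  }
}
```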
Accidentally sent this to the dev mailing list, meant to send it here.
I have a Spark Java application that in the past has used the hadoopFile
interface to specify a custom TextInputFormat to be used when reading
files. This custom class would gracefully handle exceptions like EOF
exceptions cau
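For reference, one way such a custom InputFormat can be written (a sketch, not the original class; the name LenientTextInputFormat and the "treat EOF as end-of-input" policy are assumptions): subclass the standard TextInputFormat and wrap its record reader so that an EOFException ends the split instead of failing the task:

```scala
import java.io.EOFException
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Hypothetical InputFormat that tolerates premature EOFs
// (e.g. truncated gzip files) instead of throwing
class LenientTextInputFormat extends TextInputFormat {
  override def createRecordReader(split: InputSplit, ctx: TaskAttemptContext)
      : RecordReader[LongWritable, Text] = {
    val inner = super.createRecordReader(split, ctx)
    new RecordReader[LongWritable, Text] {
      override def initialize(s: InputSplit, c: TaskAttemptContext): Unit =
        inner.initialize(s, c)
      // A premature EOF simply means "no more records in this split"
      override def nextKeyValue(): Boolean =
        try inner.nextKeyValue()
        catch { case _: EOFException => false }
      override def getCurrentKey: LongWritable = inner.getCurrentKey
      override def getCurrentValue: Text = inner.getCurrentValue
      override def getProgress: Float = inner.getProgress
      override def close(): Unit = inner.close()
    }
  }
}
```

It would then be passed to newAPIHadoopFile in place of the standard TextInputFormat.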