Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21892
@jbax I got the following exception on **2.7.3-SNAPSHOT** (commit
e51b0958a):
```
Internal state when error was thrown: line=20, column=20481, record=20, charIndex=82594, headers=[col0,..., col999]
    at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:369)
    at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:673)
    at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:210)
    at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$7.apply(UnivocityParser.scala:333)
    ...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 20480
    at com.univocity.parsers.common.ParserOutput.valueParsed(ParserOutput.java:316)
    at com.univocity.parsers.csv.CsvParser.parseRecord(CsvParser.java:160)
    at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:654)
    ... 23 more
```
This happened on a CSV file with a header and 1000 columns, when the set of
selected indexes was empty. Our settings are:
```
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=null
Empty value=
Escape unquoted values=false
Header extraction enabled=null
Headers=null
Ignore leading whitespaces=false
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=false
Ignore trailing whitespaces in quotes=false
Input buffer size=128
Input reading on separate thread=false
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=-1
Line separator detection enabled=false
Maximum number of characters per column=-1
Maximum number of columns=20480
Normalize escaped line separators=true
Null value=
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=field selection: []
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=STOP_AT_DELIMITER
Format configuration:
CsvFormat:
Comment character=\0
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\n
Quote character="
Quote escape character=\
Quote escape escape character=null
```
Here is the input file (3.5GB uncompressed). It is actually test.csv.xz, so
you need to change the extension after downloading:
[test.csv.zip](https://github.com/apache/spark/files/2246796/test.csv.zip)
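
For anyone who would rather not download the full file, a much smaller CSV with the same shape (a header row `col0` ... `col999` followed by 1000-column records) could be generated along these lines. This is only a sketch: the function name `write_wide_csv`, the output path, the row count, and the integer cell values are all assumptions, and it is not confirmed that a file this small triggers the same exception.

```python
import csv


def write_wide_csv(path, num_columns=1000, num_rows=25):
    """Write a CSV with a header col0..col{num_columns-1} and integer cells.

    Hypothetical helper: the real test.csv contents are not known here,
    only its header names and column count.
    """
    header = [f"col{i}" for i in range(num_columns)]
    with open(path, "w", newline="") as f:
        # Match the format in the settings dump: ',' delimiter, '"' quote, '\n' lines.
        writer = csv.writer(f, delimiter=",", quotechar='"', lineterminator="\n")
        writer.writerow(header)
        for r in range(num_rows):
            # Each cell is a unique integer so misaligned columns are easy to spot.
            writer.writerow([r * num_columns + c for c in range(num_columns)])
    return header


if __name__ == "__main__":
    write_wide_csv("wide_test.csv")
```

The default of 25 rows is chosen only because the error above was reported at record=20; adjust `num_rows` as needed.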