Re: [CSV] If the number of columns in one row exceeds maxColumns, it stops the whole parsing process.

Jörn Franke Wed, 07 Jun 2017 09:45:43 -0700

Spark CSV data source should be able


> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote:
> 
> Hi everyone,
> I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
> One problem I am facing: if one row of a CSV file has more columns than 
> maxColumns (default 20480), the whole parsing process stops.
> 
> Internal state when error was thrown: line=1, column=3, record=0, charIndex=12
> com.univocity.parsers.common.TextParsingException: 
> java.lang.ArrayIndexOutOfBoundsException - 2
> Hint: Number of columns processed may have exceeded limit of 2 columns. Use 
> settings.setMaxColumns(int) to define the maximum number of columns your 
> input can have
> Ensure your configuration is correct, with delimiters, quotes and escape 
> sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
> 
> 
> I did some investigation into the univocity library, but it handles this case 
> by throwing an error, which is why Spark stops the process.
> 
> How can I skip the invalid row and just continue parsing the next valid one?
> Are there any libraries that can replace univocity for this job?
> 
> Thanks & regards,
> Chanh
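As a rough sketch of the "skip the invalid row and continue" behaviour asked about above, the snippet below pre-filters CSV input before it reaches a strict parser. It does not use univocity or Spark: it swaps in Python's standard `csv` module, and the function name `filter_wide_rows` and the `max_columns` threshold are hypothetical, chosen only for illustration. Rows wider than the limit are dropped and their line numbers recorded, instead of aborting the whole parse.

```python
import csv
import io
from typing import Iterable, List, Tuple


def filter_wide_rows(lines: Iterable[str], max_columns: int) -> Tuple[List[List[str]], List[int]]:
    """Parse CSV lines, keeping only rows with at most max_columns fields.

    Hypothetical helper: too-wide rows are skipped (their source line
    numbers are collected) rather than stopping the whole parse.
    """
    kept: List[List[str]] = []
    skipped: List[int] = []
    reader = csv.reader(lines)
    for row in reader:
        if len(row) > max_columns:
            # reader.line_num is the number of source lines read so far,
            # i.e. the line on which this oversized record ended.
            skipped.append(reader.line_num)
            continue  # move on to the next record instead of failing
        kept.append(row)
    return kept, skipped


if __name__ == "__main__":
    sample = io.StringIO("a,b\n1,2,3\nc,d\n")
    rows, bad = filter_wide_rows(sample, max_columns=2)
    print(rows)  # [['a', 'b'], ['c', 'd']]
    print(bad)   # [2]
```

Within Spark itself, the CSV reader exposes a `maxColumns` option and a `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`). Whether `DROPMALFORMED` actually catches this particular univocity exception may depend on the Spark version, so raising `maxColumns` or pre-filtering the input as above are possible fallbacks.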
