Spark CSV data source should be able
> On 7. Jun 2017, at 17:50, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi everyone,
> I am using Spark 2.1.1 to read CSV files and convert them to Avro files.
> One problem I am facing: if a row of the CSV file has more columns than
> maxColumns (default is 20480), parsing stops.
>
> Internal state when error was thrown: line=1, column=3, record=0, charIndex=12
> com.univocity.parsers.common.TextParsingException:
> java.lang.ArrayIndexOutOfBoundsException - 2
> Hint: Number of columns processed may have exceeded limit of 2 columns. Use
> settings.setMaxColumns(int) to define the maximum number of columns your
> input can have
> Ensure your configuration is correct, with delimiters, quotes and escape
> sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
>
>
> I looked into the univocity library, but it handles this case by throwing
> an error, which is why Spark stops the process.
>
> How can I skip the invalid row and continue parsing the next valid one?
> Is there a library that can replace univocity for this job?
>
> Thanks & regards,
> Chanh
> --
> Regards,
> Chanh
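
One possible workaround, sketched here as an illustration and not taken from the thread, is to pre-filter the input before handing it to Spark, dropping any row whose field count exceeds a limit. The sketch below uses Python's standard csv module instead of univocity. The helper name split_valid_rows and the sample data are hypothetical, and it assumes the text fits in memory.

```python
import csv
import io


def split_valid_rows(text, max_columns, delimiter=","):
    """Split parsed CSV rows into kept rows and skipped-row metadata.

    Hypothetical helper: rows with at most max_columns fields are kept;
    wider rows are recorded as (record_number, field_count) and skipped
    instead of aborting the whole parse.
    """
    kept, skipped = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for record_number, row in enumerate(reader):
        if len(row) > max_columns:
            # Too many columns: note the record and move on to the next one.
            skipped.append((record_number, len(row)))
            continue
        kept.append(row)
    return kept, skipped


# Made-up sample: the second record has 3 fields, above the limit of 2.
sample = "a,b\n1,2,3\n4,5\n"
kept, skipped = split_valid_rows(sample, max_columns=2)
print(kept)
print(skipped)
```

Recording the skipped record numbers, instead of dropping rows silently, makes it possible to report afterwards which inputs were rejected.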