Hi Malte,

Typically, double quotes are used to identify strings and thus are not
interpreted literally. Any data in a field after a double quoted string is
regarded as invalid trailing data.
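
To illustrate the behaviour described above, here is a minimal sketch in plain Java. It is a simplified model only, not Flink's actual parser; the class name QuotedFieldSketch and the method parseField are hypothetical names made up for this example.

```java
// Simplified model of quoted-field handling. This is NOT Flink's parser;
// the class and method names are hypothetical and exist only for illustration.
final class QuotedFieldSketch {

    /** Parses one field; rejects any data that follows a closing double quote. */
    static String parseField(String field) {
        if (!field.startsWith("\"")) {
            return field; // unquoted field: taken literally
        }
        int close = field.indexOf('"', 1); // position of the closing quote
        if (close < 0) {
            throw new IllegalArgumentException("Unterminated quoted string: " + field);
        }
        if (close != field.length() - 1) {
            throw new IllegalArgumentException("Invalid trailing data: " + field);
        }
        return field.substring(1, close); // content between the quotes
    }

    public static void main(String[] args) {
        System.out.println(parseField("ggg"));      // unquoted, returned as-is
        System.out.println(parseField("'hhh' xx")); // single quotes are ordinary characters here
        try {
            parseField("\"hhh\" xx");               // double-quoted string followed by " xx"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, a field that starts with a double quote must also end with the closing quote, which is why the " xx" suffix is rejected.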

You could replace double quotes with single quotes:

A|ggg
B|'hhh' xx
C|xxx

This results in the expected >'hhh' xx< for the second line.
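
If the file cannot be edited by hand, the replacement could be done in a small preprocessing step before the file is read. The following is only a sketch in plain Java: QuoteRewriteSketch and its methods are hypothetical names, and it assumes that every double quote in the data may safely become a single quote.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical preprocessing helper, not part of Flink.
final class QuoteRewriteSketch {

    /** Replaces every double quote in a line with a single quote. */
    static String toSingleQuotes(String line) {
        return line.replace('"', '\'');
    }

    /** Applies the replacement to all lines of a CSV file held in memory. */
    static List<String> rewrite(List<String> lines) {
        return lines.stream()
                .map(QuoteRewriteSketch::toSingleQuotes)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("A|ggg", "B|\"hhh\" xx", "C|xxx");
        // Prints the three lines with B rewritten to B|'hhh' xx
        rewrite(csv).forEach(System.out::println);
    }
}
```

Note that this changes the data: the field value then contains single quotes instead of the original double quotes.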

Best regards,
Max

On Fri, Dec 5, 2014 at 4:44 PM, Malte Schwarzer <[email protected]> wrote:

> Hi Stephan,
>
> The result should be >"hhh" xx< as the field value. Enclosures should be
> disabled, but there seems to be no method to do that.
>
>
> Malte
>
> From: Stephan Ewen <[email protected]>
> Reply-To: <[email protected]>
> Date: Friday, December 5, 2014 16:28
> To: <[email protected]>
> Subject: Re: Quotes in fields of CsvInputFormat
>
> Hi!
>
> The parser interprets the quotes as quotes for the field. That means the
> second field (the string) stops after the "hhh" and the xx is considered
> invalid trailing data.
>
> What do you expect as the result of parsing that line?
>
> Stephan
>
>
> On Fri, Dec 5, 2014 at 4:16 PM, Malte Schwarzer <[email protected]> wrote:
>
>> Hi,
>>
>> I'm trying to import a CSV file, but the parser seems to have problems
>> with quotes at the beginning of a field. Is there a way to set or disable
>> enclosures for the CSV input?
>>
>> This is my code:
>>
>> DataSet<Tuple2<String, String>> res = env.readCsvFile(inputCsvFilename)
>>                 .fieldDelimiter('|')
>>                 .types(String.class, String.class);
>>
>> CSV:
>>
>> A|ggg
>> B|"hhh" xx
>> C|xxx
>>
>> As a result, I'm receiving a ParseException for line B:
>>
>> org.apache.flink.api.common.io.ParseException: Line could not be parsed:
>> 'B|"hhh" xx'
>>
>>
>> Thanks,
>> Malte
>>
>
>
