[ https://issues.apache.org/jira/browse/FLINK-13589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995659#comment-16995659 ]

Arvid Heise commented on FLINK-13589:
-------------------------------------

Thank you very much for the bug report and the patch. I will incorporate it for the upcoming release.

> DelimitedInputFormat index error on multi-byte delimiters with whole file
> input splits
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-13589
>                 URL: https://issues.apache.org/jira/browse/FLINK-13589
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Formats (JSON, Avro, Parquet,
> ORC, SequenceFile)
>    Affects Versions: 1.8.1
>            Reporter: Adric Eckstein
>            Priority: Blocker
>             Fix For: 1.9.2, 1.10.0
>
>         Attachments: delimiter-bug.patch
>
>
> The DelimitedInputFormat can drop bytes when using input splits that have a
> length of -1 (for reading the whole file). It looks like this is a simple
> bug in handling the delimiter on buffer boundaries, where the logic is
> inconsistent for different split types.
> Attached is a possible patch with a fix and a test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)