[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549995#comment-14549995 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/566

Bug in DoubleParser and FloatParser - empty String is not casted to 0

Key: FLINK-1820
URL: https://issues.apache.org/jira/browse/FLINK-1820
Project: Flink
Issue Type: Bug
Components: Core
Affects Versions: 0.8.0, 0.9, 0.8.1
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Critical
Fix For: 0.9

Hi,

I found the bug when I wanted to read a CSV file that had a line like:

||\n

If I treat it as a Tuple2<Long, Long>, I get, as expected, a tuple (0L, 0L). But if I want to read it into a Double tuple or a Float tuple, I get the following error:

java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||'
ParserError NUMERIC_VALUE_FORMAT_ERROR

This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser.

We definitely need the CSVReader to be able to read empty values.

I can fix it as described if there are no better ideas :)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
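The asymmetry described in the report can be reproduced without Flink. The sketch below is hypothetical. It assumes that the integer parser accumulates digits in a loop over the field's bytes. The class and method names are illustrative and are not Flink's API. For an empty field, such a loop never runs, so the result stays 0. By contrast, `Double.parseDouble("")` throws a `NumberFormatException`.

```java
import java.nio.charset.StandardCharsets;

final class EmptyFieldDemo {

    // Hypothetical digit-accumulating parser (not Flink's LongParser):
    // parses bytes[startPos, endPos) as an unsigned decimal number.
    static long parseLongField(byte[] bytes, int startPos, int endPos) {
        long value = 0;
        for (int i = startPos; i < endPos; i++) {
            byte b = bytes[i];
            if (b < '0' || b > '9') {
                throw new NumberFormatException("Illegal character in field");
            }
            value = value * 10 + (b - '0');
        }
        // An empty field (startPos == endPos) never enters the loop: result 0.
        return value;
    }

    // Floating-point fields handed to the JDK: the empty string is rejected.
    static double parseDoubleField(byte[] bytes, int startPos, int endPos) {
        String str = new String(bytes, startPos, endPos - startPos, StandardCharsets.US_ASCII);
        return Double.parseDouble(str); // throws NumberFormatException for ""
    }

    public static void main(String[] args) {
        byte[] line = "||".getBytes(StandardCharsets.US_ASCII);
        // The first field of "||" spans [0, 0), i.e. it is empty.
        System.out.println(parseLongField(line, 0, 0));
        try {
            parseDoubleField(line, 0, 0);
        } catch (NumberFormatException e) {
            System.out.println("double field rejected: " + e.getMessage());
        }
    }
}
```

Under this assumption, the `(0L, 0L)` result for `||` is a side effect of the loop rather than deliberate handling of empty values. That may explain why the later `testEmptyFields` test expects a parse error for empty fields of every numeric type.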
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547920#comment-14547920 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/566#issuecomment-103038656

    Thanks for the update! I'll have a look at it soon.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548619#comment-14548619 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r30540219

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/ByteParser.java ---
    @@ -21,22 +21,23 @@
     
     public class ByteParser extends FieldParser<Byte> {
    -
    --- End diff --

    Please avoid such reformatting changes in the future. In this file, fewer than 10 changed lines make up the actual change, but almost 80 lines are touched. This costs quite a bit of time to review.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549442#comment-14549442 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/566#issuecomment-103240648

    Looks good. I'll adapt a few things and will merge the PR tomorrow. Thanks for improving the CSV parsers and making them more consistent.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514099#comment-14514099 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29144391

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
    @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I
     
         final int delimLimit = limit-delimiter.length+1;
     
    +    if (bytes.length == 0) {
    --- End diff --

    This check is not strictly necessary, IMO. `bytes` is a larger byte array that is reused by the calling `GenericCsvInputFormat`. To reduce the processing overhead of each field, I would omit the check (here and in the Long and Short parsers).
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514102#comment-14514102 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/566#issuecomment-96647676

    Looks good. I added a few minor comments inline. Did you check whether the changes should also go into the `ByteParser`?
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514122#comment-14514122 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user FelixNeutatz commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29145755

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
    @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I
     
         final int delimLimit = limit-delimiter.length+1;
     
    +    if (bytes.length == 0) {
    --- End diff --

    If I skip this check, the LongParserTest, ShortParserTest, ... will fail because of an out-of-bounds exception.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514093#comment-14514093 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29143814

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
    @@ -102,6 +111,10 @@ public static final float parseField(byte[] bytes, int startPos, int length, cha
         }
     
         String str = new String(bytes, startPos, i);
    +    int len = str.length();
    +    if(len > str.trim().length()) {
    --- End diff --

    See other comment on `String.trim()`.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514092#comment-14514092 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29143809

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
    @@ -41,6 +41,15 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, F
         }
     
         String str = new String(bytes, startPos, i-startPos);
    +    int len = str.length();
    +    if (len == 0) {
    +        setErrorState(ParseErrorState.EMPTY_STRING);
    +        return -1;
    +    }
    +    if(len > str.trim().length()) {
    --- End diff --

    See other comment on `String.trim()`.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514091#comment-14514091 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29143800

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
    @@ -42,6 +42,15 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, D
         }
     
         String str = new String(bytes, startPos, i-startPos);
    +    int len = str.length();
    +    if (len == 0) {
    +        setErrorState(ParseErrorState.EMPTY_STRING);
    +        return -1;
    +    }
    +    if(len > str.trim().length()) {
    --- End diff --

    See other comment on `String.trim()`.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514087#comment-14514087 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29143622

    --- Diff: flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java ---
    @@ -353,6 +354,99 @@ public void testIntegerFieldsl() throws IOException {
                 assertEquals(Integer.valueOf(888), result.f2);
                 assertEquals(Integer.valueOf(999), result.f3);
                 assertEquals(Integer.valueOf(000), result.f4);
    +
    +            result = format.nextRecord(result);
    +            assertNull(result);
    +            assertTrue(format.reachedEnd());
    +        }
    +        catch (Exception ex) {
    +            fail("Test failed due to a " + ex.getClass().getName() + ": " + ex.getMessage());
    +        }
    +    }
    +
    +    @Test
    +    public void testEmptyFields() throws IOException {
    +        try {
    +            final String fileContent = "|0|0|0|0\n" +
    +                    "1||1|1|1|\n" +
    +                    "2|2| |2|2|\n" +
    +                    "3 |3|3| |3|\n" +
    +                    "4|4|4|4| |\n";
    +            final FileInputSplit split = createTempFile(fileContent);
    +
    +            final TupleTypeInfo<Tuple5<Short, Integer, Long, Float, Double>> typeInfo =
    +                    TupleTypeInfo.getBasicTupleTypeInfo(Short.class, Integer.class, Long.class, Float.class, Double.class);
    +            final CsvInputFormat<Tuple5<Short, Integer, Long, Float, Double>> format = new CsvInputFormat<Tuple5<Short, Integer, Long, Float, Double>>(PATH, typeInfo);
    +
    +            format.setFieldDelimiter("|");
    +
    +            format.configure(new Configuration());
    +            format.open(split);
    +
    +            Tuple5<Short, Integer, Long, Float, Double> result = new Tuple5<Short, Integer, Long, Float, Double>();
    +
    +            try {
    +                result = format.nextRecord(result);
    +                fail("Empty String Parse Exception was not thrown! (ShortParser)");
    +            } catch (ParseException e) {}
    +            try {
    +                result = format.nextRecord(result);
    +                fail("Empty String Parse Exception was not thrown! (IntegerParser)");
    +            } catch (ParseException e) {}
    +            try {
    +                result = format.nextRecord(result);
    +                fail("Empty String Parse Exception was not thrown! (LongParser)");
    +            } catch (ParseException e) {}
    +            try {
    +                result = format.nextRecord(result);
    --- End diff --

    Doesn't this call fail because of the trailing whitespace in the `short` field?
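The reviewer's question can be checked by hand. The sketch below assumes the strict rules discussed in this PR: a numeric field that is empty, or that has leading or trailing whitespace, is rejected. For each line of the test's `fileContent`, it reports the index of the first of the five tuple fields that would fail. It is a standalone illustration, not Flink's `CsvInputFormat`. `FirstFailingField` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstFailingField {

    // Strict rule assumed from the review: a numeric field is invalid if it is
    // empty or starts/ends with whitespace (digits themselves are not checked).
    static boolean isInvalid(String field) {
        return field.isEmpty()
                || Character.isWhitespace(field.charAt(0))
                || Character.isWhitespace(field.charAt(field.length() - 1));
    }

    // Index of the first invalid field among the first numFields fields of a
    // '|'-delimited line, or -1 if all of them pass.
    static int firstInvalidField(String line, int numFields) {
        String[] fields = line.split("\\|", -1); // limit -1 keeps empty fields
        for (int i = 0; i < numFields && i < fields.length; i++) {
            if (isInvalid(fields[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String fileContent = "|0|0|0|0\n"
                + "1||1|1|1|\n"
                + "2|2| |2|2|\n"
                + "3 |3|3| |3|\n"
                + "4|4|4|4| |\n";
        List<Integer> firstFailures = new ArrayList<>();
        for (String line : fileContent.split("\n")) {
            firstFailures.add(firstInvalidField(line, 5));
        }
        System.out.println(firstFailures); // prints [0, 1, 2, 0, 4]
    }
}
```

Under these rules, the fourth line `3 |3|3| |3|` already fails at field 0, the `Short` field. The `try` block the reviewer points at would therefore not exercise the `Float` parser, which appears to be what it was meant to test.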
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514090#comment-14514090 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29143763

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
    @@ -103,6 +112,10 @@ public static final double parseField(byte[] bytes, int startPos, int length, ch
         }
     
         String str = new String(bytes, startPos, i);
    +    int len = str.length();
    +    if(len > str.trim().length()) {
    --- End diff --

    `String.trim()` creates a new String object. Checking whether the first or last character of the String is whitespace is probably more efficient.
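Below is a minimal sketch of the reviewer's suggestion. It assumes that the field has already been decoded into a `String`, as in the diff. Instead of comparing `str.length()` with `str.trim().length()`, it tests only the first and last characters.

One nuance: the JDK documents that `String.trim()` returns the same instance when there is nothing to strip. A copy is therefore only made for fields that actually carry whitespace, and the character check avoids that copy entirely.

The sketch uses the same `<= ' '` criterion as `trim()`, so both variants agree. `WhitespaceCheck` and `parseStrictDouble` are hypothetical names. Returning `null` stands in for Flink's error state.

```java
final class WhitespaceCheck {

    // Allocation-free check: true if the field starts or ends with a character
    // that String.trim() would strip (any char <= U+0020).
    static boolean hasLeadingOrTrailingWhitespace(String str) {
        int len = str.length();
        return len > 0 && (str.charAt(0) <= ' ' || str.charAt(len - 1) <= ' ');
    }

    // The variant from the diff, for comparison; trim() may create a copy.
    static boolean viaTrim(String str) {
        return str.length() > str.trim().length();
    }

    // Strict floating-point parsing: reject empty fields and surrounding
    // whitespace before delegating to Double.parseDouble, which would
    // otherwise silently ignore leading and trailing whitespace.
    static Double parseStrictDouble(String str) {
        if (str.isEmpty() || hasLeadingOrTrailingWhitespace(str)) {
            return null;
        }
        try {
            return Double.parseDouble(str);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.5", " 1.5", "1.5 ", "", "a"}) {
            System.out.println("'" + s + "' -> " + parseStrictDouble(s));
        }
    }
}
```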
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514171#comment-14514171 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user FelixNeutatz commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29148302

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
    @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I
     
         final int delimLimit = limit-delimiter.length+1;
     
    +    if (bytes.length == 0) {
    --- End diff --

    Sounds good (y)
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514153#comment-14514153 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r29147205

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
    @@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, I
     
         final int delimLimit = limit-delimiter.length+1;
     
    +    if (bytes.length == 0) {
    --- End diff --

    I see. This is probably because an empty test string causes the test to call the parser with a 0-length array. We could add a dedicated `testEmptyField` test method to the `ParserTestBase` and remove the empty Strings from the set of invalid inputs.
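The out-of-bounds failure and the reviewer's objection can both be illustrated with a small sketch. The parser below is hypothetical and is not Flink's `IntParser`. It follows the `(bytes, startPos, limit)` convention from the diff but uses a single-byte delimiter for brevity, and its error names are illustrative.

If a parser reads a sign byte at `bytes[startPos]` before any bounds check, it throws on a zero-length array, which would be consistent with the test failures reported above. A `bytes.length == 0` check does not help for an empty field inside a larger, reused buffer. Checking whether the field itself is empty covers both cases.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical parser; not Flink's IntParser. Error names are illustrative.
final class GuardedIntParser {

    enum ErrorState { NONE, EMPTY_STRING, ILLEGAL_CHARACTER, OVERFLOW }

    private ErrorState errorState = ErrorState.NONE;
    private int lastResult;

    ErrorState getErrorState() { return errorState; }

    int getLastResult() { return lastResult; }

    // Parses the field that starts at startPos and ends at the first delimiter
    // byte (or at limit). Returns the position after the delimiter (or limit),
    // or -1 after setting an error state.
    int parseField(byte[] bytes, int startPos, int limit, byte delimiter) {
        errorState = ErrorState.NONE;
        // Guard on the field, not on bytes.length: covers a zero-length array
        // (startPos == limit == 0) and an empty field in a reused buffer.
        if (startPos >= limit || bytes[startPos] == delimiter) {
            return fail(ErrorState.EMPTY_STRING);
        }
        int i = startPos;
        boolean negative = bytes[i] == '-'; // safe: startPos < limit here
        if (negative) {
            i++;
        }
        long maxMagnitude = negative ? 2147483648L : Integer.MAX_VALUE;
        long value = 0;
        int digits = 0;
        for (; i < limit && bytes[i] != delimiter; i++) {
            byte b = bytes[i];
            if (b < '0' || b > '9') {
                return fail(ErrorState.ILLEGAL_CHARACTER);
            }
            value = value * 10 + (b - '0');
            if (value > maxMagnitude) {
                return fail(ErrorState.OVERFLOW);
            }
            digits++;
        }
        if (digits == 0) { // e.g. a lone "-"
            return fail(ErrorState.ILLEGAL_CHARACTER);
        }
        lastResult = (int) (negative ? -value : value);
        return i < limit ? i + 1 : limit;
    }

    private int fail(ErrorState state) {
        errorState = state;
        return -1;
    }

    public static void main(String[] args) {
        GuardedIntParser parser = new GuardedIntParser();
        byte[] buffer = "12||-7".getBytes(StandardCharsets.US_ASCII);
        byte delim = (byte) '|';
        int next = parser.parseField(buffer, 0, buffer.length, delim);
        System.out.println(parser.getLastResult() + ", next field at " + next);
        // The second field is empty: reported as EMPTY_STRING, no exception.
        parser.parseField(buffer, next, buffer.length, delim);
        System.out.println(parser.getErrorState());
        // A zero-length array, as produced by an empty test string.
        parser.parseField(new byte[0], 0, 0, delim);
        System.out.println(parser.getErrorState());
    }
}
```

A dedicated empty-field test, as proposed above, would then assert the `EMPTY_STRING` state for both the zero-length array and the empty field inside a buffer.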
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504687#comment-14504687 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/566#issuecomment-94722161

    Your PR changes the semantics of the Integer parsers a bit because you ignore whitespace. This change has a few implications. The following fields are parsed as correct Integer values:

    - ` 123 `
    - `- 123`
    - `1 2 3`

    but the following is not accepted:

    - ` -123`

    This behavior is not expected, IMO.
    I know that `Double.parseDouble()` and `Float.parseFloat()` both ignore leading and trailing whitespace, and the intention of this PR is to make the parsing of floating-point and integer numeric values consistent. Instead of accepting leading and trailing whitespace in the Integer parsers, I propose to check for leading and trailing whitespace in floating-point fields and make these parsers fail in such cases. This would also give consistent parsing behavior. What do you think?
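To see how the accept/reject pattern listed above can arise, here is a hypothetical reconstruction. It is not the PR's actual code. The integer parser looks for a minus sign only at the very first character and then skips spaces anywhere while accumulating digits. Under that assumption, it accepts ` 123 `, `- 123`, and `1 2 3` but rejects ` -123`, which is exactly the inconsistency the reviewer describes. Overflow handling is omitted because the sketch only illustrates the whitespace behavior.

```java
final class LaxIntParser {

    // Hypothetical whitespace-skipping parser, for illustration only.
    // Overflow handling is deliberately omitted.
    static int parse(String field) {
        int i = 0;
        boolean negative = false;
        // The sign is only recognized at position 0 ...
        if (!field.isEmpty() && field.charAt(0) == '-') {
            negative = true;
            i = 1;
        }
        long value = 0;
        int digits = 0;
        for (; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == ' ') {
                continue; // ... but spaces are skipped everywhere.
            }
            if (c < '0' || c > '9') {
                throw new NumberFormatException("Illegal character '" + c + "' in \"" + field + "\"");
            }
            value = value * 10 + (c - '0');
            digits++;
        }
        if (digits == 0) {
            throw new NumberFormatException("No digits in \"" + field + "\"");
        }
        return (int) (negative ? -value : value);
    }

    public static void main(String[] args) {
        for (String s : new String[] {" 123 ", "- 123", "1 2 3", " -123"}) {
            try {
                System.out.println("'" + s + "' -> " + parse(s));
            } catch (NumberFormatException e) {
                System.out.println("'" + s + "' -> rejected");
            }
        }
    }
}
```

In ` -123`, the leading space moves the `-` away from position 0, so the sign is treated as an illegal character. That is why the reviewer prefers rejecting whitespace outright over tolerating it in some positions.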
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504690#comment-14504690 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/566#issuecomment-94722728

    It would also be good to extend the respective parser tests, such as `IntParserTest`, when changing the behavior and semantics of the parsers.
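Following the proposal to keep integer parsing strict and to extend tests such as `IntParserTest`, a table-driven check could look like the sketch below. It is plain Java rather than Flink's `ParserTestBase`. `parseStrict` is a hypothetical helper that delegates to `Integer.parseInt`, which already throws `NumberFormatException` for empty strings and for embedded or surrounding whitespace. The inputs are the cases from the review discussion.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StrictIntParserCheck {

    // Hypothetical strict parsing: empty fields and whitespace anywhere are
    // errors. Integer.parseInt rejects both, so it serves as the reference;
    // note that it also accepts a leading '+', which a CSV parser may not.
    static Integer parseStrict(String field) {
        try {
            return Integer.parseInt(field);
        } catch (NumberFormatException e) {
            return null; // stands in for a parser error state
        }
    }

    // Inputs from the review; a null value means "must be rejected".
    static Map<String, Integer> reviewCases() {
        Map<String, Integer> cases = new LinkedHashMap<>();
        cases.put("123", 123);
        cases.put("-123", -123);
        cases.put("", null);
        cases.put(" 123 ", null);
        cases.put("- 123", null);
        cases.put("1 2 3", null);
        cases.put(" -123", null);
        return cases;
    }

    // Returns the number of inputs whose result differs from the expectation.
    static int countFailures(Map<String, Integer> cases) {
        int failures = 0;
        for (Map.Entry<String, Integer> c : cases.entrySet()) {
            Integer expected = c.getValue();
            Integer actual = parseStrict(c.getKey());
            boolean ok = (expected == null) ? actual == null : expected.equals(actual);
            if (!ok) {
                failures++;
                System.out.println("Unexpected result for '" + c.getKey() + "': " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures(reviewCases()) + " failing case(s)");
    }
}
```

Keeping the expectations in one table makes it straightforward to run the same whitespace cases against the floating-point parsers later, which is the consistency the discussion aims for.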
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505005#comment-14505005 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user FelixNeutatz commented on the pull request:

    https://github.com/apache/flink/pull/566#issuecomment-94810290

    @fhueske: I agree on that :)
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500181#comment-14500181 ]

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/566#discussion_r28610833

    --- Diff: flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
    @@ -42,6 +42,10 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, D
         }
     
         String str = new String(bytes, startPos, i-startPos);
    +    if (str.length() == 0) {
    --- End diff --

    Remove whitespace with `String.trim()` before checking for `length() == 0`.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500183#comment-14500183 ] ASF GitHub Bot commented on FLINK-1820: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r28610838
--- Diff: flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
@@ -41,6 +41,10 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, F
     }
     String str = new String(bytes, startPos, i-startPos);
+    if (str.length() == 0) {
--- End diff --
Remove whitespace with `String.trim()` before checking for `length() == 0`.
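The review comments on DoubleParser and FloatParser ask for `String.trim()` before the `length() == 0` check, so that a whitespace-only field is reported like an empty one. The following is a minimal sketch of that idea, not Flink's actual parser: the class `TrimmedDoubleField` and the enum `SketchErrorState` are hypothetical (the enum only mirrors the `EMPTY_STRING` and `NUMERIC_VALUE_FORMAT_ERROR` names that appear in the PR), and it assumes ASCII-encoded numeric fields. Note that `Double.parseDouble` already ignores leading and trailing whitespace, so the explicit trim mainly lets the parser report the empty case as a distinct error instead of a generic format error.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in mirroring two ParseErrorState names used in the PR.
enum SketchErrorState { NONE, EMPTY_STRING, NUMERIC_VALUE_FORMAT_ERROR }

final class TrimmedDoubleField {
    SketchErrorState errorState = SketchErrorState.NONE;
    double result;

    /**
     * Parses bytes[startPos, endPos) as a double. Returns true on success;
     * on failure sets errorState and returns false.
     */
    boolean parse(byte[] bytes, int startPos, int endPos) {
        String str = new String(bytes, startPos, endPos - startPos, StandardCharsets.US_ASCII);
        // Trim first, as suggested in review, so " " is reported like "".
        if (str.trim().length() == 0) {
            errorState = SketchErrorState.EMPTY_STRING;
            return false;
        }
        try {
            result = Double.parseDouble(str); // also ignores surrounding whitespace
            errorState = SketchErrorState.NONE;
            return true;
        } catch (NumberFormatException e) {
            errorState = SketchErrorState.NUMERIC_VALUE_FORMAT_ERROR;
            return false;
        }
    }
}
```

With the record `| |3.5|x|`, the field between the first two delimiters yields `EMPTY_STRING`, `3.5` parses normally, and `x` yields `NUMERIC_VALUE_FORMAT_ERROR`.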
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500172#comment-14500172 ] ASF GitHub Bot commented on FLINK-1820: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r28610548
--- Diff: flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
@@ -49,7 +53,7 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, D
     catch (NumberFormatException e) {
         setErrorState(ParseErrorState.NUMERIC_VALUE_FORMAT_ERROR);
         return -1;
-    }
+    }
--- End diff --
Whitespace added
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500173#comment-14500173 ] ASF GitHub Bot commented on FLINK-1820: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r28610554
--- Diff: flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
@@ -48,7 +52,7 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, F
     catch (NumberFormatException e) {
         setErrorState(ParseErrorState.NUMERIC_VALUE_FORMAT_ERROR);
         return -1;
-    }
+    }
 }
--- End diff --
Whitespace
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500175#comment-14500175 ] ASF GitHub Bot commented on FLINK-1820: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r28610575
--- Diff: flink-core/src/main/java/org/apache/flink/types/parser/LongParser.java ---
@@ -33,6 +33,19 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, L
     boolean neg = false;
     final int delimLimit = limit - delimiter.length + 1;
+
+    int delimCount = 0;
+    for (int i = 0; i < delimiter.length; i++) {
+        if (bytes[startPos + i] == delimiter[i]) {
+            delimCount++;
+        } else {
+            break;
+        }
+    }
+    if (delimCount == delimiter.length) {
+        setErrorState(ParseErrorState.EMPTY_STRING);
+        return -1;
+    }
--- End diff --
Same as above.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500176#comment-14500176 ] ASF GitHub Bot commented on FLINK-1820: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/566#discussion_r28610577
--- Diff: flink-core/src/main/java/org/apache/flink/types/parser/ShortParser.java ---
@@ -36,7 +36,20 @@ public int parseField(byte[] bytes, int startPos, int limit, byte[] delimiter, S
     int val = 0;
     boolean neg = false;
-    final int delimLimit = limit-delimiter.length+1;
+    final int delimLimit = limit-delimiter.length + 1;
+
+    int delimCount = 0;
+    for (int i = 0; i < delimiter.length; i++) {
+        if (bytes[startPos + i] == delimiter[i]) {
+            delimCount++;
+        } else {
+            break;
+        }
+    }
+    if (delimCount == delimiter.length) {
+        setErrorState(ParseErrorState.EMPTY_STRING);
+        return -1;
+    }
--- End diff --
Same as above.
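The LongParser and ShortParser diffs detect an empty field by checking whether the bytes at `startPos` are exactly the (possibly multi-byte) field delimiter; if every delimiter byte matches, the field has no content and the parser sets `EMPTY_STRING`. A minimal sketch of that check follows, with a hypothetical class name. Returning on the first mismatch is equivalent to the diff's `delimCount == delimiter.length` test. The sketch also checks `limit` before reading, which the quoted diff does not do; whether the real parser needs this depends on how callers set `limit`, so treat it as an assumption.

```java
/**
 * Sketch of the empty-field test from the LongParser/ShortParser diff:
 * a field is empty when the bytes at startPos are exactly the delimiter.
 * The class name is hypothetical.
 */
final class EmptyFieldDetector {

    static boolean startsWithDelimiter(byte[] bytes, int startPos, int limit, byte[] delimiter) {
        // Assumed bounds check (not in the quoted diff): the delimiter must fit before limit.
        if (startPos + delimiter.length > limit) {
            return false;
        }
        for (int i = 0; i < delimiter.length; i++) {
            if (bytes[startPos + i] != delimiter[i]) {
                return false; // first mismatch: the field has content
            }
        }
        return true; // every delimiter byte matched: the field is empty
    }
}
```

For the record `||42|` with delimiter `|`, fields starting at offsets 0 and 1 are empty, while the field starting at offset 2 is not.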
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483215#comment-14483215 ] Stephan Ewen commented on FLINK-1820: - Given the discussion here, I agree with Fabian. An exception is easier to understand than strange results because of a sneaky zero inserted for empty fields...
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483253#comment-14483253 ] Maximilian Michels commented on FLINK-1820: --- +1 for an exception message for an empty String
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483973#comment-14483973 ] Felix Neutatz commented on FLINK-1820: -- ok, I agree with you. I will add the exceptions instead
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481167#comment-14481167 ] ASF GitHub Bot commented on FLINK-1820: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/566#issuecomment-90038380 It seems like a good addition. Is it standard CSV style to interpret it like this?
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481172#comment-14481172 ] ASF GitHub Bot commented on FLINK-1820: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/566#issuecomment-90039531 The discussion was continued in JIRA.
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394886#comment-14394886 ] Felix Neutatz commented on FLINK-1820: -- I think we should handle all types equally. Either we interpret empty strings as 0 like in the case of Long and Integer or we throw an exception like in the case of Double and Float. The third option would be to assign null to these values. Moreover I am currently working with the TPC-DS benchmark. In my opinion the CSVReader should be able to read the corresponding input files. Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug, when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2Long,Long, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it like described if there are no better ideas :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394853#comment-14394853 ] Fabian Hueske commented on FLINK-1820: -- I would not call this a bug. The behavior that you (or your program) expect might differ from how other users would like the parsers to behave. I would find it surprising if an empty string resulted in the value 0 (why not -1 or 42?) and would rather expect either an exception or a NaN value. Also, changing the default behavior breaks the API (other users might rely on the current behavior). I am also not sure whether we should add another parameter to the CsvInputFormats to configure the floating point parsers. The formats already have quite a few parameters, and I think it is not a good idea to add more parameters for every possible parser behavior. Instead, we could allow users to configure user-defined parsers for specific fields. A workaround for your use case could be to read the possibly empty field as a String field and convert the String to a Double or Float in a subsequent Mapper.
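Fabian's suggested workaround is to read the possibly empty column as a String and convert it to Double or Float in a subsequent Mapper. A minimal sketch of the conversion logic follows. In a Flink job it would sit inside a map function (for example a `MapFunction`) applied after the CSV source; here it is plain Java so it can run on its own. The class name is hypothetical, and mapping an empty field to `Double.NaN` is only one choice (NaN is one of the alternatives Fabian mentions).

```java
/**
 * Sketch of the "read as String, convert later" workaround. In a Flink job this
 * logic would live in a map function after the CSV source; the class name is
 * hypothetical and mapping "" to NaN is an assumed choice.
 */
final class StringToDoubleWorkaround {

    /** Empty or whitespace-only fields become NaN; anything else must parse. */
    static double toDoubleOrNaN(String field) {
        String s = field.trim();
        return s.isEmpty() ? Double.NaN : Double.parseDouble(s);
    }

    /** Converts every field of one record that was read as Strings. */
    static double[] convertRecord(String[] fields) {
        double[] out = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = toDoubleOrNaN(fields[i]);
        }
        return out;
    }
}
```

Because the empty case is handled in user code, each job can pick its own convention (NaN, 0, or skipping the record) without adding parameters to the CSV input formats, which is the trade-off Fabian describes.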
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394954#comment-14394954 ] Fabian Hueske commented on FLINK-1820: -- That would be the proper way to handle this in my opinion. But let's maybe wait for some other opinions on this. It's an API-breaking change after all ;-) We can also create another issue for the configurable user-defined field parsers. Would you be interested in adding that feature?
[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0
[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394935#comment-14394935 ] Felix Neutatz commented on FLINK-1820: -- So I will add the Exception in the Integer/Long Parser? Bug in DoubleParser and FloatParser - empty String is not casted to 0 - Key: FLINK-1820 URL: https://issues.apache.org/jira/browse/FLINK-1820 Project: Flink Issue Type: Bug Components: core Affects Versions: 0.8.0, 0.9, 0.8.1 Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Critical Fix For: 0.9 Hi, I found the bug, when I wanted to read a csv file, which had a line like: ||\n If I treat it as a Tuple2Long,Long, I get as expected a tuple (0L,0L). But if I want to read it into a Double-Tuple or a Float-Tuple, I get the following error: java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||' ParserError NUMERIC_VALUE_FORMAT_ERROR This error can be solved by adding an additional condition for empty strings in the FloatParser / DoubleParser. We definitely need the CSVReader to be able to read empty values. I can fix it like described if there are no better ideas :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)