[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-05-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549995#comment-14549995
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/566


 Bug in DoubleParser and FloatParser - empty String is not casted to 0
 -

 Key: FLINK-1820
 URL: https://issues.apache.org/jira/browse/FLINK-1820
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0, 0.9, 0.8.1
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Critical
 Fix For: 0.9


 Hi,
 I found the bug when I wanted to read a CSV file that had a line like:
 ||\n
 If I treat it as a Tuple2<Long,Long>, I get, as expected, a tuple (0L,0L).
 But if I want to read it into a Double tuple or a Float tuple, I get the 
 following error:
 java.lang.AssertionError: Test failed due to a 
 org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||'
 ParserError NUMERIC_VALUE_FORMAT_ERROR 
 This error can be solved by adding an additional condition for empty strings 
 in the FloatParser / DoubleParser.
 We definitely need the CSVReader to be able to read empty values.
 I can fix it as described if there are no better ideas :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547920#comment-14547920
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-103038656
  
Thanks for the update! I'll have a look at it soon.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548619#comment-14548619
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r30540219
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/ByteParser.java ---
@@ -21,22 +21,23 @@
 
 
 public class ByteParser extends FieldParser<Byte> {
-   
--- End diff --

Please avoid such reformatting changes in the future.
In this file, fewer than 10 changed lines make up the actual change, but 
almost 80 lines are touched. This costs quite a bit of time to review.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549442#comment-14549442
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-103240648
  
Looks good. I'll adapt a few things and will merge the PR tomorrow.
Thanks for improving the CSV parsers and making them more consistent.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514099#comment-14514099
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29144391
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
@@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, I
 
final int delimLimit = limit-delimiter.length+1;
 
+   if (bytes.length == 0) {
--- End diff --

This check is not strictly necessary, IMO.
`bytes` is a larger byte array which is reused by the calling 
`GenericCsvInputFormat`.

To reduce the processing overhead of each field, I would omit the check 
(here and in the Long and Short parsers)
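
The point about the reused buffer can be illustrated with a minimal sketch. It assumes, as the comment states, that the parser receives one large shared byte array plus a start position and limit; the class and method names below are hypothetical and are not part of Flink's parser API. In such a layout an empty field shows up as a delimiter sitting directly at the start position, not as a zero-length array:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Flink code: detecting an empty field inside
// a shared, reused buffer by looking at the position, not at bytes.length.
final class SharedBufferEmptyField {

    // Returns true if no field bytes lie between startPos and the next
    // delimiter (or the limit). Assumes a non-empty delimiter.
    static boolean isEmptyField(byte[] bytes, int startPos, int limit, byte[] delimiter) {
        if (startPos >= limit) {
            return true; // nothing left to read, also covers a 0-length array
        }
        if (startPos + delimiter.length > limit) {
            return false; // too few bytes left for a delimiter, so it is field data
        }
        for (int j = 0; j < delimiter.length; j++) {
            if (bytes[startPos + j] != delimiter[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] buffer = "1||3\n".getBytes(StandardCharsets.US_ASCII);
        byte[] delimiter = "|".getBytes(StandardCharsets.US_ASCII);

        System.out.println(buffer.length == 0);                                 // false
        System.out.println(isEmptyField(buffer, 0, buffer.length, delimiter));  // false
        System.out.println(isEmptyField(buffer, 2, buffer.length, delimiter));  // true
    }
}
```

The second field of `1||3` starts at index 2, where the next delimiter already sits, so it is empty even though the array itself has five bytes.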




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514102#comment-14514102
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-96647676
  
Looks good. I added a few minor comments inline. 
Did you check if the changes should also go into the `ByteParser`?




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514122#comment-14514122
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user FelixNeutatz commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29145755
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
@@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, I
 
final int delimLimit = limit-delimiter.length+1;
 
+   if (bytes.length == 0) {
--- End diff --

If I skip this check, the LongParserTest, ShortParserTest, etc. will fail 
with an out-of-bounds exception.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514093#comment-14514093
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29143814
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
@@ -102,6 +111,10 @@ public static final float parseField(byte[] bytes, int 
startPos, int length, cha
}

String str = new String(bytes, startPos, i);
+   int len = str.length();
+   if(len > str.trim().length()) {
--- End diff --

See other comment on `String.trim()`




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514092#comment-14514092
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29143809
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
@@ -41,6 +41,15 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, F
}

String str = new String(bytes, startPos, i-startPos);
+   int len = str.length();
+   if (len == 0) {
+   setErrorState(ParseErrorState.EMPTY_STRING);
+   return -1;
+   }
+   if(len > str.trim().length()) {
--- End diff --

See other comment on `String.trim()`




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514091#comment-14514091
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29143800
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
@@ -42,6 +42,15 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, D
}

String str = new String(bytes, startPos, i-startPos);
+   int len = str.length();
+   if (len == 0) {
+   setErrorState(ParseErrorState.EMPTY_STRING);
+   return -1;
+   }
+   if(len > str.trim().length()) {
--- End diff --

See other comment on `String.trim()`




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514087#comment-14514087
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29143622
  
--- Diff: 
flink-java/src/test/java/org/apache/flink/api/java/io/CsvInputFormatTest.java 
---
@@ -353,6 +354,99 @@ public void testIntegerFieldsl() throws IOException {
             assertEquals(Integer.valueOf(888), result.f2);
             assertEquals(Integer.valueOf(999), result.f3);
             assertEquals(Integer.valueOf(000), result.f4);
+
+            result = format.nextRecord(result);
+            assertNull(result);
+            assertTrue(format.reachedEnd());
+        }
+        catch (Exception ex) {
+            fail("Test failed due to a " + ex.getClass().getName() + ": " + ex.getMessage());
+        }
+    }
+
+    @Test
+    public void testEmptyFields() throws IOException {
+        try {
+            final String fileContent = "|0|0|0|0\n" +
+                    "1||1|1|1|\n" +
+                    "2|2| |2|2|\n" +
+                    "3 |3|3|  |3|\n" +
+                    "4|4|4|4| |\n";
+            final FileInputSplit split = createTempFile(fileContent);
+
+            final TupleTypeInfo<Tuple5<Short, Integer, Long, Float, Double>> typeInfo =
+                    TupleTypeInfo.getBasicTupleTypeInfo(Short.class, Integer.class, Long.class, Float.class, Double.class);
+            final CsvInputFormat<Tuple5<Short, Integer, Long, Float, Double>> format = new CsvInputFormat<Tuple5<Short, Integer, Long, Float, Double>>(PATH, typeInfo);
+
+            format.setFieldDelimiter("|");
+
+            format.configure(new Configuration());
+            format.open(split);
+
+            Tuple5<Short, Integer, Long, Float, Double> result = new Tuple5<Short, Integer, Long, Float, Double>();
+
+            try {
+                result = format.nextRecord(result);
+                fail("Empty String Parse Exception was not thrown! (ShortParser)");
+            } catch (ParseException e) {}
+            try {
+                result = format.nextRecord(result);
+                fail("Empty String Parse Exception was not thrown! (IntegerParser)");
+            } catch (ParseException e) {}
+            try {
+                result = format.nextRecord(result);
+                fail("Empty String Parse Exception was not thrown! (LongParser)");
+            } catch (ParseException e) {}
+            try {
+                result = format.nextRecord(result);

Doesn't this call fail because of the trailing whitespace in the `short` 
field?




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514090#comment-14514090
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29143763
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
@@ -103,6 +112,10 @@ public static final double parseField(byte[] bytes, 
int startPos, int length, ch
}

String str = new String(bytes, startPos, i);
+   int len = str.length();
+   if(len > str.trim().length()) {
--- End diff --

`String.trim()` creates a new String object. 
Checking whether the first or last character of the String is whitespace is 
probably more efficient.
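
A minimal sketch of that suggestion follows. The helper name is hypothetical and this is not the code that was merged; it only shows that the first and last characters can be inspected directly. The `<= ' '` comparison is chosen to mirror what `String.trim()` strips (characters up to and including U+0020), so the result matches a comparison of the lengths before and after trimming:

```java
// Hypothetical helper, not Flink code: checks for leading or trailing
// whitespace without calling String.trim().
final class SurroundingWhitespace {

    // True if the first or last character would be removed by String.trim().
    static boolean hasSurroundingWhitespace(String s) {
        if (s.isEmpty()) {
            return false; // an empty string has no characters to inspect
        }
        return s.charAt(0) <= ' ' || s.charAt(s.length() - 1) <= ' ';
    }

    public static void main(String[] args) {
        String[] samples = {"1.0", " 1.0", "1.0 ", "1 0", ""};
        for (String s : samples) {
            // Print the direct check next to the trim-based check for comparison.
            boolean viaTrim = s.length() > s.trim().length();
            System.out.println("'" + s + "': " + hasSurroundingWhitespace(s) + " / " + viaTrim);
        }
    }
}
```

Interior whitespace such as `1 0` is deliberately not flagged here; that case is left to the numeric parsing step.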




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514171#comment-14514171
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user FelixNeutatz commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29148302
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
@@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, I
 
final int delimLimit = limit-delimiter.length+1;
 
+   if (bytes.length == 0) {
--- End diff --

Sounds good (y)




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514153#comment-14514153
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r29147205
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/IntParser.java ---
@@ -38,19 +38,28 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, I
 
final int delimLimit = limit-delimiter.length+1;
 
+   if (bytes.length == 0) {
--- End diff --

I see. This is probably because an empty test string causes the test to 
call the parser with a 0-length array.
We could add a dedicated `testEmptyField` test method to the 
`ParserTestBase` and remove the empty Strings from the set of invalid inputs.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504687#comment-14504687
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-94722161
  
Your PR changes the semantics of the Integer parsers a bit because you 
ignore whitespace. This change has a few implications. The following fields 
are parsed as valid Integer values:
- `  123  `
- `- 123`
- `1 2 3`

but the following is not accepted:
- ` -123`

This behavior is not expected, IMO.

I know that `Double.parseDouble()` and `Float.parseFloat()` both ignore 
leading and trailing whitespace, and the intention of this PR is to make the 
parsing of floating point and integer numeric values consistent.

Instead of accepting leading and trailing whitespace in the Integer 
parsers, I propose to check for leading and trailing whitespace in floating 
point fields and make these parsers fail in such cases. This would also give 
consistent parsing behavior.

What do you think?
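
The proposal can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the field is assumed to be already extracted as a String, and failure is signalled with a `NumberFormatException` here, whereas the Flink parsers report errors through their own error state. The sketch shows that the JDK call on its own accepts padded input, and how a strict wrapper would reject it, along with the empty field:

```java
// Hypothetical sketch, not the merged Flink code: a strict floating point
// field parser that rejects empty fields and surrounding whitespace.
final class StrictDoubleField {

    static double parseStrict(String field) {
        if (field.isEmpty()) {
            throw new NumberFormatException("empty field");
        }
        // Same character range that String.trim() would strip.
        if (field.charAt(0) <= ' ' || field.charAt(field.length() - 1) <= ' ') {
            throw new NumberFormatException("leading or trailing whitespace: '" + field + "'");
        }
        return Double.parseDouble(field);
    }

    public static void main(String[] args) {
        // The JDK parser alone silently accepts the padded value.
        System.out.println(Double.parseDouble("  1.5  "));
        System.out.println(parseStrict("1.5"));
        try {
            parseStrict("  1.5  ");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With such a check in place, a padded field like `  1.5  ` fails in the floating point parsers just as a padded integer field would fail in a strict Integer parser, which is the consistency argued for above.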





[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504690#comment-14504690
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-94722728
  
It would also be good to extend the respective parser tests such as 
`IntParserTest` when changing the behavior and semantics of the parsers.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505005#comment-14505005
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user FelixNeutatz commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-94810290
  
@fhueske: I agree on that :)




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500181#comment-14500181
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r28610833
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
@@ -42,6 +42,10 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, D
}

String str = new String(bytes, startPos, i-startPos);
+   if (str.length() == 0) {
--- End diff --

Remove whitespace with `String.trim()` before checking for `length() == 0`.
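
A minimal sketch of the suggested check, assuming the field bytes have already been located as in the diff above. The class `EmptyFieldCheck` and the method `isEmptyField` are hypothetical names for illustration and are not part of Flink; the charset is also an assumption. The only point is that trimming first makes a whitespace-only field count as empty.

```java
import java.nio.charset.StandardCharsets;

public class EmptyFieldCheck {

    // Hypothetical helper, not part of Flink: returns true if the field
    // occupying bytes[startPos, startPos + length) is empty after trimming.
    public static boolean isEmptyField(byte[] bytes, int startPos, int length) {
        String str = new String(bytes, startPos, length, StandardCharsets.US_ASCII);
        // trim() strips leading and trailing whitespace, so "  " counts as empty.
        return str.trim().length() == 0;
    }

    public static void main(String[] args) {
        byte[] line = "  |1.5|".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isEmptyField(line, 0, 2)); // field "  "
        System.out.println(isEmptyField(line, 3, 3)); // field "1.5"
    }
}
```

Without the `trim()` call, the first field in the example would have length 2 and would fall through to the numeric parse, where it fails with a format error instead of being reported as empty.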




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500183#comment-14500183
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r28610838
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
@@ -41,6 +41,10 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, F
}

String str = new String(bytes, startPos, i-startPos);
+   if (str.length() == 0) {
--- End diff --

Remove whitespace with `String.trim()` before checking for `length() == 0`.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500172#comment-14500172
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r28610548
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/DoubleParser.java ---
@@ -49,7 +53,7 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, D
catch (NumberFormatException e) {

setErrorState(ParseErrorState.NUMERIC_VALUE_FORMAT_ERROR);
return -1;
-   }
+   }   
--- End diff --

Whitespace added




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500173#comment-14500173
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r28610554
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/FloatParser.java ---
@@ -48,7 +52,7 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, F
catch (NumberFormatException e) {

setErrorState(ParseErrorState.NUMERIC_VALUE_FORMAT_ERROR);
return -1;
-   }
+   }   
}
--- End diff --

Whitespace




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500175#comment-14500175
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r28610575
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/LongParser.java ---
@@ -33,6 +33,19 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, L
boolean neg = false;
 
final int delimLimit = limit - delimiter.length + 1;
+
+   int delimCount = 0;
+   for (int i = 0; i < delimiter.length; i++) {
+   if (bytes[startPos + i] == delimiter[i]) {
+   delimCount++;
+   } else {
+   break;
+   }
+   }
+   if (delimCount == delimiter.length) {
+   setErrorState(ParseErrorState.EMPTY_STRING);
+   return -1;
+   }
--- End diff --

Same as above.
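
The check added in this diff asks whether the field begins directly with the delimiter, which means the field itself is empty. A minimal standalone sketch of that idea follows. The class `DelimiterCheck` and the method `startsWithDelimiter` are hypothetical names, not Flink API, and the bounds check against `limit` is an assumption that does not appear in the quoted diff.

```java
import java.nio.charset.StandardCharsets;

public class DelimiterCheck {

    // Hypothetical helper, not part of Flink: returns true if the bytes
    // starting at startPos are exactly the delimiter, i.e. the field is empty.
    public static boolean startsWithDelimiter(byte[] bytes, int startPos, int limit, byte[] delimiter) {
        // Assumption: refuse to read past limit when too few bytes remain.
        if (startPos + delimiter.length > limit) {
            return false;
        }
        for (int i = 0; i < delimiter.length; i++) {
            if (bytes[startPos + i] != delimiter[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] line = "||7|".getBytes(StandardCharsets.US_ASCII);
        byte[] delim = {'|'};
        System.out.println(startsWithDelimiter(line, 0, line.length, delim)); // empty first field
        System.out.println(startsWithDelimiter(line, 2, line.length, delim)); // field "7"
    }
}
```

In the diff, a positive result leads to `setErrorState(ParseErrorState.EMPTY_STRING)` and a return value of `-1`; the sketch only isolates the comparison.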




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500176#comment-14500176
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/566#discussion_r28610577
  
--- Diff: 
flink-core/src/main/java/org/apache/flink/types/parser/ShortParser.java ---
@@ -36,7 +36,20 @@ public int parseField(byte[] bytes, int startPos, int 
limit, byte[] delimiter, S
int val = 0;
boolean neg = false;
 
-   final int delimLimit = limit-delimiter.length+1;
+   final int delimLimit = limit-delimiter.length + 1;
+
+   int delimCount = 0;
+   for (int i = 0; i < delimiter.length; i++) {
+   if (bytes[startPos + i] == delimiter[i]) {
+   delimCount++;
+   } else {
+   break;
+   }
+   }
+   if (delimCount == delimiter.length) {
+   setErrorState(ParseErrorState.EMPTY_STRING);
+   return -1;
+   }
--- End diff --

Same as above.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-07 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483215#comment-14483215
 ] 

Stephan Ewen commented on FLINK-1820:
-

Given the discussion here, I agree with Fabian. An exception is easier to 
understand than strange results because of a sneaky zero inserted for empty 
fields...



[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-07 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483253#comment-14483253
 ] 

Maximilian Michels commented on FLINK-1820:
---

+1 for an exception message for an empty String



[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-07 Thread Felix Neutatz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483973#comment-14483973
 ] 

Felix Neutatz commented on FLINK-1820:
--

OK, I agree with you. I will add the exceptions instead.



[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481167#comment-14481167
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-90038380
  
It seems like a good addition.

Is it standard CSV style to interpret an empty field like this?




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481172#comment-14481172
 ] 

ASF GitHub Bot commented on FLINK-1820:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/566#issuecomment-90039531
  
The discussion was continued in JIRA.




[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-03 Thread Felix Neutatz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394886#comment-14394886
 ] 

Felix Neutatz commented on FLINK-1820:
--

I think we should handle all types equally. Either we interpret empty strings 
as 0, as in the case of Long and Integer, or we throw an exception, as in the 
case of Double and Float. 

The third option would be to assign null to these values.

Moreover, I am currently working with the TPC-DS benchmark. In my opinion, the 
CSVReader should be able to read the corresponding input files.



[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-03 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394853#comment-14394853
 ] 

Fabian Hueske commented on FLINK-1820:
--

I would not call this a bug. 
The behavior that you (or your program) expect might differ from how other 
users would like the parsers to behave. I would find it surprising that an 
empty string results in the value 0 (why not -1 or 42?) and would rather 
expect either an exception or a NaN value.
Also, changing the default behavior breaks the API (other users might rely on 
the current behavior).

I am also not sure whether we should add another parameter to the 
CsvInputFormats to configure the floating point parsers. The formats already 
have quite a few parameters, and I think it is not a good idea to add more 
parameters for all possible parser behaviors. Instead, we could let users 
configure user-defined parsers for specific fields.

A workaround for your use case could be to read the possibly empty field as a 
String field and convert the String to a Double or Float in a subsequent Mapper.
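
A minimal sketch of the conversion step in that workaround, kept free of Flink dependencies so it runs on its own. The class `EmptyFieldWorkaround` and the method `toDoubleOrNull` are hypothetical names; mapping an empty field to `null` is one possible choice and an assumption here (NaN or an exception would be equally valid). In a Flink job this logic would sit inside the map function applied after the CSV source; plain Java streams stand in for that step below.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class EmptyFieldWorkaround {

    // Hypothetical conversion step: an empty or whitespace-only field
    // becomes null instead of triggering a NumberFormatException.
    public static Double toDoubleOrNull(String field) {
        String trimmed = field.trim();
        return trimmed.isEmpty() ? null : Double.valueOf(trimmed);
    }

    public static void main(String[] args) {
        // The fields were read as Strings, so the empty one survives parsing.
        List<String> rawFields = Arrays.asList("1.5", "", "42");
        List<Double> parsed = rawFields.stream()
                .map(EmptyFieldWorkaround::toDoubleOrNull)
                .collect(Collectors.toList());
        System.out.println(parsed);
    }
}
```

Because the decision is made in user code, each job can pick its own meaning for an empty field without changing the default parser behavior.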



[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-03 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394954#comment-14394954
 ] 

Fabian Hueske commented on FLINK-1820:
--

That would be the proper way to handle this in my opinion.
But let's maybe wait for some other opinions on this. 

It's an API-breaking change after all ;-)

We can also create another issue for the configurable user-defined field 
parsers. 
Would you be interested in adding that feature?



[jira] [Commented] (FLINK-1820) Bug in DoubleParser and FloatParser - empty String is not casted to 0

2015-04-03 Thread Felix Neutatz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394935#comment-14394935
 ] 

Felix Neutatz commented on FLINK-1820:
--

So should I add the exception to the Integer/Long parsers?
