[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-08 Thread greghogan
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/2901 I'm not quite following but I think we have the same idea. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-08 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2901 I thought, I'll merge the original commit and squash yours on top. OK? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-08 Thread greghogan
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/2901 I had planned to squash these into the original commit (#2060), but whatever you think is best. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-08 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2901 Merging (+ closing PR #2060 which is included in this one) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-06 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2901 The byte-level implementation as in `IntParser` was originally used to avoid String object instances. Such an implementation was not possible with Double because it led to very imprecise results.

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-06 Thread greghogan
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/2901 @fhueske, the `FieldParser`s look to use two methods to parse their fields. `DoubleParser` uses `FieldParser.nextStringEndPos`, recently added by Timo, to create a `String` from the byte array

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-06 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2901 Thanks for the update @greghogan. PR looks good to merge. Something that came to my mind. Some of the FieldParsers such as `DoubleParser` create a String before converting it into a

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-06 Thread greghogan
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/2901 @fhueske, I refactored `DelimitedInputFormat` and `GenericCsvInputFormat` to reinterpret the fields whenever the charset encoding changes, and also updated the test per your review. --- If your

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-05 Thread greghogan
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/2901 @fhueske, please review the additional commits. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] flink issue #2901: [FLINK-3921] StringParser encoding

2016-12-01 Thread fhueske
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2901 Hi @greghogan, thanks for working on this. The changes look good. However, I think we need to propagate the configured charset also to the `DelimitedInputFormat`. `DelimitedInputFormat` splits