Re: UTF-16 support for TextInputFormat

2018-08-21 Thread Fabian Hueske
Thanks for creating FLINK-10134 and adding your suggestions! Best, Fabian 2018-08-13 23:55 GMT+02:00 David Dreyfus: > Hi Fabian, > > I've added FLINK-10134. I'm not sure you'd > consider it a blocker or that I've identified the

Re: UTF-16 support for TextInputFormat

2018-08-13 Thread David Dreyfus
Hi Fabian, I've added FLINK-10134. I'm not sure you'd consider it a blocker or that I've identified the right component. I'm afraid I don't have the bandwidth or knowledge to make the kind of pull request you really need. I do hope

Re: UTF-16 support for TextInputFormat

2018-08-10 Thread Fabian Hueske
Hi David, Thanks for digging into the code! I had a quick look into the classes as well. As far as I can see, your analysis is correct and the BOM handling in DelimitedInputFormat and TextInputFormat (and other text-based IFs such as CsvInputFormat) is broken. In fact, it's obvious that nobody
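
For readers following the thread, the kind of BOM handling discussed above can be sketched with the JDK alone. This is only an illustration and not Flink's code: the class `BomSniffer` and its method `detect` are hypothetical names, and the sketch recognizes only the UTF-8, UTF-16BE and UTF-16LE marks (it does not try to tell UTF-16LE apart from UTF-32LE), falling back to a caller-supplied default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flink: guesses a charset from a leading byte order mark.
public class BomSniffer {

    // Returns the charset implied by the BOM at the start of `head`,
    // or `fallback` when no known BOM is present.
    public static Charset detect(byte[] head, Charset fallback) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return StandardCharsets.UTF_16BE; // FE FF: big-endian mark
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return StandardCharsets.UTF_16LE; // FF FE: little-endian mark
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // "h" in UTF-16LE, preceded by the little-endian BOM.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'h', 0};
        System.out.println(detect(sample, StandardCharsets.UTF_8));
    }
}
```

A reader built along these lines would still need to skip the BOM bytes before decoding the first record; that step is left out here.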

Re: UTF-16 support for TextInputFormat

2018-08-09 Thread David Dreyfus
Hi Fabian, Thank you for taking my email. TextInputFormat.setCharsetName("UTF-16") appears to set the private variable TextInputFormat.charsetName. It doesn't appear to cause additional behavior that would help interpret UTF-16 data. The method I've tested is calling
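
A general pitfall with UTF-16 in byte-oriented readers, which may or may not be what is happening here, is that a line delimiter occupying one byte in UTF-8 occupies two bytes in UTF-16. Storing a charset name alone does not change how records are split unless the delimiter is encoded in the same charset. The sketch below uses only the JDK; the class `DelimiterBytes` and its method `encode` are hypothetical names and do not reflect how Flink's input formats are implemented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Flink code: shows how the byte form of a
// line delimiter depends on the charset used to encode it.
public class DelimiterBytes {

    // Encodes the delimiter string in the given charset.
    public static byte[] encode(String delimiter, Charset charset) {
        return delimiter.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE
        };
        for (Charset cs : charsets) {
            byte[] bytes = encode("\n", cs);
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02X ", b & 0xFF));
            }
            System.out.println(cs + ": " + hex.toString().trim());
        }
    }
}
```

The explicit big-endian and little-endian charsets are used on purpose: encoding with the plain "UTF-16" charset in Java prepends a byte order mark, which would make the delimiter longer still.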

Re: UTF-16 support for TextInputFormat

2018-08-09 Thread Fabian Hueske
Hi David, Did you try to set the encoding on the TextInputFormat with TextInputFormat tif = ... tif.setCharsetName("UTF-16"); Best, Fabian 2018-08-08 17:45 GMT+02:00 David Dreyfus: > Hello - > > It does not appear that Flink supports a charset encoding of "UTF-16". In > particular, it