Thanks, Eddie.

Just to add to the discussion, I logged the following information:
Charset.defaultCharset(): US-ASCII
System.getProperty("file.encoding"): ANSI_X3.4-1968
OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream()); writer.getEncoding(): ASCII
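For anyone who wants to reproduce this on their own workers, here is a minimal sketch of how the three values above can be logged (the class name CharsetProbe is just an illustration, not something from our pipeline):

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class CharsetProbe {
    public static void main(String[] args) {
        // Default charset the JVM derived from the environment at startup
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        // The system property backing that default
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
        // Encoding an OutputStreamWriter uses when none is passed explicitly
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        System.out.println("writer.getEncoding(): " + writer.getEncoding());
    }
}
```

On a worker whose locale is ANSI_X3.4-1968 this prints US-ASCII / ANSI_X3.4-1968 / ASCII, matching the values above; on a typical dev machine it will print UTF-8 instead, which is exactly why the bug only shows up in Dataflow.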

In our case, a JSON library seems to be messing things up: even at first glance I found a String.getBytes() call in its internals with no way to specify the encoding, so it silently falls back to the platform default.

I really wonder if there is any way to change this default on the Dataflow workers.
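For reference, the standard JVM mechanisms for forcing the default look like this; whether the Dataflow worker harness lets you pass either of them through is exactly the open question (note also that calling System.setProperty("file.encoding", ...) at runtime does not help, because Charset.defaultCharset() is cached at JVM startup):

```shell
# Per-invocation JVM flag (launching a jar locally is just an example)
java -Dfile.encoding=UTF-8 -jar my-pipeline.jar

# Environment variable picked up by any JVM started in that environment
export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF-8"
```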

Cheers

On 04.11.2019 09:58, Eddy G wrote:
Adding to what Jeff pointed out previously: I'm dealing with the
same issue writing Parquet files with the ParquetIO module in
Dataflow, and the same thing happens even when I force UTF-8 on
all String objects. It may be related to behind-the-scenes
decoding/encoding inside that module, which causes those characters
to be wrongly encoded in the output. Worth checking in case you are
doing some Parquet processing, or using any other module downstream
with similar behavior.