The only combination that I can think of is to use this hack[1] combined
with a JvmInitializer[2].

1: https://stackoverflow.com/a/14987992/4368200
2:
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
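A rough sketch of what that combination could look like (assumptions: the
reflection trick is adapted from the Stack Overflow answer in [1]; in Beam
you would wrap it in a class implementing JvmInitializer[2], registered via
@AutoService, so it runs when each worker JVM starts; note that on JDK 9+
the reflective write into java.base may be blocked unless the worker JVM is
started with --add-opens java.base/java.nio.charset=ALL-UNNAMED):

```java
import java.lang.reflect.Field;
import java.nio.charset.Charset;

public class Main {

    // Forces the JVM-wide default charset to UTF-8 after startup.
    static void forceUtf8() {
        System.setProperty("file.encoding", "UTF-8");
        try {
            // The hack from [1]: clear Charset's cached default so the
            // next Charset.defaultCharset() call re-reads file.encoding.
            Field cached = Charset.class.getDeclaredField("defaultCharset");
            cached.setAccessible(true);
            cached.set(null, null);
            System.out.println("default charset now: " + Charset.defaultCharset());
        } catch (ReflectiveOperationException | RuntimeException e) {
            // On JDK 9+ this typically fails with InaccessibleObjectException
            // unless the --add-opens flag above is passed to the worker JVM.
            System.out.println("reflection hack blocked: "
                    + e.getClass().getSimpleName());
        }
    }

    // In the actual pipeline this would instead live in something like
    // (hypothetical class name, real Beam interface from [2]):
    //
    //   @AutoService(JvmInitializer.class)
    //   public class Utf8Initializer implements JvmInitializer {
    //       @Override public void onStartup() { forceUtf8(); }
    //   }

    public static void main(String[] args) {
        forceUtf8();
    }
}
```

Whether the reset actually propagates to libraries depends on when they
cache the default charset, so this is very much a best-effort workaround,
not a guarantee.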

On Mon, Nov 4, 2019 at 1:40 AM Leonardo Campos | GameDuell <
[email protected]> wrote:

> Thanks, Eddie.
>
> Just to add to the discussion, I logged the following information:
> Charset.defaultCharset(): US-ASCII
> System.getProperty("file.encoding"): ANSI_X3.4-1968
> OutputStreamWriter writer = new OutputStreamWriter(new
> ByteArrayOutputStream()); writer.getEncoding(): ASCII
>
> In our case, a JSON library seems to be messing things up: at first
> glance I already found in its internals a call to String.getBytes()
> without any possibility of specifying the encoding.
>
> I really wonder if there is any way to change this default in DataFlow.
>
> Cheers
>
> On 04.11.2019 09:58, Eddy G wrote:
> > Adding to what Jeff pointed out previously: I'm dealing with the same
> > issue writing Parquet files using the ParquetIO module in Dataflow,
> > and the same thing happens even when forcing UTF-8 on all String
> > objects. It may be related to behind-the-scenes decoding/encoding
> > within that module, which causes those chars to be wrongly encoded in
> > the output; just in case you are doing some Parquet processing, or
> > using any other module in the end that may behave similarly.
>
>
