The only combination that I can think of is to use this hack[1] combined with a JvmInitializer[2].
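Sketched out, the hack from [1] amounts to overriding `file.encoding` and then clearing the charset the JVM has already cached, so that `Charset.defaultCharset()` re-reads the property. In Beam you would call this from a `JvmInitializer.onStartup()` implementation registered via `@AutoService(JvmInitializer.class)`; the Beam wiring is omitted below so the sketch stays self-contained. Note this pokes at JDK internals: it needs `--add-opens java.base/java.nio.charset=ALL-UNNAMED` on newer JVMs, and on JDK 18+ it is moot since UTF-8 is the default there. The class and method names are illustrative, not from Beam.

```java
import java.lang.reflect.Field;
import java.nio.charset.Charset;

public class ForceUtf8 {
    // Hypothetical helper: call this from JvmInitializer.onStartup()
    // (org.apache.beam.sdk.harness.JvmInitializer, registered with
    // @AutoService(JvmInitializer.class)); the Beam wiring is not shown.
    public static void forceUtf8Default() {
        // Override the property that Charset consults when it resolves the default.
        System.setProperty("file.encoding", "UTF-8");
        try {
            // The hack from [1]: null out Charset's cached default so the
            // next defaultCharset() call re-reads file.encoding.
            Field cached = Charset.class.getDeclaredField("defaultCharset");
            cached.setAccessible(true);
            cached.set(null, null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // On JDK 9+ this may be blocked without
            // --add-opens java.base/java.nio.charset=ALL-UNNAMED.
            System.err.println("Could not reset default charset: " + e);
        }
    }

    public static void main(String[] args) {
        forceUtf8Default();
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());
    }
}
```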
1: https://stackoverflow.com/a/14987992/4368200
2: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java

On Mon, Nov 4, 2019 at 1:40 AM Leonardo Campos | GameDuell <[email protected]> wrote:

> Thanks, Eddie.
>
> Just to add to the discussion, I logged the following information:
> Charset.defaultCharset(): US-ASCII
> System.getProperty("file.encoding"): ANSI_X3.4-1968
> OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream()); writer.getEncoding(): ASCII
>
> In our case, a JSON library seems to be messing things up: just on
> first glance I already found in its internals a String.getBytes() call
> without the possibility to specify the encoding.
>
> I really wonder if there is any way to change this default in Dataflow.
>
> Cheers
>
> On 04.11.2019 09:58, Eddy G wrote:
> > Adding to what Jeff pointed out previously, I'm dealing with the
> > same issue writing Parquet files using the ParquetIO module in
> > Dataflow, and the same thing happens even when forcing UTF-8 on all
> > String objects. It may be related to behind-the-scenes
> > decoding/encoding within that module, which causes those chars to be
> > wrongly encoded in the output. Just in case: check whether you are
> > doing some Parquet processing or using any other module in the end
> > that may have similar behavior.
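The no-argument `String.getBytes()` call Leonardo found inside the JSON library is exactly where the platform default leaks in: without an explicit charset it encodes with `Charset.defaultCharset()`, so the same code produces different bytes on a worker whose default is US-ASCII. A minimal sketch of the difference (the class name is illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingPitfall {
    public static void main(String[] args) {
        String s = "caf\u00e9"; // "café"

        // No-argument getBytes() silently uses the platform default charset,
        // which the thread above reports as US-ASCII on Dataflow workers.
        byte[] platformBytes = s.getBytes(); // same as s.getBytes(Charset.defaultCharset())

        // Passing the charset explicitly makes the bytes independent of the
        // JVM default: '\u00e9' always encodes as 0xC3 0xA9 in UTF-8.
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);

        System.out.println("default: " + Arrays.toString(platformBytes));
        System.out.println("utf-8:   " + Arrays.toString(utf8Bytes));
    }
}
```

Under an ASCII default, the first call turns the accented character into a `?` (0x3F) replacement byte, which matches the corruption described in this thread.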
