Hey guys,

Adding this JVM flag to the drill-env.sh file made it work.

export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"
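
For anyone who finds this thread later, here is a minimal sketch of why the flag plausibly helps. It assumes (not confirmed against the Drill source) that the CSV writer encodes text with the JVM's default charset, and that the default on the affected machine could not represent "é". EncodingDemo and roundTrip are hypothetical names used only for illustration; the charset calls are standard JDK APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Drill.
final class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode it back.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot map, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Unicode escape keeps this source file pure ASCII.
        String city = "Montr\u00e9al";

        // The charset the JVM chose at startup (influenced by file.encoding and the locale).
        System.out.println("default charset: " + Charset.defaultCharset());

        // An ASCII round trip loses the accented character.
        System.out.println(roundTrip(city, StandardCharsets.US_ASCII).equals("Montr?al")); // true

        // A UTF-8 round trip preserves the text.
        System.out.println(roundTrip(city, StandardCharsets.UTF_8).equals(city)); // true
    }
}
```

Running this once with and once without the JAVA_TOOL_OPTIONS export shows whether the default charset reported on the first line actually changes on your machine.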

Thank you very much.


On Tue, Jul 17, 2018 at 1:49 AM, Kunal Khatua wrote:

> Hi Carlos
>
> It looks similar to an issue reported previously:
> https://lists.apache.org/thread.html/1f3d4c427690c06f1992bc5070f355689ccc5b1ed8cc3678ad8e9106@<user.drill.apache.org>
>
> Could you try setting the JVM's file encoding to UTF-8 and retry? If that
> does not work, please file a JIRA at https://issues.apache.org
>
> Thanks
> Kunal
> On 7/16/2018 1:25:45 PM, Carlos Derich wrote:
> It seems to be an issue only with CSV/TSV files.
>
> Tried writing the output as JSON and it handles the encoding properly.
>
> alter session set `store.format`='json';
> create table dfs.tmp.test3 as select `city` from dfs.parquets.`file`;
>
> Returns:
>
> {"city": "Montréal"}
>
>
> additional info:
>
> parquet-tools schema:
>
> message root {
> optional binary city (UTF8);
> }
>
>
> On Mon, Jul 16, 2018 at 2:49 PM, Carlos Derich
> wrote:
>
> > Hello guys, hope everyone is well.
> >
> > I am having an encoding issue when converting a table from parquet into
> > csv files. I wonder if someone could shed some light on it?
> >
> > One of my data sets has data in French with many accented characters,
> > and it is persisted in HDFS as parquet.
> >
> >
> > When I query the parquet table with *select `city` from
> > dfs.parquets.`file`*, it returns the data properly encoded:
> >
> >
> > *city*
> >
> > *Montréal*
> >
> >
> > Then I convert this table into a CSV file with the following query:
> >
> > *alter session set `store.format`='csv'*
> > *create table dfs.csvs.`converted` as select * from dfs.parquets.`file`*
> >
> >
> > Then when I run a select query on it, it returns data that is not
> > properly encoded:
> >
> > *select columns[0] from dfs.csvs.`converted`*
> >
> > Returns:
> >
> > *Montr?al*
> >
> >
> > My storage plugin is pretty standard:
> >
> > "csv" : {
> > "type" : "text",
> > "extensions" : [ "csv" ],
> > "delimiter" : ",",
> > "skipFirstLine": true
> > },
> >
> > Should I explicitly add a charset option somewhere? I couldn't find
> > anything helpful in the docs.
> >
> > I tried adding *export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> > -Dsaffron.default.charset=UTF-8"* to the drill-env.sh file, but no luck.
> >
> > Has anyone run into similar issues?
> >
> > Thank you!
> >
>
