Hey guys,

Adding this JVM flag to the drill-env.sh file made it work:
export JAVA_TOOL_OPTIONS="-Dfile.encoding=UTF8"

Thank you very much.

On Tue, Jul 17, 2018 at 1:49 AM, Kunal Khatua <[email protected]> wrote:

> Hi Carlos
>
> It looks similar to an issue reported previously:
> https://lists.apache.org/thread.html/1f3d4c427690c06f1992bc5070f355689ccc5b1ed8cc3678ad8e9106@<user.drill.apache.org>
>
> Could you try setting the JVM's file encoding to UTF-8 and retry? If it
> does not work, please file a JIRA in https://issues.apache.org
>
> Thanks
> Kunal
> On 7/16/2018 1:25:45 PM, Carlos Derich <[email protected]> wrote:
> It seems to be an issue only with CSV/TSV files.
>
> I tried writing the output as JSON and it handles the encoding properly.
>
> alter session set `store.format`='json'
> create table dfs.tmp.test3 as select `city` from dfs.parquets.`file`
>
> Returns:
>
> {"city": "Montréal"}
>
>
> Additional info:
>
> parquet-tools schema:
>
> message root {
>   optional binary city (UTF8);
> }
>
>
> On Mon, Jul 16, 2018 at 2:49 PM, Carlos Derich
> wrote:
>
> > Hello guys, hope everyone is well.
> >
> > I am having an encoding issue when converting a table from parquet into
> > csv files. I wonder if someone could shed some light on it?
> >
> > One of my data sets has data in French with lots of accented characters,
> > and it is persisted in HDFS as parquet.
> >
> >
> > When I query the parquet table with: *select `city` from
> > dfs.parquets.`file`*, it returns the data properly encoded.
> >
> >
> > *city*
> >
> > *Montréal*
> >
> >
> > Then I convert this table into a CSV file with the following query:
> >
> > *alter session set `store.format`='csv'*
> > *create table dfs.csvs.`converted` as select * from dfs.parquets.`file`*
> >
> >
> > Then when I run a select query on it, it returns data that is not
> > properly encoded:
> >
> > *select columns[0] from dfs.csvs.`converted`*
> >
> > Returns:
> >
> > *Montr?al*
> >
> >
> > My storage plugin is pretty standard:
> >
> > "csv" : {
> >   "type" : "text",
> >   "extensions" : [ "csv" ],
> >   "delimiter" : ",",
> >   "skipFirstLine": true
> > },
> >
> > Should I explicitly add a charset option somewhere? I couldn't find
> > anything helpful in the docs.
> >
> > I tried adding *export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS
> > -Dsaffron.default.charset=UTF-8"* to the drill-env.sh file, but no luck.
> >
> > Has anyone run into similar issues?
> >
> > Thank you!
> >
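
For anyone who finds this thread later, here is a minimal sketch of how the fix could be written in drill-env.sh. Two things in it are assumptions on my side and not part of the original fix: it appends to any JAVA_TOOL_OPTIONS that is already set instead of overwriting it, and it spells the charset as UTF-8, the canonical name (the JVM also accepts UTF8 as an alias, which is what was used above). The commented-out check at the end is optional and needs java on the PATH.

```shell
# drill-env.sh (sketch): make UTF-8 the JVM default charset for Drill.
# Assumption: keep whatever is already in JAVA_TOOL_OPTIONS and append to it.
# ${VAR:+...} expands to "<old value> " only when VAR is set and non-empty.
export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }-Dfile.encoding=UTF-8"

# Optional check: print the file.encoding value the JVM reports.
# java -XshowSettings:properties -version 2>&1 | grep 'file.encoding'
```

The Drillbits read drill-env.sh at startup, so they most likely need a restart before the CSV writer picks up the new setting.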
