Re: CT from parquet to CSV seems to not properly encode to UTF8

2018-07-16 Thread Kunal Khatua
Hi Carlos, this looks similar to an issue reported previously: https://lists.apache.org/thread.html/1f3d4c427690c06f1992bc5070f355689ccc5b1ed8cc3678ad8e9106@ Could you try setting the JVM's file encoding to UTF-8 and retry? If that does not work, please file a JIRA at https://issues.apache.org
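As a rough illustration of why the file encoding matters here (this is not Drill's code, only a sketch of the general mechanism), the same string produces different bytes under UTF-8 and Latin-1, and UTF-8 bytes read back as Latin-1 show up as garbled accents. On a JVM the default is controlled by the standard `file.encoding` system property, e.g. `-Dfile.encoding=UTF-8`. The helper name `is_valid_utf8` below is made up for this example.

```python
# Illustration only: one string, two charsets, different bytes.
text = "Montr\u00e9al"  # "Montréal"

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: 0xE9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Reading UTF-8 bytes with the wrong charset yields the familiar mojibake.
mojibake = utf8_bytes.decode("latin-1")

print(utf8_bytes, is_valid_utf8(utf8_bytes))
print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(mojibake)
```

Running a check like `is_valid_utf8` over the bytes of the generated CSV is one way to tell whether the writer or the viewer is at fault.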

Re: CT from parquet to CSV seems to not properly encode to UTF8

2018-07-16 Thread Carlos Derich
It seems to be an issue only with CSV/TSV files. I tried writing the output as JSON, and it handles the encoding properly:

alter session set `store.format`='json';
create table dfs.tmp.test3 as select `city` from dfs.parquets.`file`;

Returns: {"city": "Montréal"}

Additional info: parquet-tools

CT from parquet to CSV seems to not properly encode to UTF8

2018-07-16 Thread Carlos Derich
Hello guys, hope everyone is well. I am having an encoding issue when converting a table from Parquet into CSV files, and I wonder if someone could shed some light on it. One of my data sets has data in French with lots of accented characters, and it is persisted in HDFS as Parquet. When I query the

Re: Best Practice to check Drillbit status(Cluster mode)

2018-07-16 Thread Abhishek Girish
I think logs may be the only way to figure it out at present. You could keep a watch on your logs to be informed of such events. For notifications, I would say file an enhancement JIRA - if it gathers enough attention, perhaps someone would volunteer to work on it or comment. On Mon, Jul 16,
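A minimal sketch of the "watch on your logs" idea, assuming you already know a text pattern that marks the event of interest. The function name `matching_lines`, the pattern, and the sample lines are all hypothetical; real Drill log messages will look different and need to be checked against your own drillbit logs.

```python
import re
from typing import Iterable, Iterator


def matching_lines(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield log lines (without trailing newline) that match the given regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if rx.search(line):
            yield line.rstrip("\n")


# Hypothetical sample lines, NOT actual Drill log output.
sample = [
    "2018-07-16 10:00:01 INFO  heartbeat ok\n",
    "2018-07-16 10:00:05 WARN  lost connection to coordinator\n",
]

hits = list(matching_lines(sample, r"lost connection"))
for hit in hits:
    print("ALERT:", hit)
```

In practice the `lines` argument could be an open log file handle, and the `print` call could be swapped for whatever notification channel you use.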

Re: Best Practice to check Drillbit status(Cluster mode)

2018-07-16 Thread Divya Gehlot
Hi, thanks Abhishek! I would like to be notified about that orphan drillbit process when it gets disconnected from the other running drillbits for some reason, and definitely not because of an unclean shutdown, as those drillbits have been running for months. I know I can check the logs and kill that