Query fails on corrupted parquet column

2016-02-02 Thread François Méthot
Hi, Using drill-embedded 1.4, I encountered this error while doing a query on folders containing thousands of parquet files: Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5) Fragment 1:9 After re-running the same query with the log level set to DEBUG, I tracked the files that

Query Return Error because of a single file

2016-01-22 Thread François Méthot
Hi Drill Community, Using drill-embedded, I encountered this error while doing a query on folders containing thousands of parquet files: Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5) Fragment 1:9 After re-running the same query with the log level set to DEBUG, I tracked

Rolling Window

2016-03-01 Thread François Méthot
Hi, We need to manage a rolling window of parquet data within Drill. Our parquet files are partitioned by hour. Once HDFS reaches a certain usage threshold, we want to delete the oldest partition folder. A simple approach would be to run a cron job that checks the HDFS usage and deletes the
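
The selection step of the cron-job approach described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `partitions_to_delete`, the `(name, size_bytes)` input shape, and the `threshold` fraction are hypothetical and not from the thread, and partition folder names are assumed to sort chronologically. Reading the usage figures from HDFS and performing the actual deletion are left out.

```python
from typing import List, Tuple


def partitions_to_delete(
    partitions: List[Tuple[str, int]],
    used_bytes: int,
    capacity_bytes: int,
    threshold: float = 0.8,
) -> List[str]:
    """Return the oldest partition folders to delete, oldest first.

    partitions: (folder_name, size_bytes) pairs; folder names are assumed
    to sort chronologically (for example '2016-03-01-11' for an hourly folder).
    Folders are selected until projected usage is at or below the threshold.
    """
    target = capacity_bytes * threshold
    doomed = []
    # Sorting by name walks the partitions from oldest to newest.
    for name, size in sorted(partitions):
        if used_bytes <= target:
            break
        doomed.append(name)
        used_bytes -= size
    return doomed
```

A scheduler would call this with the current usage figures and then remove the returned folders, ideally at a time when no query is reading them.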

Re: Rolling Window

2016-03-08 Thread François Méthot
no queries are in "running" state or schedule a delete outside business hours. On Tue, Mar 1, 2016 at 11:25 AM, François Méthot <fmetho...@gmail.com> wrote: > Hi, > > We need to manage a rolling window of parquet data within drill. > > Our parquet files are partiti

Re: Filtering data files in directories

2016-05-10 Thread François Méthot
like Ted mentioned, here is an example: SELECT * FROM dfs.data.`/*/processing1/*-mx.csv` On Tue, May 10, 2016 at 5:28 PM, Ted Dunning wrote: > Can you just use wild cards? > > > > On Tue, May 10, 2016 at 1:43 PM, Ludovic Claude < > ludovic.claud...@gmail.com> > wrote:
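
To preview which files a wildcard path such as the one above would select, a rough per-segment matcher can help. This is a sketch, assuming each `*` matches within a single path segment and never across `/`; the helper name `matches_glob` is hypothetical, and Drill's own glob handling may differ in details.

```python
from fnmatch import fnmatchcase


def matches_glob(path: str, pattern: str) -> bool:
    """Match a path against a glob pattern one segment at a time.

    A '*' in the pattern only matches inside one path segment, so the
    path and the pattern must have the same number of segments.
    """
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts))
```

For example, `matches_glob("/2016/processing1/a-mx.csv", "/*/processing1/*-mx.csv")` is true, while a file under `processing2` or one nested a level deeper is rejected.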

Re: Drill Performance

2016-07-14 Thread François Méthot
We have observed that if the number of drillbits is lower than the number of nodes in our cluster, some minor fragments take longer to complete their part of the query (we hypothesize that it is because they can't take advantage of data locality, so a fragment has to reach out for data on a different node). One

Re: Drill Performance

2016-09-08 Thread François Méthot
ht distribution, you can restrict the subset of the cluster that >> has the data you need to avoid locality variation when Drill only runs on >> a >> subset of nodes. >> >> >> >> On Thu, Jul 14, 2016 at 6:48 AM, François Méthot <fmetho...@gmail.com> >>

Re: Question regarding CTAS Command

2016-11-17 Thread François Méthot
Here is a workaround to insert large batches of data, where each batch insert is done in its own partition: create table hdfs.`/mytable/1` as (select) Later on, if you want to insert a new batch of data into the same table: create table hdfs.`/mytable/2` as (select) Then when you query select

Re: [Drill 1.9.0] : [CONNECTION ERROR] :- (user client) closed unexpectedly. Drillbit down?

2017-03-21 Thread François Méthot
Hi, We started having client-foreman connection and ZkConnection issues a few months ago. It went from annoying to a showstopper when we moved from a 12-node cluster to a 220-node cluster. Node specs - 8 cores total (2 x E5620) - 72 GB RAM total - Other applications share the same hardware.

Re: Reading Parquet files with array or list columns

2017-06-30 Thread François Méthot
Hi, Have you tried: select column['list'][0]['element'] from ... should return "My First Value". or try: select flatten(column['list'])['element'] from ... Hope it helps, in our data we have a column that looks like this: [{"NAME:":"Aname", "DATA":"thedata"},{"NAME:":"Aname2",

Re: change in data fields in json file

2018-01-17 Thread François Méthot
Hi, Try something like SELECT CASE WHEN field1_old IS NOT NULL THEN field1_old ELSE field1_renamed END AS field1, field2, field3 FROM yourjsonfiles; On 15 January 2018 at 04:09, Divya Gehlot wrote: > Hi, > I am processing json data files in Apache Drill. >

Re: show files problem

2019-07-16 Thread François Méthot
[1] which was > included into 1.15.0 release. > I suspect you are trying to upgrade to some 1.15.0-SNAPSHOT version rather > than on 1.15.0 final version. > > [1] https://issues.apache.org/jira/browse/DRILL-6753 > > Kind regards, > Arina > > > On Jul 15, 2019, at 9:1

show files problem

2019-07-15 Thread François Méthot
Hi, We are aiming to upgrade our Drill cluster to version 1.15. In our test environment, we found that a show files operation can take over an hour to list the content of a directory with only a handful of files. Based on jstack, it seems a drillbit is busy collecting permission information. Also