Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Glad to know that it worked! As you are using Drill on Windows, the line delimiter in text files can differ from that on Linux / Mac. We could see \r\n (carriage return + new line) as the lineDelimiter, and hence setting the same in the format plugin resolves the issue. This

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Filed DRILL-5684 to track the doc issue. On Mon, Jul 24, 2017 at 8:33 AM, Abhishek Girish wrote: > Glad to know that it worked! > > As you are using Drill on Windows, the new line delimiter in text files > can be different

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Abhishek Girish
Can you update your csv format plugin as shown below and retry your query?

    "csv": {
      "type": "text",
      "extensions": [ "csv" ],
      "lineDelimiter": "\r\n",
      "extractHeader": true,
      "delimiter": ","
    }

On Sun, Jul 23, 2017 at 10:37 PM, Divya Gehlot

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Divya Gehlot
Thank you so much, it worked! Can you please point me to the documentation where the settings for the different format types are described? I am also facing another issue with the date type: the data I receive in CSV format uses the format 15/1/2016, and when I try to cast or convert with to_date
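For reference, Drill's TO_DATE(expression, format) function accepts a Joda-Time pattern as its second argument, so a value like 15/1/2016 (day/month without zero-padding) can be parsed along these lines; the column and file path below are hypothetical placeholders, not taken from the thread:

```sql
-- Parse a d/M/yyyy string such as 15/1/2016 into a DATE.
-- `date_col` and the file path are illustrative placeholders.
SELECT TO_DATE(date_col, 'd/M/yyyy') AS parsed_date
FROM dfs.`/path/to/data.csv`;
```

The single-letter `d` and `M` tokens allow one- or two-digit day and month values, which a zero-padded pattern like `dd/MM/yyyy` would reject.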

Re: CTAS and save as parquet last column values are shown as null

2017-07-24 Thread Divya Gehlot
It would be great if anybody could share a best-practices guide. On 24 July 2017 at 14:52, Divya Gehlot wrote: > Thank you so much it worked > > Can you please provide me the pointer to the documentation where updation > for different format type are mentioned . > > As

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
Jinfeng, I'm wondering if there's a way to push schema info to Drill even if there is no result. KuduScanner always has a schema, and RecordReader always has a scanner, but I can't seem to find the disconnect. Any idea if this is possible, even as a Kudu-specific hack? -Cliff On Mon, Jul 24, 2017

Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
I spent some time over the weekend altering Drill's storage-kudu to use Kudu's predicate pushdown API. Everything worked great as long as I performed flat filtered selects (e.g. SELECT .. FROM .. WHERE ..), but whenever I tested aggregate queries, they would succeed sometimes, then fail other times

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Gautam Parai
Hi Cliff, Thanks so much for trying it out. The error means that this particular operator assumed the datatype was `nullable int`, since the first batch it saw did not have data for some column (`a` in your case?). However, in one of the subsequent batches it sees the datatype is a `big int`.

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Jinfeng Ni
If you see such errors only when you enable predicate pushdown, it might be related to a known issue: schema change failure caused by an empty batch [1]. This happens when the predicate prunes everything and the Kudu reader does not return a RowResult with a schema. In such a case, Drill would interpret the

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
Jinfeng, Thanks, that confirms my thoughts as well. If I query using full range bounds and all hash keys, then Kudu prunes to the exact tablets and there is no error. I'll watch that JIRA expectantly, because Kudu + Drill would be an awesome combo. But without the pruning it's useless to us.

[HANGOUT] Topics for 7/25/17

2017-07-24 Thread Arina Yelchiyeva
Hi all, We'll have the hangout tomorrow at the usual time [1]. Any topics to be discussed? [1] https://drill.apache.org/community-resources/ Kind regards, Arina

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Jinfeng Ni
Based on my limited understanding of Drill's KuduRecordReader, the problem seems to be in the next() method [1]. When the RowResult iterator's hasNext() returns false, in the case where the filter prunes everything, the code skips the call to addRowResult(). That means no columns/data will be added to
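A minimal sketch of the control flow being described: when the scanner's iterator is empty because the filter pruned everything, the row-adding call never runs, so the outgoing batch carries no column schema and downstream operators fall back to guessing `nullable int`. The class and method names below are simplified stand-ins for illustration, not Drill's or Kudu's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class EmptyBatchSketch {
    // Stand-in for building the value vectors of one outgoing batch.
    // `rows` plays the role of Kudu's RowResultIterator.
    static List<String> buildBatch(Iterator<String> rows) {
        List<String> columns = new ArrayList<>();
        while (rows.hasNext()) {      // loop body is skipped entirely when empty
            String row = rows.next();
            columns.add(row);         // analogue of addRowResult()
        }
        // With zero rows, `columns` stays empty: no schema reaches the
        // downstream operator, which then defaults unknown columns to
        // nullable INT and later hits a schema-change error.
        return columns;
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();
        System.out.println(buildBatch(empty.iterator()).isEmpty()); // prints "true"
    }
}
```

The fix directions discussed in the thread amount to making sure a schema is attached to the batch even when the row count is zero.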

Re: Question about Drill aggregate queries and schema change

2017-07-24 Thread Cliff Resnick
That makes sense, so I guess the solution is to return a null row instead? If so, is there a way to flag it to be ignored downstream (to avoid any unintended consequences)? Thanks for the help! On Mon, Jul 24, 2017 at 7:06 PM, Jinfeng Ni wrote: > Based on my limited

Re: [HANGOUT] Topics for 7/25/17

2017-07-24 Thread Padma Penumarthy
I have a topic to discuss. A lot of folks on the user mailing list have raised the issue of not being able to access all S3 regions using Drill. We need Hadoop version 2.8 or higher to be able to connect to regions which support only the Version 4 signature. I tried with 2.8.1, which just got released, and it