Re: Error while applying interval on a postgresql query

2017-06-09 Thread Rahul Raj
It works fine on CSV/Parquet files. The issue happens on RDBMS tables.

On Jun 10, 2017 04:48, "Boaz Ben-Zvi" wrote:
> This works fine on a json table:
>
> 0: jdbc:drill:zk=local> select * from dfs.`/data/test2.json` where
> DATE_ADD(CAST(START_DATE as DATE), interval '1' second) < CAST(CURRENT_DATE as DATE); ...
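
For reference, a minimal sketch of the failing pattern against a JDBC source (the plugin name `pg` and the table `public.events` are illustrative assumptions, not from the thread):

    -- hypothetical repro: the same predicate against a PostgreSQL table
    -- reached through Drill's JDBC storage plugin
    SELECT *
    FROM pg.public.events
    WHERE DATE_ADD(CAST(start_date AS DATE), interval '1' second) < CAST(CURRENT_DATE AS DATE);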

Re: Error while applying interval on a postgresql query

2017-06-09 Thread Boaz Ben-Zvi
This works fine on a json table:

0: jdbc:drill:zk=local> select * from dfs.`/data/test2.json` where
DATE_ADD(CAST(START_DATE as DATE), interval '1' second) < CAST(CURRENT_DATE as DATE);
+-----+-------------+
| id  | start_date  |
+-----+-------------+
| 1   | 1997-10-27  |
| 2   | 1997-10-27  |
| ...

Re: Column alias are ignored when Storage Plugin is enabled

2017-06-09 Thread Jinfeng Ni
I feel DRILL-5577 is more likely to be related to DRILL-5538. The cause is an optimizer rule, ProjectRemoveRule, which is added to the query planner by the JDBC storage plugin when it is enabled. I believe Arina is looking into a fix.

On Fri, Jun 9, 2017 at 2:22 AM, Rahul Raj wrote:
> Created DRILL-5577 ...
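
As a point of reference, a minimal hypothetical repro of the alias issue (the file path and column names are illustrative assumptions; the exact trigger is as described in DRILL-5577):

    -- with the JDBC storage plugin enabled, the planner's ProjectRemoveRule
    -- can drop the rename, so the result column keeps its original name
    SELECT employee_id AS emp_id FROM dfs.`/data/employees.parquet`;
    -- expected column header: emp_id; reported behavior: employee_id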

Re: Increasing store.parquet.block-size

2017-06-09 Thread Kunal Khatua
The ideal size depends on what engine is consuming the parquet files (Drill, I'm guessing) and the storage layer. For HDFS, where the block size is usually 128-256 MB, we recommend bumping it to about 512 MB (with the underlying HDFS block size set to match). You'll probably need to experiment a little with ...
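
For concreteness, a sketch of how that option is set from a Drill session (536870912 bytes = 512 MB, i.e. the suggestion above expressed in bytes):

    -- session scope; use ALTER SYSTEM to apply it cluster-wide
    ALTER SESSION SET `store.parquet.block-size` = 536870912;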

Re: Increasing store.parquet.block-size

2017-06-09 Thread Shuporno Choudhury
Thanks for the information, Kunal. After the conversion, the file size scales down to half when I use gzip compression: a 10 GB gzipped CSV source file becomes 5 GB of parquet output (2 GB + 2 GB + 1 GB files, using gzip compression). So, if I have to produce multiple parquet files, what block size would be optimal, if ...
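
For reference, the compression described here is controlled by a Drill session option; a minimal sketch (the option name is Drill's, the surrounding workflow is assumed):

    -- make CTAS write gzip-compressed parquet output
    ALTER SESSION SET `store.parquet.compression` = 'gzip';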

Re: Increasing store.parquet.block-size

2017-06-09 Thread Kunal Khatua
If you're storing this in S3... you might want to read the files selectively as well. I'm only speculating, but if you want to download the data, downloading a queue of files might be more reliable than downloading one massive file. Similarly, within AWS, it *might* be faster to have an EC2 instance a...
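
To make the selective-read idea concrete, a hypothetical sketch (the `s3` plugin name, `root` workspace, and file name are assumptions):

    -- read one output file out of the set instead of the whole directory
    SELECT * FROM s3.root.`converted/1_0_0.parquet` LIMIT 10;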

Re: Increasing store.parquet.block-size

2017-06-09 Thread Shuporno Choudhury
Thanks, Kunal, for your insight. I am actually converting some .csv files and storing them in parquet format in S3, not in HDFS. The individual .csv source files can be quite large (around 10 GB). So, is there a way to overcome this and create one parquet file, or do I have to go ahead with ...
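
For reference, a minimal sketch of such a CSV-to-parquet conversion via CTAS (the `s3` plugin, `root` workspace, paths, and column positions are illustrative assumptions):

    -- Drill exposes plain CSV fields as the columns[] array unless headers are extracted
    CREATE TABLE s3.root.`converted` AS
    SELECT columns[0] AS id, columns[1] AS start_date
    FROM dfs.`/data/source.csv`;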

Re: Increasing store.parquet.block-size

2017-06-09 Thread Kunal Khatua
Shuporno, there are some interesting problems when using Parquet files > 2 GB on HDFS. If I'm not mistaken, the HDFS APIs that let you read at offsets (oddly enough) return an int value. A large Parquet block size also means the file will end up spanning multiple HDFS blocks, and th...

Re: Increasing store.parquet.block-size

2017-06-09 Thread Vitalii Diravka
Khurram, DRILL-2478 is a good placeholder for the LongValidator issue; it really does behave incorrectly. But the other issue is connected to the impossibility of using long values for the parquet block-size. That issue can be an independent task or a sub-task of updating the Drill project to the latest parquet library. Kind regards ...

Re: Increasing store.parquet.block-size

2017-06-09 Thread Khurram Faraaz
1. DRILL-2478 is open for this issue.
2. I have added more details in the comments.

Thanks,
Khurram

From: Shuporno Choudhury
Sent: Friday, June 9, 2017 12:48:41 PM
To: user@drill.apache.org
Subject: Increasing store.parquet.block-size

Re: Column alias are ignored when Storage Plugin is enabled

2017-06-09 Thread Rahul Raj
Created DRILL-5577.

On Thu, Jun 8, 2017 at 7:34 PM, Kunal Khatua wrote:
> It could be related to these as well:
>
> https://issues.apache.org/jira/browse/DRILL-5537
>
> https://issues.apache.org/jira/browse/DRILL-5538
>
> Please go ahead and file ...

Increasing store.parquet.block-size

2017-06-09 Thread Shuporno Choudhury
The max value that can be assigned to *store.parquet.block-size* is *2147483647*, even though the value kind of this configuration parameter is LONG. This basically translates to a 2 GB block size. How do I increase it to 3/4/5 GB? Trying to set this parameter to a higher value using the following command ...
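
For illustration, a hedged sketch of such an attempt (the exact value below is an assumption, not the truncated original command):

    ALTER SYSTEM SET `store.parquet.block-size` = 3221225472;
    -- ~3 GB; expected to fail, since the validator caps the value at 2147483647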