How to verify predicate pushdown

2017-10-26 Thread PROJJWAL SAHA
Hello, one question: how do I verify whether predicate pushdown is happening? I have one parquet file generated using a CTAS command. I have executed REFRESH METADATA. I am running a simple query with a WHERE clause. In the physical plan for the scan operation, I see rowcount as total number of
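
A minimal sketch of one way to inspect the plan for such a query, assuming a hypothetical table path `dfs.tmp.my_ctas_table` and a hypothetical column `some_column` (neither comes from the original message). `EXPLAIN PLAN FOR` prints the physical plan without running the query, so the Scan operator's rowcount can be compared with and without the WHERE clause:

```sql
-- Hypothetical table and column names; substitute the CTAS output location
-- and the column actually used in the filter.

-- Plan with the filter:
EXPLAIN PLAN FOR
SELECT *
FROM dfs.tmp.`my_ctas_table`
WHERE some_column = 42;

-- Plan without the filter, for comparison of the Scan operator's rowcount:
EXPLAIN PLAN FOR
SELECT *
FROM dfs.tmp.`my_ctas_table`;
```

If the Scan rowcount is the same in both plans, that suggests no data was pruned at scan time. With a single small file this may simply mean there was nothing smaller to prune, so the comparison is a hint rather than proof.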

Re: Benchmark numbers using Drill

2017-10-20 Thread PROJJWAL SAHA
tuning. Which also means that you > have to revisit the kinds of analytics you would like your end users to > have. Which again raises the question-what kinds of analytics truly > generate value for the BI user? > > Best, > Saurabh > > On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL

Benchmark numbers using Drill

2017-10-18 Thread PROJJWAL SAHA
Hi, Is there any public performance benchmark that users have achieved using Drill in production scenarios ? It would be useful if someone can pass me any links for customer user stories. Regards

Re: Exception while reading parquet data

2017-10-16 Thread PROJJWAL SAHA
are the link . > > Did Parth's suggestion of > store.parquet.reader.pagereader.bufferedread=false > resolve the issue? > > Also share the details of the hardware setup... #nodes, Hadoop version, > etc. > > > -----Original Message- > From: PROJJWAL SAHA [mailto:pr

Re: Exception while reading parquet data

2017-10-15 Thread PROJJWAL SAHA
minimal data file that triggers this? > > You can also try turning off the buffering reader. >store.parquet.reader.pagereader.bufferedread=false > > With async reader on and buffering off, you might not see any degradation > in performance in most cases. > > > > On T
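
A sketch of how the suggestion above might be applied, assuming the option is set at session level with the same `alter session set` form used elsewhere in this thread for the async reader option. The query against `sys.options` is only an assumed way of checking the current values:

```sql
-- Turn off the buffering page reader for the current session.
ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;

-- Check the current values of the page reader options.
SELECT *
FROM sys.options
WHERE name LIKE 'store.parquet.reader.pagereader%';
```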

Re: Exception while reading parquet data

2017-10-12 Thread PROJJWAL SAHA
ava.nio.Buffer.checkBounds(Buffer.java:567) ~[na:1.8.0_121] at java.nio.ByteBuffer.put(ByteBuffer.java:827) ~[na:1.8.0_121] at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:379) ~[na:1.8.0_121] at org.apache.parquet.hadoop.util.CompatibilityUtil.getBuf(CompatibilityUtil.

Re: Exception while reading parquet data

2017-10-12 Thread PROJJWAL SAHA
> > Can you try disabling async parquet reader to see if problem gets resolved. > > > alter session set `store.parquet.reader.pagereader.async`=false; > > Thanks, > > Arjun > > > > From: PROJJWAL SAHA <proj.s...@gmail.com> >

Exception while reading parquet data

2017-10-11 Thread PROJJWAL SAHA
I get the below exception when querying parquet data on Oracle Storage Cloud service. Any pointers on what this points to? Regards, Projjwal ERROR o.a.d.e.u.f.BufferedDirectBufInputStream - Error reading from stream part-6-25a9ae4b-fd9e-4770-b17e-9a29b270a4c2.parquet. Error was : null

Re: Exception when querying parquet data

2017-10-11 Thread PROJJWAL SAHA
y you used in the query i.e. `data25Goct6/websales` ? > > Thanks > Padma > > > On Oct 9, 2017, at 5:50 AM, PROJJWAL SAHA <proj.s...@gmail.com> wrote: > > Hello all, > > I am getting the below exception when querying parqu

Exception when querying parquet data

2017-10-09 Thread PROJJWAL SAHA
Hello all, I am getting the below exception when querying parquet data stored in storage cloud service. What does this exception point to? The query on the same parquet files works when they are stored in Alluxio, which means the data is fine. I am using drill 11.1. Any help is appreciated!

Re: Enable debugging for 3rd party storage plugin with eclipse

2017-03-23 Thread PROJJWAL SAHA
and put it in the classpath of drill to enable me to debug the code at runtime. Please help me here. Regards, Projjwal On Sun, Mar 19, 2017 at 10:43 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote: > Hi all, > > I am trying to debug a 3rd party storage plugin and I need to enable

Enable debugging for 3rd party storage plugin with eclipse

2017-03-19 Thread PROJJWAL SAHA
Hi all, I am trying to debug a 3rd party storage plugin and I need to enable debug with my eclipse IDE. Can someone pls guide me on the steps to enable debugging for eclipse - any documentation / link would also help. Also are the steps same if I would want to debug drill codebase ? Regards,

Re: Display of query result using command line

2017-03-15 Thread PROJJWAL SAHA
far...@mapr.com> wrote: > Three million rows is too many rows, for sqlline to print. > > Try doing a COUNT(*) and see if that query returns the correct count on > that table. > > > Thanks, > > Khurram > > ____ > From: PROJJWAL SAHA <pro

Display of query result using command line

2017-03-15 Thread PROJJWAL SAHA
All, I am using drillconf from command line to display a query result like select * from xxx having 3 million rows. The screen display scrolls fast to display the result, however, it stops after some time with this exception - java.lang.NegativeArraySizeException at

Query on .gz.parquet files

2017-03-09 Thread PROJJWAL SAHA
All, one question i am querying on .gz.parquet files. select * from xxx returns data like +-+ | current | +-+ |

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-07 Thread PROJJWAL SAHA
t thing I would try is to make your cluster a single node > cluster first and then run the same explain plan query separately on each > individual file. > > > > On Mar 7, 2017 5:09 AM, "PROJJWAL SAHA" <proj.s...@gmail.com> wrote: > > > Hi Rahul, > >

Fwd: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-06 Thread PROJJWAL SAHA
dfs storage plugin. Query planning time is approx 30 secs. Query execution time is approx 1.5 secs. Regards, Projjwal -- Forwarded message -- From: PROJJWAL SAHA <proj.s...@gmail.com> Date: Fri, Mar 3, 2017 at 5:06 PM Subject: Minimise query plan time for dfs plugin for loca

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-05 Thread PROJJWAL SAHA
tributed cluster is having some effect on the planning... > > On Fri, Mar 3, 2017 at 6:08 AM, PROJJWAL SAHA <proj.s...@gmail.com> wrote: > > > I did not change the default values used by drill. > > Are you talking of changing planner.memory_limit > > and planner.memory.m

Re: Minimise query plan time for dfs plugin for local file system on tsv file

2017-03-03 Thread PROJJWAL SAHA
wrote: > how much memory have you set for planner ? > > On Fri, Mar 3, 2017 at 5:06 PM, PROJJWAL SAHA <proj.s...@gmail.com> wrote: > > > Hello all, > > > > I am querying select * from dfs.xxx where yyy (filter condition) > > > > I am using dfs storage pl

Distribution of workload across nodes in a cluster

2017-02-22 Thread PROJJWAL SAHA
Hello, I am doing select * query on a csv file of 1 GB with a 5 node drill cluster. The csv file is stored in another storage cluster within the enterprise. In the query profile, I see one major fragment and within the major fragment, I see only 1 minor fragment. The hostname for the minor

Re: Query on performance using Drill and Amazon s3.

2017-02-21 Thread PROJJWAL SAHA
> to move them into a single region. > > In either case, from AWS console you can figure out how much network > throughput you are getting if that is the bottleneck > Also drill machines would need CPU so along with 32GB memory if you have 8 > cores that would be desirable > >

Re: Query on performance using Drill and Amazon s3.

2017-02-21 Thread PROJJWAL SAHA
l server is > > On Mon, Feb 20, 2017 at 5:37 PM, PROJJWAL SAHA <proj.s...@gmail.com> > wrote: > > > Hello all, > > > > I am using 1GB data in the form of .tsv file, stored in Amazon S3 using > > Drill 1.8. I am using default configurations of Drill using S3

Query on performance using Drill and Amazon s3.

2017-02-20 Thread PROJJWAL SAHA
Hello all, I am using 1 GB of data in the form of a .tsv file, stored in Amazon S3, with Drill 1.8. I am using the default Drill configuration with the out-of-the-box S3 storage plugin. The drillbits are configured on a 5-node cluster with 32 GB RAM and 4 vCPUs. I see that select * from xxx; query