Re: Drill performance question

2017-10-31 Thread Charles Givre
> >> Last but not least, if you are doing a query of the form >> select X,Y,Z where time between and >> you will benefit immensely from the data being sorted on that time field. >> >> Hope that helps. >> >> ~ Kunal >> >> -Origi
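The advice about sorting on the time field can be illustrated with a toy model. This is an assumption-laden sketch, not Drill's implementation. It assumes a file is split into fixed-size chunks that each record the min/max of the time column (similar in spirit to Parquet row-group statistics), so a `between` filter can skip any chunk whose range cannot overlap. The helper names `chunk_stats` and `chunks_to_scan` are hypothetical.

```python
from typing import List, Tuple

# Toy model (an assumption, not Drill's actual code): a file is split into
# fixed-size chunks, and each chunk records the min/max of its time column.


def chunk_stats(times: List[int], chunk_size: int) -> List[Tuple[int, int]]:
    """Return (min, max) of the time column for each consecutive chunk."""
    return [
        (min(times[i:i + chunk_size]), max(times[i:i + chunk_size]))
        for i in range(0, len(times), chunk_size)
    ]


def chunks_to_scan(stats: List[Tuple[int, int]], start: int, end: int) -> List[int]:
    """Indices of chunks whose [min, max] overlaps the filter range [start, end].

    Every other chunk can be skipped without reading its rows.
    """
    return [i for i, (lo, hi) in enumerate(stats) if hi >= start and lo <= end]


if __name__ == "__main__":
    sorted_times = list(range(1000))
    # Same values in a round-robin order: 0, 100, 200, ..., 1, 101, ...
    unsorted_times = [t for k in range(100) for t in range(k, 1000, 100)]

    for label, times in (("sorted", sorted_times), ("unsorted", unsorted_times)):
        stats = chunk_stats(times, chunk_size=100)
        hits = chunks_to_scan(stats, start=250, end=349)
        print(f"{label}: scan {len(hits)} of {len(stats)} chunks")
```

With this data, the sorted layout scans 2 of 10 chunks for the range 250..349. The round-robin layout holds exactly the same values but has to scan all 10, because every chunk's min/max spans almost the whole range.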

Re: Drill performance question

2017-10-30 Thread Andries Engelbrecht
…time between and > you will benefit immensely from the data being sorted on that time field. > > Hope that helps. > > ~ Kunal > > -Original Message- > From: Ted Dunning [mailto:ted.dunn...@gmail.com] > Sent: Monday, October

Re: Drill performance question

2017-10-30 Thread Saurabh Mahapatra
[mailto:ted.dunn...@gmail.com] > Sent: Monday, October 30, 2017 9:34 AM > To: user <user@drill.apache.org> > Subject: Re: Drill performance question > > Also, on a practical note, Parquet will likely crush CSV on performance. > Columnar. Compressed. Binary. All that. > >

RE: Drill performance question

2017-10-30 Thread Kunal Khatua
Subject: Re: Drill performance question Also, on a practical note, Parquet will likely crush CSV on performance. Columnar. Compressed. Binary. All that. On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra < saurabhmahapatr...@gmail.com> wrote: > Hi Charles, > > Can you share some query

Re: Drill performance question

2017-10-30 Thread Charles Givre
The data itself contains 6 or so columns: date, user_id, city, state, lat, long. I’m looking to aggregate by week, by day of week etc. So the general pattern would look something like: SELECT EXTRACT( day FROM `date` ) AS `day`, COUNT( DISTINCT id ) as distinct_id, COUNT( id ) as
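The query pattern above (bucket by a date part, then `COUNT(DISTINCT id)` and `COUNT(id)` per bucket) can be sketched in plain Python to make the semantics concrete. This is a minimal illustration, not Drill code. It assumes rows carry only the two columns the query touches, `(date, user id)`, and the function names `aggregate`, `day_of_month`, `day_of_week`, and `iso_week` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical row shape: only the two columns the query touches,
# (`date`, user id), out of the ~6 columns described above.
Row = Tuple[date, str]


def aggregate(rows: Iterable[Row], bucket: Callable[[date], int]) -> Dict[int, Tuple[int, int]]:
    """Group rows by bucket(date); return {bucket: (distinct ids, total rows)}.

    Roughly COUNT(DISTINCT id), COUNT(id) ... GROUP BY <bucket>.
    """
    ids: Dict[int, Set[str]] = defaultdict(set)
    totals: Dict[int, int] = defaultdict(int)
    for d, user_id in rows:
        key = bucket(d)
        ids[key].add(user_id)
        totals[key] += 1
    return {key: (len(ids[key]), totals[key]) for key in sorted(totals)}


def day_of_month(d: date) -> int:
    return d.day  # like EXTRACT(day FROM `date`)


def day_of_week(d: date) -> int:
    return d.isoweekday()  # 1 = Monday ... 7 = Sunday


def iso_week(d: date) -> int:
    return d.isocalendar()[1]  # ISO-8601 week number


if __name__ == "__main__":
    rows = [
        (date(2017, 10, 30), "a"),
        (date(2017, 10, 30), "a"),
        (date(2017, 10, 31), "b"),
        (date(2017, 11, 6), "a"),
    ]
    print(aggregate(rows, day_of_week))  # {1: (1, 3), 2: (1, 1)}
    print(aggregate(rows, iso_week))     # {44: (2, 3), 45: (1, 1)}
```

Note that `EXTRACT(day ...)` returns the day of the month, not the day of the week. Aggregating "by day of week" needs a different date function, which is why the sketch keeps the bucketing function pluggable.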

Re: Drill performance question

2017-10-30 Thread Ted Dunning
Also, on a practical note, Parquet will likely crush CSV on performance. Columnar. Compressed. Binary. All that. On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra < saurabhmahapatr...@gmail.com> wrote: > Hi Charles, > > Can you share some query patterns on this data? More specifically, the >
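The "Columnar" part of that point can be shown with a toy count of how many characters must be read to answer a query that needs a single column. This is a rough sketch under stated assumptions, not a measurement of Drill or Parquet. It ignores compression and binary encoding entirely, treats a plain CSV file as unreadable except line by line, and uses a made-up sample row. The helper names are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Toy comparison (an assumption, not how Drill or Parquet measure I/O):
# characters that must be read to answer a query needing one column.


def to_csv(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Row-oriented layout: each line holds every field of one record."""
    buf = io.StringIO()
    csv.DictWriter(buf, fieldnames=columns).writerows(rows)
    return buf.getvalue()


def to_columns(rows: List[Dict[str, str]], columns: List[str]) -> Dict[str, str]:
    """Column-oriented layout: each column's values are stored contiguously."""
    return {c: "\n".join(r[c] for r in rows) for c in columns}


def chars_read_csv(csv_text: str) -> int:
    # Plain CSV has no per-column offsets, so every line is read and split
    # even when the query uses a single field.
    return len(csv_text)


def chars_read_columnar(store: Dict[str, str], needed: List[str]) -> int:
    # Only the requested columns' data is touched.
    return sum(len(store[c]) for c in needed)


if __name__ == "__main__":
    columns = ["date", "user_id", "city", "state", "lat", "long"]
    # Made-up sample row, for illustration only.
    row = {"date": "2017-10-30", "user_id": "u1", "city": "Baltimore",
           "state": "MD", "lat": "39.29", "long": "-76.61"}
    rows = [dict(row) for _ in range(100)]
    print("csv:", chars_read_csv(to_csv(rows, columns)))
    print("columnar:", chars_read_columnar(to_columns(rows, columns), ["date"]))
```

The gap widens as the number of unused columns grows. Real columnar formats add per-column compression and binary encodings on top, which this sketch does not model.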

Re: Drill performance question

2017-10-30 Thread Saurabh Mahapatra
Hi Charles, Can you share some query patterns on this data? More specifically, the number of columns you are retrieving out of the total, the filter on the time dimension itself (ranges and granularities), and how much is ad hoc and how much is not. Best, Saurabh On Mon, Oct 30, 2017 at 9:27 AM,