The data itself contains 6 or so columns: date, user_id, city, state, lat, long. I'm looking to aggregate by week, by day of week, etc. So the general pattern would look something like:
SELECT EXTRACT(day FROM `date`) AS `day`,
       COUNT(DISTINCT id) AS distinct_id,
       COUNT(id) AS total_id
FROM <data>
GROUP BY EXTRACT(day FROM `date`)
ORDER BY `day` ASC

The view I created casts all the fields to the appropriate data types.

-- C

> On Oct 30, 2017, at 12:30, Saurabh Mahapatra <saurabhmahapatr...@gmail.com> wrote:
>
> Hi Charles,
>
> Can you share some query patterns on this data? More specifically, the
> number of columns you are retrieving out of the total, and the filter on
> the time dimension itself (ranges and granularities).
>
> How much is ad hoc and how much is not?
>
> Best,
> Saurabh
>
> On Mon, Oct 30, 2017 at 9:27 AM, Charles Givre <cgi...@gmail.com> wrote:
>
>> Hello all,
>> I have a dataset consisting of about 16 GB of CSV files. I am looking to
>> do some time series analysis of this data, and created a view, but when I
>> started doing aggregate queries using components of the date, the
>> performance was disappointing. Would it be better to do a CTAS and
>> partition by components of the date? If so, would Parquet be the best
>> format?
>> Would anyone have other suggestions of things I could do to improve
>> performance?
>> Thanks,
>> -- C
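For reference, the CTAS-plus-partitioning approach asked about above might look roughly like the sketch below. This assumes Drill 1.1 or later, where CTAS supports PARTITION BY and writes Parquet; the dfs.tmp workspace, the events_by_month table name, the source path, and ISO-formatted date strings in the CSVs are illustrative assumptions, not details from this thread.

-- Parquet is Drill's default CTAS output format; set it explicitly for clarity.
ALTER SESSION SET `store.format` = 'parquet';

-- Materialize the CSV data as Parquet, partitioned by month, so that
-- month-filtered aggregates can prune files instead of scanning all 16 GB.
-- Note that the partition column must appear in the SELECT list.
CREATE TABLE dfs.tmp.`events_by_month`
PARTITION BY (event_month) AS
SELECT CAST(`date` AS DATE)                     AS event_date,
       EXTRACT(month FROM CAST(`date` AS DATE)) AS event_month,
       user_id, city, state, lat, `long`
FROM dfs.`/path/to/csvs`;

Queries that then filter or group on event_month can skip whole Parquet files at planning time, which is typically where the win over a view on raw CSV comes from.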