The data itself contains 6 or so columns: date, user_id, city, state, lat, long. I'm looking to aggregate by week, by day of week, etc. So the general pattern would look something like:
SELECT EXTRACT(day FROM `date`) AS `day`,
       COUNT(DISTINCT id) AS distinct_id,
       COUNT(id) AS total_id
FROM <data>
GROUP BY EXTRACT(day FROM `date`)
ORDER BY `day` ASC

The view I created casts all the fields to the appropriate data types.

-- C

> On Oct 30, 2017, at 12:30, Saurabh Mahapatra <saurabhmahapatr...@gmail.com> wrote:
>
> Hi Charles,
>
> Can you share some query patterns on this data? More specifically, the
> number of columns you are retrieving out of the total, and the filter on
> the time dimension itself (ranges and granularities).
>
> How much is ad hoc and how much is not?
>
> Best,
> Saurabh
>
> On Mon, Oct 30, 2017 at 9:27 AM, Charles Givre <cgi...@gmail.com> wrote:
>
>> Hello all,
>> I have a dataset consisting of about 16 GB of CSV files. I am looking to
>> do some time series analysis of this data, and created a view, but when I
>> started doing aggregate queries using components of the date, the
>> performance was disappointing. Would it be better to do a CTAS and
>> partition by components of the date? If so, would Parquet be the best
>> format?
>> Would anyone have other suggestions of things I could do to improve
>> performance?
>> Thanks,
>> -- C
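For reference, the CTAS-plus-partitioning approach asked about above might look roughly like the sketch below. This assumes Drill 1.1 or later, where CTAS supports PARTITION BY and writes Parquet; the dfs.tmp workspace, the events_by_month table name, the source path, and ISO-formatted date strings in the CSVs are illustrative assumptions, not details from this thread.

-- Parquet is Drill's default CTAS output format; set it explicitly for clarity.
ALTER SESSION SET `store.format` = 'parquet';

-- Materialize the CSV data as Parquet, partitioned by month, so that
-- month-filtered aggregates can prune files instead of scanning all 16 GB.
-- Note that the partition column must appear in the SELECT list.
CREATE TABLE dfs.tmp.`events_by_month`
PARTITION BY (event_month) AS
SELECT CAST(`date` AS DATE)                     AS event_date,
       EXTRACT(month FROM CAST(`date` AS DATE)) AS event_month,
       user_id, city, state, lat, `long`
FROM dfs.`/path/to/csvs`;

Queries that then filter or group on event_month can skip whole Parquet files at planning time, which is typically where the win over a view on raw CSV comes from.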