Also, on a practical note, Parquet will likely crush CSV on performance. Columnar. Compressed. Binary. All that.
On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra <[email protected]> wrote:

> Hi Charles,
>
> Can you share some query patterns on this data? More specifically: the
> number of columns you are retrieving out of the total, and the filter on
> the time dimension itself (ranges and granularities).
>
> How much is ad hoc and how much is not?
>
> Best,
> Saurabh
>
> On Mon, Oct 30, 2017 at 9:27 AM, Charles Givre <[email protected]> wrote:
>
> > Hello all,
> > I have a dataset consisting of about 16 GB of CSV files. I am looking to
> > do some time series analysis of this data, and created a view, but when I
> > started doing aggregate queries using components of the date, the
> > performance was disappointing. Would it be better to do a CTAS and
> > partition by components of the date? If so, would Parquet be the best
> > format?
> > Would anyone have other suggestions of things I could do to improve
> > performance?
> > Thanks,
> > — C
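For the CTAS-with-partitioning approach Charles asks about, a minimal sketch is below. This assumes Apache Drill (given the list), a hypothetical view `dfs.tmp.my_csv_view` with a `ts` timestamp column, and that queries mostly filter on year/month; the table name and column names are illustrative, not from the thread.

```sql
-- Hedged sketch: convert the CSV-backed view to Parquet, partitioned by
-- date components, so aggregate queries filtering on year/month can
-- prune partitions instead of scanning all 16 GB of CSV.
CREATE TABLE dfs.tmp.events_parquet
PARTITION BY (yr, mo) AS
SELECT
  EXTRACT(YEAR  FROM ts) AS yr,   -- partition column: year of the event
  EXTRACT(MONTH FROM ts) AS mo,   -- partition column: month of the event
  t.*                             -- remaining columns from the view
FROM dfs.tmp.my_csv_view t;

-- Queries that filter on the partition columns then read only the
-- matching Parquet files:
SELECT mo, COUNT(*)
FROM dfs.tmp.events_parquet
WHERE yr = 2017
GROUP BY mo;
```

The usual trade-off: partitioning by too fine a granularity (e.g. day or hour over many years) can produce many small files, so year/month is a common starting point when most filters are at that granularity.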
