Also, on a practical note, Parquet will likely crush CSV on performance.
Columnar. Compressed. Binary.  All that.
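Concretely, a CTAS that rewrites the CSVs as Parquet and partitions on date components might look like the sketch below. Paths, table name, and column names are hypothetical, and this assumes a Drill version with PARTITION BY support in Parquet CTAS (1.1+) plus a CSV storage plugin configured to read headers:

```sql
-- Hypothetical CSV schema: event_ts (timestamp string), symbol, price.
CREATE TABLE dfs.tmp.`events_parquet`
PARTITION BY (yr, mo)
AS SELECT
  EXTRACT(YEAR  FROM CAST(event_ts AS TIMESTAMP)) AS yr,
  EXTRACT(MONTH FROM CAST(event_ts AS TIMESTAMP)) AS mo,
  CAST(event_ts AS TIMESTAMP) AS event_ts,
  symbol,
  CAST(price AS DOUBLE)       AS price
FROM dfs.`/data/events_csv`;
```

Aggregate queries that filter on the partition columns (e.g. WHERE yr = 2017 AND mo = 10) can then prune whole Parquet files instead of scanning all 16 GB.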



On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra <
[email protected]> wrote:

> Hi Charles,
>
> Can you share some query patterns on this data? More specifically, the
> number of columns you're retrieving out of the total, and the filter on the
> time dimension itself (ranges and granularities).
>
> How much is ad hoc and how much is not?
>
> Best,
> Saurabh
>
> On Mon, Oct 30, 2017 at 9:27 AM, Charles Givre <[email protected]> wrote:
>
> > Hello all,
> > I have a dataset consisting of about 16 GB of CSV files.  I am looking to
> > do some time series analysis of this data, and I created a view, but when
> > I started doing aggregate queries using components of the date, the
> > performance was disappointing.  Would it be better to do a CTAS and
> > partition by components of the date?  If so, would Parquet be the best
> > format?
> > Would anyone have other suggestions of things I could do to improve
> > performance?
> > Thanks,
> > — C
>
