Thanks a lot, Stefan :)

On Sat, Jul 25, 2015 at 2:58 PM, Stefán Baxter <[email protected]>
wrote:

> Hi,
>
> I'm pretty new around here but let me attempt to answer you.
>
>    - Parquet will always be (a lot) faster than CSV, especially if you're
>    querying only a subset of the columns in the CSV
>    - Parquet has various compression techniques and is more "scan
>    friendly" (optimized for scanning compressed data)
>
>    - The optimal file size is linked to the fs segment sizes (I'm not sure
>    how that affects S3) and block sizes
>    - have a look at this:
>    http://ingest.tips/2015/01/31/parquet-row-group-size/
>
>    - Read up on partitioning of Parquet files, which is supported by Drill
>    and can improve your performance quite a bit
>    - partitioning can help you filter data efficiently and prevents
>    scanning of data not relevant to your query
>
>    - Spend a little bit of time planning how you will map your CSV to
>    Parquet to make sure columns are imported as the appropriate data type
>    - this matters in compression and efficiency (storing numbers as strings,
>    for example, will prevent Parquet from doing some optimization magic)
>    - See this:
>    http://www.slideshare.net/julienledem/th-210pledem?next_slideshow=2 (or
>    some of the other presentations on Parquet)
>
>    - Optimize your drillbits (Drill machines) so they are sharing the
>    workload
>
>    - Get to know S3 best practices
>    - https://www.youtube.com/watch?v=_FHRzq7eHQc
>    - https://aws.amazon.com/articles/1904
>
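To make the partitioning point above a bit more concrete, here is a minimal Python sketch of the general idea behind partition pruning: if the files are laid out in directories by some key, a query that filters on that key only has to touch the matching directories. The year/month layout, the file names, and the `prune` helper are all made up for illustration; this is not how Drill implements it internally.

```python
from pathlib import PurePosixPath

# Hypothetical file layout: <table>/<year>/<month>/<file>.parquet
FILES = [
    "logs/2014/12/part-0.parquet",
    "logs/2015/06/part-0.parquet",
    "logs/2015/07/part-0.parquet",
    "logs/2015/07/part-1.parquet",
]

def prune(files, year=None, month=None):
    """Return only the files whose directory names match the filter.

    Files in non-matching directories are skipped without being opened,
    which is where the savings come from.
    """
    selected = []
    for path in files:
        parts = PurePosixPath(path).parts  # e.g. ('logs', '2015', '07', 'part-0.parquet')
        file_year, file_month = parts[1], parts[2]
        if year is not None and file_year != year:
            continue
        if month is not None and file_month != month:
            continue
        selected.append(path)
    return selected

to_scan = prune(FILES, year="2015", month="07")
print(f"scanning {len(to_scan)} of {len(FILES)} files")
```

As far as I understand, Drill exposes the directory levels of a table as columns named dir0, dir1, and so on, so a filter on those columns plays the role of the `year` and `month` arguments here.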
> Hope this helps,
>  -Stefan
>
> On Sat, Jul 25, 2015 at 9:08 AM, Hafiz Mujadid <[email protected]>
> wrote:
>
> > Hi!
> >
> > I have terabytes of data on S3 and I want to query this data using Drill.
> > I want to know which data format gives Drill the best performance: CSV
> > or Parquet? Also, what should the file size be? Are small files or large
> > files more appropriate for Drill?
> >
> >
> > Thanks
> >
>
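On the point about planning how CSV columns map to data types: a rough Python sketch (standard library only) of how one might guess a type per column from a small sample before doing the conversion. The sample data, the column names, and the `infer_type` / `infer_schema` helpers are hypothetical, and a real conversion would also need to deal with nulls, dates, and similar cases.

```python
import csv
import io

# Hypothetical sample of the CSV that is going to be converted
SAMPLE = """user_id,price,country
1,9.99,IS
2,12.50,PK
3,7,US
"""

def infer_type(values):
    """Guess the narrowest type (int, float, or str) that fits every value."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"

def infer_schema(text):
    """Map each column name to its guessed type."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {col: infer_type([row[col] for row in rows]) for col in rows[0]}

schema = infer_schema(SAMPLE)
print(schema)
```

The guessed types are only a starting point; the idea is to catch numeric columns that would otherwise end up stored as strings.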



-- 
Regards: HAFIZ MUJADID
