Re: Apache Drill and S3 performance

Jacques Nadeau Sun, 07 Jun 2015 17:13:05 -0700

Can you post the query profile JSON?  It would help us determine where the
time is being spent.
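A minimal sketch of one way to save the profile to a file for posting, assuming the Drillbit web server is reachable on its default port 8047 and serves profiles at `/profiles/<query id>.json`. The host name, query id, and output path are placeholders, not values from this thread.

```python
import json
import urllib.request


def profile_url(host: str, query_id: str, port: int = 8047) -> str:
    """Build the URL of a query's JSON profile on a Drillbit web server.

    Assumption: default web port 8047 and the /profiles/<id>.json endpoint.
    """
    return f"http://{host}:{port}/profiles/{query_id}.json"


def save_profile(host: str, query_id: str, out_path: str, port: int = 8047) -> None:
    """Download the profile and write it to out_path, pretty-printed."""
    with urllib.request.urlopen(profile_url(host, query_id, port)) as resp:
        profile = json.load(resp)
    with open(out_path, "w") as f:
        json.dump(profile, f, indent=2)
```

Calling `save_profile("<drillbit host>", "<query id>", "profile.json")` would write a file that can be attached to a reply. The query id is the one listed on the web UI's Profiles page.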

How many files are being queried?

On Sun, Jun 7, 2015 at 3:47 PM, Satish Cattamanchi <[email protected]>
wrote:

> We are evaluating Apache Drill performance, and we have set up Apache
> Drill on Amazon.
>
> All EC2 machines are of the r3.2xlarge instance type.
>
> Model        vCPU   Mem (GiB)   SSD Storage (GB)
> r3.2xlarge   8      61          1 x 160
>
> Zookeeper - 1 EC2 machine
> Drillbits - 25 EC2 machines.
> Data Location - Amazon S3
> Data Format - Flat files, PSV (pipe-separated), gzipped.
> Storage Hierarchy  - /logs/requests/y=2015/m=01/d=01/hh=-01/
> Daily Data Size - 2TB approx.
> Daily Rows - 3.5B
>
> Using Apache Drill with Default Configuration.
>
> I was successfully able to configure Apache Drill and connect to S3 and
> query the data from S3.
>
> But when I do count(*) on the day folder, it takes around 45-50 min with
> the above setup. Other queries with a WHERE condition take a similar
> time. I was wondering whether the slowness is due to copying data back and
> forth from S3.
>
> Could anyone give some suggestions on setup/configuration to achieve
> better performance with Apache Drill?
>
> Thanks,
> Satish
>
>
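For a rough sense of scale, the figures quoted above can be turned into scan rates. This is back-of-the-envelope arithmetic only. It assumes "2TB" means 2 x 10^12 bytes and that this is the volume actually read from S3 (the message does not say whether the figure is the compressed or uncompressed size), and it assumes the work is spread evenly over all 25 Drillbits.

```python
def scan_rates(total_bytes: float, total_rows: float,
               minutes: float, drillbits: int) -> dict:
    """Convert a full-scan duration into aggregate and per-node rates."""
    seconds = minutes * 60
    aggregate_mb_s = total_bytes / seconds / 1e6
    return {
        "aggregate_mb_per_s": aggregate_mb_s,
        "per_drillbit_mb_per_s": aggregate_mb_s / drillbits,
        "rows_per_s": total_rows / seconds,
    }


# Figures from the quoted message: ~2 TB and 3.5B rows per day,
# 25 Drillbits, count(*) taking 45-50 minutes.
for minutes in (45, 50):
    rates = scan_rates(2e12, 3.5e9, minutes, 25)
    print(minutes, {name: round(value, 1) for name, value in rates.items()})
```

Under those assumptions the scan works out to roughly 27 to 30 MB/s per Drillbit, a number that can be compared against what the query profile reports for the scan operators.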
