Can you post the query profile JSON? It would help us determine where the time is being spent.
How many files are being queried?

On Sun, Jun 7, 2015 at 3:47 PM, Satish Cattamanchi <[email protected]> wrote:

> We are evaluating Apache Drill performance, and we have setup Apache
> Drill on Amazon.
>
> All EC2 machines are r3.2xLarge instance type.
>
> Model       vCPU  Mem (GiB)  SSD Storage (GB)
> r3.2xlarge  8     61         1 x 160
>
> Zookeeper - 1 EC2 machine
> Drillbits - 25 EC2 machines.
> Data on - Amazon S3
> Data Format - Flat File with PSV ( Pipe Separated) and GZIP'ed.
> Storage Hierarchy - /logs/requests/y=2015/m=01/d=01/hh=-01/
> Daily Data Size - 2TB approx.
> Daily Rows - 3.5B
>
> Using Apache Drill with Default Configuration.
>
> I was successfully able to configure Apache Drill and connect to S3 and
> query the data from S3.
>
> But when I do count(*) on the day folder, its taking around 45-50min with
> the above setup. Any other queries with WHERE condition also takes similar
> time. I was wondering whether the slowness is due to copying data back n
> forth from S3.
>
> Could anyone give some suggestions on setup/configuration to achieve
> better performance with Apache Drill?
>
> Thanks,
> Satish
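
For a rough sense of scale while we wait for the profile, here is a back-of-envelope sketch of the scan rate implied by the numbers quoted above. It assumes "2TB" means decimal terabytes as stored on S3 (the message does not say whether that is the compressed or uncompressed size) and that the work is spread evenly over all 25 drillbits, which may well not be the case. The function name is just for illustration.

```python
# Figures quoted in the thread. Treating "2TB" as 2 * 10**12 bytes as stored
# on S3 is an assumption; the original message does not specify this.
DAILY_BYTES = 2 * 10**12
DRILLBITS = 25
VCPUS_PER_NODE = 8  # r3.2xlarge


def throughput_mb_per_s(total_bytes, minutes, workers=1):
    """Average MB/s per worker if total_bytes is scanned in `minutes`,
    assuming the work is split evenly across `workers`."""
    return total_bytes / (minutes * 60) / workers / 10**6


for minutes in (45, 50):
    cluster = throughput_mb_per_s(DAILY_BYTES, minutes)
    per_node = throughput_mb_per_s(DAILY_BYTES, minutes, DRILLBITS)
    per_vcpu = throughput_mb_per_s(DAILY_BYTES, minutes, DRILLBITS * VCPUS_PER_NODE)
    print(f"{minutes} min: cluster {cluster:.0f} MB/s, "
          f"per node {per_node:.1f} MB/s, per vCPU {per_vcpu:.2f} MB/s")
```

Under those assumptions the day folder is being read at roughly 27 to 30 MB/s per drillbit. A gzip file generally has to be decompressed from the start by a single reader, so the number of files limits how many of the 200 vCPUs can be busy at once, which is part of why the file count matters here.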
