Guys,

I am trying to evaluate the performance of a basic query: select count(*) from 
MY_TABLE

I have 800 million records in S3, partitioned into subfolders by YEAR/DAY/HOUR 
and stored as 14 MB gzipped JSON files.

I have a 2+1 node cluster of m3.xlarge instances set up in EMR. The query takes 
over 54 minutes to return the total count. The JSON documents are flat and 
contain only a few properties.
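
For a sense of scale, here is a rough back-of-the-envelope throughput 
calculation from the numbers above. It is only a sketch: it assumes "2+1" means 
2 worker nodes plus 1 master, and it treats the run time as exactly 54 minutes, 
so the real rate is somewhat lower. The function and variable names are just 
for illustration.

```python
# Rough throughput estimate from the figures quoted in this post.
TOTAL_RECORDS = 800_000_000   # records stored in S3
ELAPSED_MINUTES = 54          # query took "over 54 minutes", so this is a lower bound on time
WORKER_NODES = 2              # assumption: "2+1" = 2 workers + 1 master


def throughput(records, minutes, workers):
    """Return (records/s for the whole cluster, records/s per worker)."""
    seconds = minutes * 60
    per_second = records / seconds
    return per_second, per_second / workers


cluster_rate, per_node_rate = throughput(TOTAL_RECORDS, ELAPSED_MINUTES, WORKER_NODES)
print(f"{cluster_rate:,.0f} records/s cluster-wide")
print(f"{per_node_rate:,.0f} records/s per worker node")
```

That works out to roughly 247,000 records per second across the cluster, or 
about 123,000 per worker node, at best.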

Is there a reason why the query would take so long to execute? If so, is there 
a faster option?

Thank you.



