Guys, I am trying to evaluate the performance of a basic query: select count(*) from MY_TABLE
I have 800 million records partitioned in S3 into YEAR/DAY/HOUR subfolders and stored as 14 MB gzipped JSON files. I have a 2+1 node cluster of m3.xlarge instances set up in EMR. It takes over 54 minutes to return the total count. The JSON documents are flat and contain only a few properties. Is there a reason the query would take so long to execute? If so, what is a faster option? Thank you.
