There are 800 million JSON documents in S3, partitioned into subfolders by YEAR/DAY/HOUR and stored in GZ-compressed format. The data set contains only a few days' worth of records. I was trying to run a query similar to:
    SELECT t.contentId, count(*) `count` FROM s3hbo.root.`/2015` t GROUP BY t.contentId ORDER BY `count` DESC LIMIT 20

I'm going to try adding count(t.contentId) to see if there is an improvement. Looking at JSON_SUB_SCAN, it does indeed account for a significant portion of the job.

-Alex

MLB.com: Where Baseball is Always On
