There are 800 million JSON documents in S3, partitioned into subfolders by
YEAR/DAY/HOUR and stored in GZ compressed format. The data set contains only
a few days' worth of records. I was trying to run a query similar to this:

SELECT t.contentId, count(*) `count`
FROM s3hbo.root.`/2015` t
GROUP BY t.contentId
ORDER BY `count` DESC
LIMIT 20


I'm going to try using count(t.contentId) to see if there is an improvement.
JSON_SUB_SCAN does indeed account for a significant portion of the job.
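
A rough sketch of that variant, assuming count(t.contentId) simply replaces
count(*) and everything else stays the same. Note that count(t.contentId)
skips rows where contentId is null, so the totals may differ slightly from
count(*):

SELECT t.contentId, count(t.contentId) `count`
FROM s3hbo.root.`/2015` t
GROUP BY t.contentId
ORDER BY `count` DESC
LIMIT 20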

-Alex

