Hi Jacques,
All these technologies are pretty new to me, but I can give it a shot :)
Is there any literature that can help me understand how things are set
up?
Cheers,
On Fri, Mar 11, 2016 at 01:42:54PM -0800, Jacques Nadeau wrote:
I've been thinking a lot about this. Definitely think there should be a clean
fix to this but haven't had the cycles to suggest something. You up for
looking at the code and trying to suggest something?
thanks!
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Thu, Mar 10, 2016 at 8:06 AM, Oscar
I've been checking the logs, and I think that the problem is that it's
walking through the "directories" in S3 recursively, doing lots of small
HTTP requests.
My files are organized like this, which amplifies the issue:
/category/random-hash/year/month/day/hour/data-chunk-000.json.gz
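To give a rough sense of why this layout hurts, here is a small Python sketch that compares a recursive per-"directory" walk with a flat listing of the same keys. It is only an illustration, not Drill's actual code: the function name and the sample keys are made up, and it assumes one list request per "directory" and up to 1000 keys per flat list request (the S3 page size).

```python
import math


def count_list_requests(keys, page_size=1000):
    """Estimate list requests: recursive walk vs. flat listing.

    Hypothetical helper for illustration only. Assumes the recursive walk
    issues one request per "directory" (key prefix), and the flat listing
    returns up to page_size keys per request.
    """
    prefixes = {""}  # the root is listed once
    for key in keys:
        parts = key.strip("/").split("/")[:-1]  # drop the file name
        for depth in range(1, len(parts) + 1):
            prefixes.add("/".join(parts[:depth]))
    recursive = len(prefixes)
    flat = max(1, math.ceil(len(keys) / page_size))
    return recursive, flat


# Made-up keys following the layout above (two hours of one day).
sample_keys = [
    "/category/hash-a/2016/03/10/00/data-chunk-000.json.gz",
    "/category/hash-a/2016/03/10/01/data-chunk-000.json.gz",
]
print(count_list_requests(sample_keys))  # (8, 1)
```

With one file per hour-level "directory", the number of requests in the recursive walk grows with the number of directories rather than with the number of files, which would match what I see in the logs.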
I'm querying 20 GB of gzipped JSON split into ~5600 small files with sizes
ranging from 1 MB to 30 MB. Drill is running on AWS on 4 m4.xlarge nodes,
and it's taking around 50 minutes before the query starts executing.
Any idea what could be causing this delay? What's the best way to debug
this?