Foreman Parallelizer not working with compressed csv file?

Juergen Kneissl Thu, 23 Jul 2015 09:09:43 -0700

Hi everybody,

I installed and configured a small cluster of two machines (GNU/Linux)
with the following setup:


ZooKeeper 3.4.6, Drill 1.1.0, and Hadoop 2.7.1 with HDFS as the
distributed filesystem.

So I am playing around a bit, but what I still don't understand is
why my Drill Foreman, bit1 (or whichever bit has that role in a given
situation), is not "really" parallelizing my request. (Or am I expecting
something from the architecture that it is not intended to do?)


I select and aggregate on a 1.4 GB gzipped CSV file, and I expected at
least part of the query to be processed on the other drillbit (bit2).

For instance, in the profiles I see that Major Fragment 01 was divided
into four Minor Fragments, two of which were sent to bit2.

When I check the drillbit.log file on bit2 (in the configuration above),
a debug message tells me that the incoming record count is 0.

The question is: what am I doing wrong in my configuration? Does it
have something to do with using a CSV file?

The query is also written in such a way that the whole file clearly has
to be read into memory. That does not concern me much for now; I just
wanted to check how the Foreman does the parallelization.
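One thing I could imagine (an assumption on my part, not something I have confirmed in the Drill source) is that a gzip stream can only be decompressed starting from its first byte, so the whole .gz file would become a single scan work unit. The toy model below is plain Python, not Drill code; the fragment names, the 256 MiB split size, and the round-robin assignment are all made up for illustration. It shows what that would mean for four minor fragments: with a splittable file every fragment gets some bytes, while with a non-splittable file only one fragment does and the others see no records at all.

```python
# Toy model (hypothetical, NOT Drill's implementation): how scan work might be
# spread over minor fragments when the input file is or is not splittable.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkUnit:
    path: str
    start: int   # byte offset where this unit begins
    length: int  # number of bytes this unit covers


def make_work_units(path: str, size: int, splittable: bool, block_size: int) -> List[WorkUnit]:
    """Cut a file into byte-range work units.

    Assumption: a gzip stream must be decompressed from its first byte,
    so a non-splittable (gzipped) file yields exactly one unit for the whole file.
    """
    if not splittable:
        return [WorkUnit(path, 0, size)]
    units = []
    offset = 0
    while offset < size:
        length = min(block_size, size - offset)
        units.append(WorkUnit(path, offset, length))
        offset += length
    return units


def assign(units: List[WorkUnit], fragments: List[str]) -> Dict[str, int]:
    """Hand out work units round-robin; return bytes read per minor fragment."""
    load = {f: 0 for f in fragments}
    for i, unit in enumerate(units):
        load[fragments[i % len(fragments)]] += unit.length
    return load


# Four minor fragments, two on each drillbit (as in the profile above).
fragments = ["bit1:0", "bit1:1", "bit2:2", "bit2:3"]
size = 1_400_000_000          # roughly the 1.4 GB file
block = 256 * 1024 * 1024     # hypothetical 256 MiB split size

print(assign(make_work_units("data.csv", size, True, block), fragments))
# -> {'bit1:0': 536870912, 'bit1:1': 326258176, 'bit2:2': 268435456, 'bit2:3': 268435456}
print(assign(make_work_units("data.csv.gz", size, False, block), fragments))
# -> {'bit1:0': 1400000000, 'bit1:1': 0, 'bit2:2': 0, 'bit2:3': 0}
```

Round-robin is only a stand-in here; Drill's real assignment also considers data locality, so the exact numbers mean nothing, only the zeros do.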

Best regards, and thanks for any hints,


Juergen
