Hi everybody,

I installed and configured a small cluster of two machines (GNU/Linux) with the following setup: ZooKeeper 3.4.6, Drill 1.1.0, and Hadoop 2.7.1, with HDFS as the distributed filesystem.

I am playing around a bit, but what I still do not understand is why my Drill Foreman, bit 1 (or whichever drillbit takes that role for the query), is not "really" parallelizing my request. Or am I expecting something from the architecture that is not intended?

I select and aggregate on a 1.4 GB gzipped CSV file, and I thought at least part of the query would be processed on the other drillbit (bit 2). For instance, in the profiles I see that Major Fragment 01 was divided into four Minor Fragments, two of which were forwarded to bit 2. But when I check the drillbit.log file on bit 2 (in the above configuration), a debug message tells me that the incoming record count is 0.

The question is: what am I doing wrong in my configuration? Does it have something to do with using a CSV file? The query is also written in such a way that the whole file clearly has to be read into memory. That does not concern me much; for now I just want to check how the Foreman does the "parallelization".

Best regards, and thanks for any hint,
Juergen
