Re: Foreman Parallelizer not working with compressed csv file?

Abdel Hakim Deneche Thu, 23 Jul 2015 09:12:21 -0700

Hi Juergen,

Can you share the query you tried to run?


Thanks

On Thu, Jul 23, 2015 at 9:10 AM, Juergen Kneissl <[email protected]> wrote:

> Hi everybody,
>
> I installed and configured a small cluster with two machines (GNU/Linux)
> with the following setup:
>
> ZooKeeper 3.4.6, Drill 1.1.0, and Hadoop 2.7.1 with HDFS as the
> distributed filesystem.
>
> I am playing around a bit, but what I still do not understand is why
> my Drill Foreman, bit1 (or whichever bit takes that role here), is not
> "really" parallelizing my request. (Or am I expecting something from the
> architecture that is not intended?)
>
>
> I select and aggregate on a 1.4 GB gzipped CSV file, and I thought at
> least part of the query would be processed on the other drillbit
> (bit2).
>
> For instance, in the profiles I see that Major Fragment 01 was divided
> into four Minor Fragments (of which two were forwarded to bit2).
>
> If I check the drillbit.log file on bit2 (in the above
> configuration), a debug message tells me that the incoming record count
> is 0.
>
> The question is: what am I doing wrong in my configuration? Does it
> have something to do with using a CSV file?
>
> The query is also written so that the whole file clearly has to be
> read into memory. That does not concern me much; for now I just wanted
> to check how the Foreman does the "parallelization".
>
> Best Regards & Thanks for any hint
>
>
> Juergen
>



-- 

Abdelhakim Deneche

Software Engineer
