We are currently running (testing) with Veritas CFS (attached to EMC SAN storage), which is visible across 6 servers. We also have a single test MapR node, but that's a small sandbox. The production implementation will be a 10-node HDFS cluster.
The data files are 20 GB to 40 GB in size.

On July 7, 2015, 11:34 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> No. A very simple model like that breaks down on many levels. The most
> important level at which reality intrudes is the fact that your I/O
> probably can't really be threaded so widely.
>
> What kind of storage are you using? How big is your data?
>
> Sent from my iPhone
>
>> On Jul 7, 2015, at 6:38, "Yousef Lasi" <yousef.l...@gmail.com> wrote:
>>
>> Am I correct in assuming that it will allocate a thread for each core
>> available to all drillbits and read x numbers of columns in parallel?
>> So if we have 48 cores available and the file has 48 columns, then the
>> time for the query for a single column should roughly equal the time
>> for 48 columns? All other factors, such as data types, being the same
>> of course.
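Ted's point about I/O can be made concrete with a back-of-envelope sketch. The numbers below (aggregate SAN read bandwidth, file size) are assumptions for illustration only, not measurements from this environment: once the scan is bound by shared storage bandwidth rather than CPU, reading 48 columns takes roughly 48x as long as reading one, no matter how many cores are available.

```python
# Back-of-envelope model: CPU-parallel vs I/O-bound scan time.
# All constants are illustrative assumptions, not measured values.

FILE_SIZE_GB = 30.0        # midpoint of the 20-40 GB files mentioned
NUM_COLUMNS = 48           # columns in the file
AGG_BANDWIDTH_GBPS = 2.0   # assumed aggregate SAN read bandwidth, GB/s

col_size_gb = FILE_SIZE_GB / NUM_COLUMNS  # one column's share of the file

# Naive CPU-parallel model: one thread per column, I/O assumed
# infinitely wide, so elapsed time ~ one column's read time.
t_naive = col_size_gb / AGG_BANDWIDTH_GBPS

# I/O-bound model: the shared storage link serializes the bytes, so
# elapsed time scales with total bytes read, regardless of threads.
t_one_col = col_size_gb / AGG_BANDWIDTH_GBPS
t_all_cols = FILE_SIZE_GB / AGG_BANDWIDTH_GBPS

print(f"one column:                   {t_one_col:.3f} s")
print(f"48 columns (naive model):     {t_naive:.3f} s")
print(f"48 columns (I/O-bound model): {t_all_cols:.3f} s")
```

Under these assumed numbers the naive model predicts ~0.3 s for the full 48-column scan, while the I/O-bound model predicts ~15 s, which is why "time for one column ≈ time for 48 columns" only holds if the storage path can actually feed 48 readers concurrently.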