We are currently running (testing) with Veritas CFS (attached to EMC SAN storage), which is visible across 6 servers. We also have a single test MapR node, but that's a small sandbox. The production implementation will be a 10-node HDFS cluster.
The data files are 20 GB to 40 GB in size.

On July 7, 2015, 11:34 AM, "Ted Dunning" <ted.dunn...@gmail.com> wrote:
> No. A very simple model like that breaks down on many levels. The most
> important level at which reality intrudes is the fact that your I/O
> probably can't really be threaded so widely.
>
> What kind of storage are you using? How big is your data?
>
> Sent from my iPhone
>
>> On Jul 7, 2015, at 6:38, "Yousef Lasi" <yousef.l...@gmail.com> wrote:
>>
>> Am I correct in assuming that it will allocate a thread for each core
>> available to all drillbits and read x numbers of columns in parallel?
>> So if we have 48 cores available and the file has 48 columns, then the
>> time for the query for a single column should roughly equal the time
>> for 48 columns? All other factors, such as data types, being the same
>> of course.
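Ted's point about I/O can be made concrete with a back-of-envelope sketch. The numbers below (aggregate SAN read bandwidth, file size) are assumptions for illustration only, not measurements from this environment: once the scan is bound by shared storage bandwidth rather than CPU, reading 48 columns takes roughly 48x as long as reading one, no matter how many cores are available.

```python
# Back-of-envelope model: CPU-parallel vs I/O-bound scan time.
# All constants are illustrative assumptions, not measured values.

FILE_SIZE_GB = 30.0        # midpoint of the 20-40 GB files mentioned
NUM_COLUMNS = 48           # columns in the file
AGG_BANDWIDTH_GBPS = 2.0   # assumed aggregate SAN read bandwidth, GB/s

col_size_gb = FILE_SIZE_GB / NUM_COLUMNS  # one column's share of the file

# Naive CPU-parallel model: one thread per column, I/O assumed
# infinitely wide, so elapsed time ~ one column's read time.
t_naive = col_size_gb / AGG_BANDWIDTH_GBPS

# I/O-bound model: the shared storage link serializes the bytes, so
# elapsed time scales with total bytes read, regardless of threads.
t_one_col = col_size_gb / AGG_BANDWIDTH_GBPS
t_all_cols = FILE_SIZE_GB / AGG_BANDWIDTH_GBPS

print(f"one column:                   {t_one_col:.3f} s")
print(f"48 columns (naive model):     {t_naive:.3f} s")
print(f"48 columns (I/O-bound model): {t_all_cols:.3f} s")
```

Under these assumed numbers the naive model predicts ~0.3 s for the full 48-column scan, while the I/O-bound model predicts ~15 s, which is why "time for one column ≈ time for 48 columns" only holds if the storage path can actually feed 48 readers concurrently.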