It appears that we will be implementing Drill before our Hadoop infrastructure is ready for production. A question that has come up about deploying Drill on clustered Linux hosts (i.e., hosts with a shared file system but no HDFS) is whether Drill's parallelization can take advantage of multiple drillbits in this scenario.

Should we expect Drill to auto-split large CSV files and read/sort them in parallel? That does not appear to happen in our testing; we've had to manually partition large files into sets of smaller files stored in a shared folder.
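For reference, the manual partitioning we've been doing amounts to something like the sketch below: split one large CSV into fixed-size pieces in a shared folder so each drillbit can scan a different file. All paths, file names, and chunk sizes here are illustrative, and the sample input just stands in for a real large file.

```shell
set -e
mkdir -p /tmp/shared/trades_parts

# Sample input standing in for the real large CSV.
printf 'id,value\n' > /tmp/trades.csv
seq 1 10000 | awk '{print $1 ",v" $1}' >> /tmp/trades.csv

# Keep the header aside, then split the body into 2500-line pieces
# (GNU split: -l lines per piece, -d numeric suffixes).
head -n 1 /tmp/trades.csv > /tmp/header.csv
tail -n +2 /tmp/trades.csv | split -l 2500 -d - /tmp/shared/trades_parts/part_

# Re-attach the header to each piece so every file is a valid CSV on its own.
for f in /tmp/shared/trades_parts/part_*; do
  cat /tmp/header.csv "$f" > "$f.csv" && rm "$f"
done
```

With the pieces in one directory, a query against that directory can then be fanned out across readers, which is what we'd hoped would happen automatically for a single large file.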

Is there any value in having multiple drillbits with access to the same shared files on CFS/GFS?

Thanks
