It appears that we will be implementing Drill before our Hadoop infrastructure is ready for production. A question that has come up about deploying Drill on clustered Linux hosts (i.e., hosts with a shared file system but no HDFS) is whether Drill's parallelization can take advantage of multiple drillbits in this scenario.
Should we expect Drill to auto-split large CSV files and read/sort them in parallel? That does not appear to happen in our testing. We've had to manually partition large files into sets of smaller files stored in a shared folder. Is there any value in having multiple drillbits with access to the same shared files on CFS/GFS? Thanks
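For context, our manual partitioning is roughly the following (file names and chunk size here are illustrative, not our actual values); each chunk keeps the CSV header so any drillbit can read it standalone:

```shell
# Generate a small sample CSV standing in for our large file (illustrative)
printf 'id,val\n' > big.csv
seq 1 10 | sed 's/$/,x/' >> big.csv

# Split the data rows into fixed-size chunks, re-attaching the header to each
head -n 1 big.csv > header.csv             # save the header line
tail -n +2 big.csv | split -l 4 - part_    # 4 data rows per chunk (illustrative)
for f in part_*; do
  cat header.csv "$f" > "chunk_${f#part_}.csv"
  rm "$f"
done
ls chunk_*.csv                             # chunk_aa.csv chunk_ab.csv chunk_ac.csv
```

We then point Drill at the shared folder containing the `chunk_*.csv` files rather than at a single large file.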
