I have some general questions that I've been unable to answer by googling. I'm particularly interested in co-locating drillbits with nodes in a custom store of ours, so I've been poking around in the source and searching for examples of this.

1. My understanding is that Drill understands HDFS, and that if you co-locate a drillbit with a data node, Drill will automatically distribute queries to the drillbits on the nodes that contain the relevant files.
1a. Where does Drill run a join, then? On the node that initiated the query, or on one of the nodes that contain the data?
1b. Does Drill automatically look up which nodes hold the data in question, or is this specified in the query somehow?
2. Does Drill also understand data distribution in HBase? Do queries get sent to the nodes that contain the HBase rows in question?
3. We have a custom data store that we'd like to make Drill-aware, but we want a drillbit on the machine itself. Are there any examples of co-locating drillbits with non-HDFS data sources?
4. If we place files on a bunch of different servers, install a drillbit on each one, and determine out-of-band which servers contain which files, is there a way to submit a query to Drill that tells it which nodes hold the local files to read?

Btw, I would be really interested in chatting/drinking with someone who knows the Drill code well and is based in NYC.

Thanks,
Wes
