Hi, Thank you.
I was not suggesting that this become part of Drill, only asking whether anyone had experience in this area. :)

I'm trying to evaluate an S3-almost-only setup vs. HDFS, so your points are handy.

Regards,
-Stefan

On Tue, Jul 14, 2015 at 5:08 PM, Jason Altekruse wrote:

> I am not aware of anyone doing something like this today, but it seems like
> something best handled outside of Drill right now. Drill considers itself
> essentially stateless; we do not manage indexes, table constraints, or
> data caching for any of our current storage systems. There was some work
> being done to cache Parquet metadata; in that case we were placing all of
> the Parquet footers in a single file, which would need to be manually
> refreshed. This work has not made it into the mainline, but you can follow
> the progress here:
>
> https://issues.apache.org/jira/browse/DRILL-2743
>
> I would take a look around for general-purpose local caching systems for
> S3. To make these work with Drill today, they will have to re-expose the
> HDFS API. There might be something out there that already does this, but
> as some of the primary users of S3 are web application developers, they
> might not have worried about providing the HDFS API on top of any caching
> systems developed to date.
>
> One thing to note: the HDFS API is already available on top of the local
> file system; this is what enables us to read from the local disk in
> embedded mode. If you can get a caching system to expose NFS, you could
> mount it at the same path on all of your nodes, and Drill should be able
> to read from that path on your local FS.
>
> On Tue, Jul 14, 2015 at 1:06 AM, Stefán Baxter wrote:
>
> > Hi,
> >
> > I'm wondering if the people who use Drill with S3 are using some sort of
> > local cache on the drillbit nodes for historical, non-changing Parquet
> > segments.
> >
> > I'm pretty sure that I'm not using the correct terminology and that the
> > correct question is this: are there any ways to optimize S3 with Drill so
> > that "hot segments" are stored locally while hot and then dropped from
> > the local nodes when they are not?
> >
> > I guess this only really matters where networking speeds between the
> > drillbit nodes and S3 are not optimal.
> >
> > Regards,
> > -Stefan
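A minimal sketch of the "hot segments" idea, assuming a local cache directory with a fixed byte budget and least-recently-used (LRU) eviction. `HotSegmentCache` and the `fetch` callback are hypothetical names used only for illustration; `fetch` stands in for whatever S3 client copies an object to local disk. This is not a Drill or S3 API. Drill could only read from such a cache if it were exposed through the HDFS API or a path mounted identically on every node, as described in the thread.

```python
import hashlib
import os
from collections import OrderedDict
from typing import Callable


class HotSegmentCache:
    """Hypothetical LRU cache keeping recently read segment files on local disk.

    `fetch(key, dest_path)` is an assumed, caller-supplied function that copies
    the remote object (e.g. an S3 Parquet file) to `dest_path`.
    """

    def __init__(self, cache_dir: str, max_bytes: int,
                 fetch: Callable[[str, str], None]) -> None:
        self.cache_dir = cache_dir
        self.max_bytes = max_bytes
        self.fetch = fetch
        # key -> size in bytes, ordered from least to most recently used
        self.entries: "OrderedDict[str, int]" = OrderedDict()
        self.used_bytes = 0
        os.makedirs(cache_dir, exist_ok=True)

    def _local_path(self, key: str) -> str:
        # Hash the remote key so any S3-style path maps to a safe local file name.
        return os.path.join(self.cache_dir, hashlib.sha256(key.encode()).hexdigest())

    def get(self, key: str) -> str:
        """Return a local path for `key`, fetching it on a cache miss."""
        path = self._local_path(key)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used ("hot")
            return path
        self.fetch(key, path)
        size = os.path.getsize(path)
        self.entries[key] = size
        self.used_bytes += size
        self._evict()
        return path

    def _evict(self) -> None:
        # Drop least recently used ("cold") segments until back under budget,
        # never evicting the segment that was just fetched.
        while self.used_bytes > self.max_bytes and len(self.entries) > 1:
            old_key, old_size = self.entries.popitem(last=False)
            os.remove(self._local_path(old_key))
            self.used_bytes -= old_size
```

LRU is only one possible eviction policy; for historical, non-changing segments no invalidation is needed, which is what keeps the sketch this small. A real deployment would also need to handle concurrent readers and partially downloaded files.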
