Have you tried running against a real file system interface? Or even just against HDFS?
On Thu, Oct 6, 2016 at 12:35 PM, Uwe Korn <[email protected]> wrote:
> Hello,
>
> We had some test runs with Drill 1.8 over the last few days and wanted
> to share the experience with you, as we made some interesting findings
> that astonished us. We ran on our internal company cluster and thus
> used the S3 API to access our internal storage cluster, not AWS (the
> behavior should still be the same).
>
> Setup experience: Awesome. It took me less than 30 minutes to get a
> multi-node Drill setup running on Mesos+Aurora with S3 configured.
> Really nice.
>
> Performance with the 1.8 release: Awful. Compared to the queries I ran
> locally with Drill on a small dataset, runtimes were magnitudes higher
> than on my laptop. After some debugging, I saw that hadoop-s3a always
> requests via S3 the byte range from the position where we want to
> start reading until the end of the file. This gave the following HTTP
> pattern:
> * GET bytes=8k-100M
> * GET bytes=2M-100M
> * GET bytes=4M-100M
> Although the HTTP requests were normally aborted before all the data
> was sent by the server, still about 10-15x the size of the input files
> went over the network. Using Parquet, I had actually hoped to achieve
> the opposite, i.e. that less than the whole file would be transferred
> (my test queries used only 2 of 15 columns).
>
> In Hadoop 3.0.0-alpha1 [2], there are a lot of improvements w.r.t. S3
> access. You can now select via fs.s3a.experimental.input.fadvise=random
> a new reading mode that requests via S3 only the asked-for range plus
> a small readahead buffer. While this keeps the number of requests
> constant, we now only transfer the data we actually need. With that,
> performance is not amazing but in an acceptable range.
>
> Still, query planning always took at least 35s. This was an effect of
> fs.s3a.experimental.input.fadvise=random. While the Parquet access
> specifies very precisely which ranges it wants to read, the parser for
> the metadata cache requests only 8000 bytes at a time and thus leads
> to several thousand HTTP requests for a single sequential read. As a
> workaround, we added a call to
> FSDataInputStream.setReadahead(metadata-filesize) to limit the access
> to a single request. This brought reading the metadata down to 3s.
>
> Another problem with the metadata cache was that it was actually
> rebuilt on every query. Drill relies here on the modification
> timestamp of the directory, which is not supported by S3 [1], so the
> current time was always returned as the modification date of the
> directory.
>
> These were just our initial, basic findings with Drill. At the moment
> it looks promising enough that we will probably do some more usability
> and performance testing. If we already did something wrong with the
> initial S3 tests, it would be nice to get some pointers as to what it
> could have been. The bad S3 I/O performance was really surprising to
> us.
>
> Kind regards,
> Uwe
>
> [1] https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
> [2] From here on, the tests were made with
> Drill-master+hadoop-3.0.0-alpha1+aws-sdk-1.11.35, i.e. custom Drill
> and Hadoop builds to have the dependencies in newer versions.
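For anyone wanting to reproduce this, a minimal sketch of the fadvise
setting Uwe describes, assuming the s3a client from Hadoop
3.0.0-alpha1; the bucket name is a placeholder, and with Drill the
property would normally be set in core-site.xml or the storage plugin
configuration rather than in code:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class S3ARandomRead {
        public static void main(String[] args) throws Exception {
            // "random" makes s3a request only the asked-for range plus a
            // small readahead buffer, instead of GETs that run from the
            // read position to the end of the file.
            Configuration conf = new Configuration();
            conf.set("fs.s3a.experimental.input.fadvise", "random");

            // Placeholder bucket name.
            FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket"), conf);
            System.out.println("File system in use: " + fs.getClass().getName());
        }
    }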

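And a sketch of the setReadahead() workaround for the metadata cache
read. This is not Drill's actual code path, just an illustration of
the call under the same assumptions; the path is a placeholder, and in
Drill the call would sit wherever the metadata cache file is opened:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MetadataReadahead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.s3a.experimental.input.fadvise", "random");

            // Placeholder path to a Parquet metadata cache file.
            Path metadata = new Path("s3a://my-bucket/table/.drill.parquet_metadata");
            FileSystem fs = metadata.getFileSystem(conf);

            long len = fs.getFileStatus(metadata).getLen();
            try (FSDataInputStream in = fs.open(metadata)) {
                // Widen the readahead to the whole file so the parser's
                // sequential 8000-byte reads collapse into a single S3 GET
                // instead of several thousand.
                in.setReadahead(len);
                // ... hand `in` to the metadata cache parser ...
            }
        }
    }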