Yes. Performance was much better with a real file system (i.e. I ran locally on my laptop using the SSD installed there). I don't expect exactly the same performance with S3, as I don't get things like data locality there. My use case is mainly querying "cold" datasets, i.e. ones that are not touched often, and when they are, only a few queries are run on them.
> On 06.10.2016 at 22:47, Ted Dunning <[email protected]> wrote:
>
> Have you tried running against a real file system interface? Or even just
> against HDFS?
>
> On Thu, Oct 6, 2016 at 12:35 PM, Uwe Korn <[email protected]> wrote:
>
>> Hello,
>>
>> We had some test runs with Drill 1.8 in the last days and wanted to share
>> the experience with you, as we've made some interesting findings that
>> astonished us. We ran on our internal company cluster and thus used the
>> S3 API to access our internal storage cluster, not AWS (the behavior
>> should still be the same).
>>
>> Setup experience: Awesome, it took me less than 30min to have a
>> multi-node Drill setup running on Mesos+Aurora with S3 configured.
>> Really nice.
>>
>> Performance with the 1.8 release: Awful. Compared to the queries I ran
>> locally with Drill on a small dataset, runtimes were magnitudes higher
>> than on my laptop. After some debugging, I saw that hadoop-s3a always
>> requests via S3 the byte range from the position where we want to start
>> reading until the end of the file. This gave the following HTTP pattern:
>> * GET bytes=8k-100M
>> * GET bytes=2M-100M
>> * GET bytes=4M-100M
>> Although the HTTP requests were normally aborted before all the data was
>> sent by the server, it was still about 10-15x the size of the input
>> files that went over the network. Using Parquet, I had actually hoped to
>> achieve the opposite, i.e. that less than the whole file would be
>> transferred (my test queries were only using 2 of 15 columns).
>>
>> In Hadoop 3.0.0-alpha1 [2], there are a lot of improvements w.r.t. S3
>> access. Via fs.s3a.experimental.input.fadvise=random you can now select
>> a new reading mode that requests via S3 only the asked-for range plus a
>> small readahead buffer. While this keeps the number of requests
>> constant, we now only transfer the data we actually need. With that,
>> performance is not amazing but in an acceptable range.
>>
>> Still, query planning always took at least 35s. This was a side effect
>> of fs.s3a.experimental.input.fadvise=random. While the Parquet reader
>> specifies quite precisely which ranges it wants to read, the parser for
>> the metadata cache only requests 8000 bytes at a time and thus leads to
>> several thousand HTTP requests for a single sequential read. As a
>> workaround, we have added a call to
>> FSDataInputStream.setReadahead(metadata-filesize) to limit the access to
>> a single request. This brought reading the metadata down to 3s.
>>
>> Another problem with the metadata cache was that it was rebuilt on every
>> query. Drill relies here on the modification timestamp of the directory,
>> which is not supported by S3 [1], and thus the current time was always
>> returned as the modification date of the directory.
>>
>> These were just our initial, basic findings with Drill. At the moment it
>> looks promising enough that we will probably do some more usability and
>> performance testing. If we did something wrong in the initial S3 tests,
>> some pointers on what it could have been would be nice. The bad S3 I/O
>> performance was really surprising to us.
>>
>> Kind regards,
>> Uwe
>>
>> [1] https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
>> [2] From here on, the tests were made with
>> Drill-master+hadoop-3.0.0-alpha1+aws-sdk-1.11.35, i.e. custom Drill and
>> Hadoop builds to have the dependencies in newer versions.
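
To make the fadvise setting quoted above concrete, here is a minimal
standalone sketch of how the option can be set programmatically on a
Hadoop 3.0.0-alpha1 Configuration (the same key can equally go into
core-site.xml). Bucket and file names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class S3ARandomReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Switch S3A from the default sequential read pattern to
            // random: positioned reads then fetch only the requested range
            // plus a small readahead buffer, instead of everything from the
            // offset to the end of the file.
            conf.set("fs.s3a.experimental.input.fadvise", "random");

            // Placeholder path, just for illustration.
            Path file = new Path("s3a://some-bucket/dataset/part-0.parquet");
            try (FileSystem fs = file.getFileSystem(conf);
                 FSDataInputStream in = fs.open(file)) {
                byte[] buf = new byte[4096];
                // With fadvise=random this positioned read results in a
                // byte-range GET of roughly buf.length bytes, not
                // "GET bytes=8k-<EOF>" as in the pattern shown above.
                in.readFully(8 * 1024, buf);
            }
        }
    }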
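
Similarly, a sketch of the setReadahead workaround for the metadata cache
read, assuming the file size is known up front; the cache file name here
is hypothetical and not necessarily what Drill uses internally:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MetadataReadaheadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.s3a.experimental.input.fadvise", "random");

            // Hypothetical metadata cache location.
            Path meta = new Path("s3a://some-bucket/dataset/.drill.parquet_metadata");
            FileSystem fs = meta.getFileSystem(conf);
            FileStatus status = fs.getFileStatus(meta);

            try (FSDataInputStream in = fs.open(meta)) {
                // In random mode every small read becomes its own ranged
                // GET. Raising the readahead to the full file size turns
                // the sequential metadata scan back into a single request.
                in.setReadahead(status.getLen());
                byte[] content = new byte[(int) status.getLen()];
                in.readFully(0, content);
                // ... pass `content` on to the metadata parser ...
            }
        }
    }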

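Finally, the directory timestamp issue can be observed with a few lines:
since S3 does not track modification times of directories [1], the value
S3A reports is not stable, so any staleness check that compares a cache
file against the directory's modification time will always consider the
cache outdated. A tiny check (placeholder path again):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DirectoryMtimeSketch {
        public static void main(String[] args) throws Exception {
            Path dir = new Path("s3a://some-bucket/dataset");
            FileSystem fs = dir.getFileSystem(new Configuration());
            // On a local file system this is a stable directory mtime; on
            // S3A there is no directory mtime to report [1], so this value
            // is not usable for cache invalidation.
            System.out.println(fs.getFileStatus(dir).getModificationTime());
        }
    }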