Hi Uwe,

Can you log JIRAs for the performance issues that you encounter while working on S3? Not many folks are working on optimizing that path, so any patches that you might be able to contribute would be appreciated.
Parth

On Thu, Oct 6, 2016 at 1:56 PM, Uwe Korn <[email protected]> wrote:

> Yes. Performance was much better with a real file system (i.e. I ran
> locally on my laptop using the SSD installed there). I don't expect to
> have exactly the same performance with S3, as I don't have things like
> data locality there. My use case is mainly querying "cold" datasets,
> i.e. ones that are not touched often and, when they are, only a few
> queries are run on them.
>
> > On 06.10.2016 at 22:47, Ted Dunning <[email protected]> wrote:
> >
> > Have you tried running against a real file system interface? Or even
> > just against HDFS?
> >
> > On Thu, Oct 6, 2016 at 12:35 PM, Uwe Korn <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> We had some test runs with Drill 1.8 over the last few days and wanted
> >> to share our experience with you, as we made some interesting findings
> >> that astonished us. We ran on our internal company cluster and thus
> >> used the S3 API to access our internal storage cluster, not AWS (the
> >> behavior should still be the same).
> >>
> >> Setup experience: awesome. It took me less than 30 minutes to have a
> >> multi-node Drill setup running on Mesos+Aurora with S3 configured.
> >> Really nice.
> >>
> >> Performance with the 1.8 release: awful. Compared to the queries I ran
> >> locally with Drill on a small dataset, runtimes were magnitudes higher
> >> than on my laptop. After some debugging, I saw that hadoop-s3a always
> >> requests via S3 the byte range from the position where we want to
> >> start reading until the end of the file. This gave the following HTTP
> >> pattern:
> >> * GET bytes=8k-100M
> >> * GET bytes=2M-100M
> >> * GET bytes=4M-100M
> >> Although the HTTP requests were normally aborted before all the data
> >> was sent by the server, still about 10-15x the size of the input files
> >> went over the network. Using Parquet, I had actually hoped to achieve
> >> the opposite, i.e.
> >> that less than the whole file would be transferred (my test queries
> >> were only using 2 of 15 columns).
> >>
> >> In Hadoop 3.0.0-alpha1 [2], there are a lot of improvements w.r.t. S3
> >> access. You can now select, via
> >> fs.s3a.experimental.input.fadvise=random, a new reading mode that will
> >> request from S3 only the asked-for range plus a small readahead
> >> buffer. While this keeps the number of requests constant, we now only
> >> request the data we actually need. With that, performance is not
> >> amazing but is in an acceptable range.
> >>
> >> Still, query planning always took at least 35s. This was an effect of
> >> fs.s3a.experimental.input.fadvise=random. While the Parquet access
> >> specifies very precisely which ranges it wants to read, the parser for
> >> the metadata cache only requests 8000 bytes at a time and thus led to
> >> several thousand HTTP requests for a single sequential read. As a
> >> workaround, we added a call to
> >> FSDataInputStream.setReadahead(metadata-filesize) to limit the access
> >> to a single request. This brought reading the metadata down to 3s.
> >>
> >> Another problem with the metadata cache was that it was actually
> >> rebuilt on every query. Drill relies here on the change timestamp of
> >> the directory, which is not supported by S3 [1], and thus the current
> >> time was always returned as the modification date of the directory.
> >>
> >> These were just our initial, basic findings with Drill. At the moment
> >> it looks promising enough that we will probably do some more usability
> >> and performance testing. If we already did something wrong with the
> >> initial S3 tests, it would be nice to get some pointers as to what it
> >> could have been. The bad S3 I/O performance was really surprising to
> >> us.
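The amplification Uwe describes can be sketched numerically. The following is a minimal, self-contained simulation (not Drill or s3a code); the 1 MB per-seek read size and 64 KB readahead buffer are hypothetical figures chosen for illustration, only the 100 MB file size and the 8k/2M/4M seek offsets come from the GET pattern in the mail:

```java
// Sketch: bytes requested over the wire when every GET asks for an
// open-ended range "offset-" (old s3a sequential behavior on each seek)
// versus a bounded range plus readahead (fadvise=random behavior).
public class RangeAmplification {
    static final long FILE_SIZE = 100L * 1024 * 1024; // 100 MB object

    // Old behavior: each GET requests from the seek offset to end of file,
    // even if the connection is aborted later.
    static long openEndedBytes(long[] seekOffsets) {
        long total = 0;
        for (long off : seekOffsets) total += FILE_SIZE - off;
        return total;
    }

    // fadvise=random: each GET covers only the needed bytes plus a small
    // readahead buffer (capped at end of file).
    static long boundedBytes(long[] seekOffsets, long needed, long readahead) {
        long total = 0;
        for (long off : seekOffsets) {
            total += Math.min(needed + readahead, FILE_SIZE - off);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] seeks = {8 * 1024L, 2L * 1024 * 1024, 4L * 1024 * 1024}; // 8k, 2M, 4M
        long openEnded = openEndedBytes(seeks);
        long bounded = boundedBytes(seeks, 1024 * 1024L, 64 * 1024L);
        System.out.println("open-ended bytes requested: " + openEnded);
        System.out.println("bounded bytes requested:    " + bounded);
        System.out.println("amplification: " + (openEnded / bounded) + "x");
    }
}
```

With these assumed numbers the open-ended pattern requests roughly two orders of magnitude more data than the bounded one, which is consistent in spirit with the 10-15x overhead Uwe measured (his requests were aborted early, so the real factor is smaller than the naive sum).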
> >>
> >> Kind regards,
> >> Uwe
> >>
> >> [1] https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
> >> [2] From here on, the tests were made with
> >> Drill-master+hadoop-3.0.0-alpha1+aws-sdk-1.11.35, i.e. custom Drill
> >> and Hadoop builds to have the dependencies in newer versions.
> >
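A back-of-the-envelope sketch of the metadata-cache issue from the thread: under fadvise=random, a parser that reads 8000 bytes at a time turns every read into its own ranged GET, while the FSDataInputStream.setReadahead(metadata-filesize) workaround lets a single GET cover the whole file. This is standalone arithmetic, not Hadoop code, and the 10 MB metadata file size is a hypothetical figure, not one from the thread:

```java
// Sketch: how many ranged GETs a sequential scan of a metadata file
// causes when each fixed-size read becomes its own request.
public class MetadataRequests {
    // Ceiling division: number of requests of `chunk` bytes needed to
    // cover `fileSize` bytes.
    static long requestsPerScan(long fileSize, long chunk) {
        return (fileSize + chunk - 1) / chunk;
    }

    public static void main(String[] args) {
        long metadataFileSize = 10L * 1024 * 1024; // hypothetical 10 MB cache file
        // fadvise=random with the parser's 8000-byte reads: one GET per read.
        System.out.println("8000-byte reads -> "
                + requestsPerScan(metadataFileSize, 8000) + " GETs");
        // setReadahead(fileSize): a single GET covers the whole scan.
        System.out.println("readahead = file size -> "
                + requestsPerScan(metadataFileSize, metadataFileSize) + " GET");
    }
}
```

For a 10 MB file this is over a thousand requests versus one, which matches the "several thousand HTTP requests for a single sequential read" order of magnitude Uwe reports for his (larger) metadata cache.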
