Hello,

We did some test runs with Drill 1.8 over the last few days and want to share our experience with you, as we made some interesting findings that astonished us. We ran on our internal company cluster and therefore used the S3 API to access our internal storage cluster, not AWS (the behavior against AWS should still be the same).
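For context, pointing s3a at an S3-compatible service other than AWS looks roughly like the sketch below; the endpoint and credentials are placeholders, not our actual configuration (the same properties can go into core-site.xml instead):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import java.net.URI;

    public class InternalS3Example {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Internal S3-compatible storage cluster instead of AWS.
        conf.set("fs.s3a.endpoint", "s3.internal.example.com");
        conf.set("fs.s3a.access.key", "ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "SECRET_KEY");
        FileSystem fs = FileSystem.get(new URI("s3a://test-bucket/"), conf);
        System.out.println("connected to: " + fs.getUri());
      }
    }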
Setup experience: Awesome. It took me less than 30 minutes to have a multi-node Drill setup running on Mesos+Aurora with S3 configured. Really nice.

Performance with the 1.8 release: Awful. Compared to the queries I ran locally on a small dataset, runtimes were orders of magnitude higher than on my laptop. After some debugging, I saw that hadoop-s3a always requests via S3 the byte range from the position where we want to start reading until the end of the file. This gave the following HTTP pattern:

* GET bytes=8k-100M
* GET bytes=2M-100M
* GET bytes=4M-100M

Although the HTTP requests were normally aborted before all the data was sent by the server, about 10-15x the size of the input files still went over the network. Using Parquet, I had actually hoped for the opposite, i.e. that less than the whole file would be transferred (my test queries used only 2 of 15 columns).

In Hadoop 3.0.0-alpha1 [2], there are a lot of improvements w.r.t. S3 access. Via fs.s3a.experimental.input.fadvise=random you can now select a new reading mode that requests via S3 only the asked-for range plus a small readahead buffer. While this keeps the number of requests constant, we now transfer only the data we actually need. With that, performance is not amazing but in an acceptable range (see sketch (a) below the references).

Still, query planning always took at least 35s. This turned out to be a side effect of fs.s3a.experimental.input.fadvise=random: while the Parquet reader specifies quite precisely which byte ranges it wants to read, the parser for the metadata cache requests only 8000 bytes at a time, which led to several thousand HTTP requests for a single sequential read. As a workaround, we added a call to FSDataInputStream.setReadahead(metadata-filesize) to fetch the file in a single request (sketch (b) below). This brought reading the metadata down to 3s.

Another problem with the metadata cache was that it was rebuilt on every query. Drill relies here on the modification timestamp of the directory, which is not supported by S3 [1], so the current time was always returned as the modification date of the directory and the cache always looked stale (sketch (c) below).

These were just our initial, basic findings with Drill. At the moment it looks promising enough that we will probably do some more usability and performance testing. If we already did something wrong in our initial S3 tests, it would be nice to get some pointers as to what it might have been. The bad S3 I/O performance was really surprising for us.

Kind regards,
Uwe

[1] https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
[2] From here on, the tests were made with Drill-master+hadoop-3.0.0-alpha1+aws-sdk-1.11.35, i.e. custom Drill and Hadoop builds to get newer versions of these dependencies.
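(a) Enabling the random-read mode. A minimal sketch of how we switched hadoop-s3a into the new fadvise mode; setting it programmatically here just for illustration, the property can equally go into core-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class S3ARandomReadExample {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hadoop 3.0.0-alpha1: request only the asked-for range plus a small
        // readahead buffer, instead of range-start..end-of-file.
        conf.set("fs.s3a.experimental.input.fadvise", "random");
        // ... create the FileSystem from this conf and run the reads ...
      }
    }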
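(b) The setReadahead workaround for the metadata cache. A sketch of the idea, not our actual patch; the bucket, table path and cache file name (assumed here to be Drill's standard .drill.parquet_metadata) are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class MetadataReadaheadExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.s3a.experimental.input.fadvise", "random");
        FileSystem fs = FileSystem.get(new URI("s3a://test-bucket/"), conf);
        Path metadata = new Path("s3a://test-bucket/table/.drill.parquet_metadata");
        long fileSize = fs.getFileStatus(metadata).getLen();
        try (FSDataInputStream in = fs.open(metadata)) {
          // Fetch the whole file in one HTTP request instead of
          // several thousand 8000-byte reads.
          in.setReadahead(fileSize);
          // ... pass `in` to the metadata parser ...
        }
      }
    }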
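(c) Why the cache is rebuilt on every query. A hypothetical simplification of the freshness check, only to illustrate the effect of [1]: on s3a, directories are synthetic and their modification time is not tracked, so a comparison like the one below always sees a "newer" directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class StaleCacheCheckExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("s3a://test-bucket/"), conf);
        Path dir = new Path("s3a://test-bucket/table");
        Path cache = new Path(dir, ".drill.parquet_metadata");
        // On s3a the directory mtime is effectively "now" [1].
        long dirMtime = fs.getFileStatus(dir).getModificationTime();
        long cacheMtime = fs.getFileStatus(cache).getModificationTime();
        if (dirMtime > cacheMtime) {
          // Always taken against S3: the cache is considered stale
          // and rebuilt on every query.
          System.out.println("metadata cache stale, rebuilding");
        }
      }
    }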
