Have you tried running against a real file system interface? Or even just
against HDFS?

On Thu, Oct 6, 2016 at 12:35 PM, Uwe Korn <[email protected]> wrote:

> Hello,
>
> We had some test runs with Drill 1.8 over the last few days and wanted to
> share the experience with you, as we made some interesting findings that
> astonished us. We ran on our internal company cluster and thus used the
> S3 API against our internal storage cluster, not AWS (the behavior should
> still be the same).
>
> Setup experience: Awesome, it took me less than 30min to have a multi-node
> Drill setup running on Mesos+Aurora with S3 configured. Really nice.
>
> Performance with the 1.8 release: Awful. Compared to the queries I ran
> locally with Drill on a small dataset, runtimes were magnitudes higher
> than on my laptop. After some debugging, I saw that hadoop-s3a always
> requests via S3 the byte range from the position where we want to start
> reading until the end of the file. This gave the following HTTP pattern:
>  * GET bytes=8k-100M
>  * GET bytes=2M-100M
>  * GET bytes=4M-100M
> Although the HTTP requests were normally aborted before all the data was
> sent by the server, roughly 10-15x the size of the input files still went
> over the network. Using Parquet, I had actually hoped to achieve the
> opposite, i.e. that less than the whole file would be transferred (my
> test queries were only using 2 of 15 columns).
>
> In Hadoop 3.0.0-alpha1 [2], there are a lot of improvements w.r.t. S3
> access. Via fs.s3a.experimental.input.fadvise=random you can now select a
> new reading mode that requests via S3 only the asked-for range plus a
> small readahead buffer. While this keeps the number of requests constant,
> we now only transfer the data we actually need. With that, performance is
> not amazing but within an acceptable range.
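For reference, the relevant s3a settings would go into core-site.xml roughly like this; the fadvise property is the one named above, while the 64K readahead value is an illustrative assumption, not something we tuned:

```xml
<!-- core-site.xml fragment (Hadoop 3.0.0-alpha1 s3a) -->
<property>
  <name>fs.s3a.experimental.input.fadvise</name>
  <value>random</value>
</property>
<!-- size of the small readahead buffer appended to each ranged request;
     65536 (64K) is an illustrative value -->
<property>
  <name>fs.s3a.readahead.range</name>
  <value>65536</value>
</property>
```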
>
> Still, query planning always took at least 35s. This was a side effect of
> fs.s3a.experimental.input.fadvise=random. While the Parquet reader
> specifies quite precisely which ranges it wants to read, the parser for
> the metadata cache only requests 8000 bytes at a time and thus issues
> several thousand HTTP requests for a single sequential read. As a
> workaround, we have added a call to FSDataInputStream.
> setReadahead(metadata-filesize) to limit the access to a single request.
> This brought reading the metadata down to 3s.
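A back-of-the-envelope sketch of why the 8000-byte reads hurt so much; the 20 MB metadata file size here is an assumed example, not a measured number:

```python
import math

READ_CHUNK = 8000                  # bytes fetched per HTTP request by the metadata parser
METADATA_SIZE = 20 * 1024 * 1024   # assumed example size of the metadata cache file

# Without the workaround: one ranged GET per 8000-byte read.
requests_before = math.ceil(METADATA_SIZE / READ_CHUNK)

# With setReadahead(metadata-filesize), the readahead window covers the
# whole file, so a single sequential read needs only one request.
requests_after = 1

print(requests_before, requests_after)  # 2622 requests vs. 1
```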
>
> Another problem with the metadata cache was that it was actually rebuilt
> on every query. Drill relies here on the modification timestamp of the
> directory, which is not supported by S3 [1], so the current time was
> always returned as the modification date of the directory.
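The effect can be sketched as follows; `rebuild_needed` and the variable names are hypothetical, just to illustrate the timestamp comparison described above:

```python
import time

def rebuild_needed(dir_mtime, cache_built_at):
    # The cache is rebuilt when the directory looks newer than the
    # cache that was built from it.
    return dir_mtime > cache_built_at

cache_built_at = time.time()

# A real file system reports the last actual change of the directory:
# here, unchanged for an hour, so the cache is reused.
hdfs_like_mtime = cache_built_at - 3600
assert not rebuild_needed(hdfs_like_mtime, cache_built_at)

# S3 does not track directory modification times [1]; the current time
# is returned instead, so the cache always appears stale.
time.sleep(0.01)
s3_like_mtime = time.time()
assert rebuild_needed(s3_like_mtime, cache_built_at)
```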
>
> These were just our initial, basic findings with Drill. At the moment it
> looks promising enough that we will probably do some more usability and
> performance testing. If we already did something wrong in these initial
> S3 tests, some pointers to what it could have been would be appreciated.
> The bad S3 I/O performance really surprised us.
>
> Kind regards,
> Uwe
>
> [1] https://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-aws/tools/hadoop-aws/index.html#Warning_2:_Because_Object_stores_dont_track_modification_times_of_directories
> [2] From here on, the tests were made with
> Drill-master+hadoop-3.0.0-alpha1+aws-sdk-1.11.35, i.e. custom Drill and
> Hadoop builds to get the dependencies in newer versions.
