Hi Uwe,

This is a lot of good information. We should document it in a JIRA.
BTW, I just checked, and Hadoop 2.8.2 was released recently; they claim it is
the first GA release. I think we can attempt to move to Hadoop 2.8.2 after
Drill 1.12 is released. Yes, some unit tests were failing the last time I
tried 2.8.1, but I think we can fix them.

Thanks,
Padma

> On Nov 5, 2017, at 8:27 AM, Uwe L. Korn <[email protected]> wrote:
>
> Hello Charles,
>
> I ran into the same performance issues some time ago and made some
> discoveries:
>
> * Drill is good at pulling only the byte ranges it needs out of the file
> system. Sadly, s3a in Hadoop 2.7 translates a request for the byte range
> (x, y) into an HTTP request to S3 for the byte range (x, end-of-file). In
> the case of Parquet, this means that for each column in each row group you
> read from the beginning of that column chunk to the end of the file.
> Overall, this amounted for me to total traffic of 10-20x the size of the
> actual file.
> * Hadoop 2.8/3.0 introduces a new experimental S3 random access mode that
> really improves performance, as it only sends requests for
> (x, y + readahead.range) to S3. You can activate it with
> fs.s3a.experimental.input.fadvise=random.
> * I played a bit with fs.s3a.readahead.range, the optimistic extra range
> included in each request, but found that I could keep it at its default of
> 65536 bytes, as Drill often requests all the bytes it needs at once, so
> reading ahead did not improve the situation.
> * This random access mode plays well with Parquet files but sadly slowed
> down reading the metadata cache drastically, as only requests of size
> 65540 were made to S3. Therefore I had to add is.setReadahead(filesize);
> after
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L593
> to ensure that the metadata cache is read from S3 in one go.
> * Also,
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/Metadata.java#L662
> seems to have always been true in my case, causing a refresh of the cache
> on every query. As I had quite a big dataset, this added a large constant
> to every query. This might simply be due to the fact that S3 does not have
> the concept of directories. I have not dug deeper into this, but as a
> dirty workaround I made the cache, once it exists, never update
> automatically.
>
> Locally I have made my own Drill build based on the Hadoop 2.8 libraries,
> but sadly some unit tests failed; at least for the S3 testing, everything
> seems to work. The work is still based on the 1.11 release sources, and
> some code has changed since then. I will have some time in the next
> days/weeks to look at this again and might open some PRs (don't expect me
> to be the one to open the Hadoop-update PR; I'm a full-time Python dev, so
> this is a bit out of my comfort zone :D ). At least in my basic tests,
> this resulted in a quite performant setup for me (embedded and in
> distributed mode).
>
> Cheers
> Uwe
>
> On Sun, Nov 5, 2017, at 02:29 AM, Charles Givre wrote:
>> Hello everyone,
>> I’m experimenting with Drill on S3, and I’ve been pretty disappointed
>> with the performance. I’m curious as to what kind of performance I can
>> expect, and what can be done to improve performance on S3. My current
>> config is Drill in embedded mode with a corporate S3 bucket.
>> Thanks,
>> — C
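For reference, the two s3a settings Uwe mentions are normally set in Hadoop's core-site.xml, which Drill picks up from its conf directory. A sketch; the property names and values are straight from Uwe's mail, the file placement is the standard Hadoop convention:

```xml
<!-- core-site.xml: sketch of the s3a tuning from Uwe's mail -->
<configuration>
  <!-- Hadoop 2.8+ only: issue GETs for (x, y + readahead.range)
       instead of (x, end-of-file) -->
  <property>
    <name>fs.s3a.experimental.input.fadvise</name>
    <value>random</value>
  </property>
  <!-- Optimistic extra bytes per request; 65536 is the default Uwe kept -->
  <property>
    <name>fs.s3a.readahead.range</name>
    <value>65536</value>
  </property>
</configuration>
```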
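To make the traffic difference concrete, here is a small stdlib-only sketch that models the two GET-range behaviors Uwe describes. The file and column-chunk sizes are made up for illustration; this is not actual s3a code:

```java
// Models the byte ranges fetched per column-chunk read:
// Hadoop 2.7 sequential mode vs. Hadoop 2.8 fadvise=random mode.
public class S3aRangeSketch {
    static final long READAHEAD = 65536; // fs.s3a.readahead.range default

    // Hadoop 2.7 s3a: a read of (start, end) becomes a GET of (start, EOF)
    static long sequentialFetch(long start, long fileSize) {
        return fileSize - start;
    }

    // Hadoop 2.8 random mode: GET of (start, end + readahead.range), capped at EOF
    static long randomFetch(long start, long end, long fileSize) {
        return Math.min(end + READAHEAD, fileSize) - start;
    }

    public static void main(String[] args) {
        long fileSize   = 256L * 1024 * 1024; // hypothetical 256 MiB Parquet file
        long chunkStart =  16L * 1024 * 1024; // hypothetical column chunk at 16 MiB
        long chunkEnd   =  24L * 1024 * 1024; // ...ending at 24 MiB (8 MiB chunk)

        // Sequential mode drags in everything from the chunk start to EOF;
        // random mode fetches only the chunk plus the readahead window.
        System.out.println("sequential: " + sequentialFetch(chunkStart, fileSize));
        System.out.println("random:     " + randomFetch(chunkStart, chunkEnd, fileSize));
    }
}
```

With these (invented) numbers, sequential mode fetches 240 MiB for an 8 MiB chunk, which is the kind of 10-20x (or worse) amplification Uwe observed once it happens for every column in every row group.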
