> avail. I was hoping perhaps someone on the list here might
> be able to shed some light as to why we're having these problems and/or
> have some suggestions on how we might be able to work around them.
...
> (I.e., theoretically ORC should be able to skip reading large portions
> of the index files by jumping directly to the index
> records that match the supplied search criteria. (Or at least jumping to
> a stripe close to them.)) But this is proving not to be the case.
Not theoretically. ORC does that, and that's the issue. S3n is badly broken for a columnar format & even S3A is missing a couple of features which are essential to get read performance over HTTP.

Here's one example - every seek() disconnects & re-establishes an SSL connection in S3 (that fix alone is a ~2x perf increase for S3a).

https://issues.apache.org/jira/browse/HADOOP-12444

In another scenario we found that a readFully(colOffset, ... colSize) will open an unbounded reader in S3n instead of reading the fixed chunk off HTTP.

https://issues.apache.org/jira/browse/HADOOP-11867

The lack of this means that even the short-lived keep-alive gets turned off by the S3 impl when doing a forward-seek read pattern, because each seek is a recv-buffer-dropping disconnect, not a completed request.

The Amazon proprietary S3 drivers are not subject to these problems, so they work with ORC very well. It's the open source S3 filesystem impls which are holding us back.

> Is ORC simply unable to work efficiently against data stored on S3n?
> (I.e., due to network round-trips taking too long.)

S3n is unable to handle any columnar format efficiently - it fires an HTTP GET for each seek, ranged to the end of the file. Any format which requires forward seeks or bounded readers is going to die via TCP window & round-trip thrashing.

I know what's needed for s3a to work well with columnar readers (Parquet/ORC/RCFile included) and to future-proof it so that it will work fine when HTTP/2 arrives. If you're interested in being a guinea pig for the S3a fixes, they're currently sitting on my back burner (I'm not a Hadoop committer) - the FS fixes are about two weeks' worth of work for a single motivated dev.

Cheers,
Gopal
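
For anyone who wants to see the access pattern in code: here is a minimal sketch of the bounded positioned-read / forward-seek pattern a columnar reader issues, written against Hadoop's FSDataInputStream API. The path, offsets and sizes are invented for illustration; a real reader gets them from the ORC footer and stripe index.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ColumnChunkRead {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        // Hypothetical object and layout -- in practice the offsets come
        // from the ORC footer / stripe index, not constants.
        Path path = new Path("s3a://bucket/warehouse/tbl/part-00000.orc");
        long colOffset = 4L * 1024 * 1024;  // start of one column stream
        int colSize = 256 * 1024;           // length of that stream

        FileSystem fs = path.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(path)) {
          byte[] buf = new byte[colSize];

          // Bounded positioned read: exactly colSize bytes at colOffset.
          // Over HTTP this ought to be one GET with
          //   Range: bytes=colOffset-(colOffset+colSize-1)
          // s3n instead opens an unbounded GET from colOffset to EOF and
          // drops the connection once the caller stops reading.
          in.readFully(colOffset, buf, 0, colSize);

          // Forward seek to the next stream of interest. A columnar-friendly
          // client reuses the open connection for a short skip; a naive one
          // aborts and re-establishes the SSL connection (HADOOP-12444).
          in.seek(colOffset + colSize + 128 * 1024);
          in.readFully(buf, 0, buf.length);
        }
      }
    }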
