> avail. I was hoping perhaps someone on the list here might
> be able to shed some light as to why we're having these problems and/or
> have some suggestions on how we might be able to work around them.
...
> (I.e., theoretically ORC should be able to skip reading large portions
> of the index files by jumping directly to the index
> records that match the supplied search criteria. (Or at least jumping to
> a stripe close to them.)) But this is proving not to be the case.
Not theoretically. ORC does that, and that's the issue. S3n is badly broken for a columnar format & even S3A is missing a couple of features which are essential to get read performance over HTTP.

Here's one example - every seek() disconnects & re-establishes an SSL connection in S3 (that fix alone is a ~2x perf increase for S3a).

https://issues.apache.org/jira/browse/HADOOP-12444

In another scenario we found that a readFully(colOffset, ... colSize) will open an unbounded reader in S3n instead of reading the fixed chunk off HTTP.

https://issues.apache.org/jira/browse/HADOOP-11867

The lack of this means that even the short-lived keep-alive gets turned off by the S3 impl when doing a forward-seek read pattern, because each seek is a recv-buffer-dropping disconnect, not a completed request.

The Amazon proprietary S3 drivers are not subject to these problems, so they work with ORC very well. It's the open source S3 filesystem impls which are holding us back.

> Is ORC simply unable to work efficiently against data stored on S3n?
> (I.e., due to network round-trips taking too long.)

S3n is unable to handle any columnar format efficiently - it fires an HTTP GET for each seek, ranged to the end of the file. Any format which requires forward seeks or bounded readers is going to die via TCP window & round-trip thrashing.

I know what's needed for s3a to work well with columnar readers (Parquet/ORC/RCFile included) and to future-proof it so that it will work fine when HTTP/2 arrives. If you're interested in being a guinea pig for the S3a fixes, they're currently sitting on my back burner (I'm not a Hadoop committer) - the FS fixes are about two weeks' worth of work for a single motivated dev.

Cheers,
Gopal
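
For anyone who wants to see the access pattern in code: here is a minimal sketch of the bounded positioned-read / forward-seek pattern a columnar reader issues, written against Hadoop's FSDataInputStream API. The path, offsets and sizes are invented for illustration; a real reader gets them from the ORC footer and stripe index.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ColumnChunkRead {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        // Hypothetical object and layout -- in practice the offsets come
        // from the ORC footer / stripe index, not constants.
        Path path = new Path("s3a://bucket/warehouse/tbl/part-00000.orc");
        long colOffset = 4L * 1024 * 1024;  // start of one column stream
        int colSize = 256 * 1024;           // length of that stream

        FileSystem fs = path.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(path)) {
          byte[] buf = new byte[colSize];

          // Bounded positioned read: exactly colSize bytes at colOffset.
          // Over HTTP this ought to be one GET with
          //   Range: bytes=colOffset-(colOffset+colSize-1)
          // s3n instead opens an unbounded GET from colOffset to EOF and
          // drops the connection once the caller stops reading.
          in.readFully(colOffset, buf, 0, colSize);

          // Forward seek to the next stream of interest. A columnar-friendly
          // client reuses the open connection for a short skip; a naive one
          // aborts and re-establishes the SSL connection (HADOOP-12444).
          in.seek(colOffset + colSize + 128 * 1024);
          in.readFully(buf, 0, buf.length);
        }
      }
    }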
