Thank you Alan, Prasanth and Gopal. Yes, select * was a bad example, but if you watch users it is always the first thing they do ;-) Generally though it's revenue by area, minutes by product, etc. which is perfect for columnar data.
Tez is not officially supported on EMR yet, but they provide bootstraps for Tez 0.7. I've been using it for a month (via Hue 3.7.1) and it's absolutely fine.

I've recreated my table as s3a and am currently inserting 670m records into it. I'll report back, but it sounds like I will need to wait for AWS to move to Hive 2.0 to get the full benefits.

Regards

Neil

On 10 November 2015 at 19:24, Gopal Vijayaraghavan <[email protected]> wrote:

> Hi,
>
> >http://mail-archives.apache.org/mod_mbox/orc-user/201509.mbox/%3c560AB8D2
> .
> >[email protected]%3e
> ...
> >ORC does a lot of seeks inside its files in order to only load the data
> >you need. S3 doesn't handle seeks well, so ORC does not give you the
> >same improvements that you would see using it on HDFS directly.
>
> ORC changed the way it generates seeks recently in hive-2.0, to get
> connection re-use working (HIVE-11945).
>
> The S3A drivers still need to be fixed to handle seeks via HTTP range
> requests (HADOOP-12444), but the EMR drivers are better at it I think.
>
> > select * from test where subscriber_id = '12345678'
>
> Are the filter columns strings?
>
> I think the version you're running doesn't have bloom filter indexes,
> which is somewhat necessary for strings (since a uniformly distributed 1
> byte prefix effectively ruins regular index lookups).
>
> You can work around that issue by laying out data in order to get a tight
> grouping
>
> insert overwrite table test as select * from test sort by subscriber_id;
> -- sort by, not order by
>
> Also "select *" is a corner case, since it doesn't get you any benefit of
> the columnar layout, since all columns are being read.
>
> > We are using Tez on EMR 4.1 (which uses Hive 1.0, I believe).
>
> Wow, I did not know this. I will try this.
>
> Cheers,
> Gopal

--
Regards

Neil
