Thank you Alan, Prasanth and Gopal.

Yes, select * was a bad example, but if you watch users it is always the
first thing they do ;-) Generally, though, the queries are revenue by area,
minutes by product, etc., which are a perfect fit for columnar data.

Tez is not officially supported on EMR yet but they provide bootstraps for
Tez 0.7. I've been using it for a month (via Hue 3.7.1) and it's absolutely
fine.

I've recreated my table on s3a and am currently inserting 670m records into
it. I will report back, but it sounds like I will need to wait for AWS to
move to Hive 2.0 to get the full benefits.
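
On the "sort by" suggestion quoted below, here is a rough sketch of why a
tight grouping should help. It only models a per-row-group min/max index over
synthetic 8-digit string IDs; it is not how the ORC reader is actually
implemented, and the names groups_to_read and make_ids are made up for
illustration. The group size of 10,000 rows matches ORC's default row index
stride.

```python
import random

ROWS_PER_GROUP = 10_000  # ORC's default row index stride


def groups_to_read(ids, target, rows_per_group=ROWS_PER_GROUP):
    """Count row groups whose [min, max] range could contain target."""
    hits = 0
    for start in range(0, len(ids), rows_per_group):
        group = ids[start:start + rows_per_group]
        # A min/max index can only skip a group if target lies outside its range.
        if min(group) <= target <= max(group):
            hits += 1
    return hits


def make_ids(n, seed=0):
    """Synthetic, uniformly distributed 8-digit subscriber IDs as strings."""
    rng = random.Random(seed)
    return [f"{rng.randrange(10**8):08d}" for _ in range(n)]


if __name__ == "__main__":
    ids = make_ids(200_000)
    target = sorted(ids)[len(ids) // 2]  # a value from the middle of the range
    total = -(-len(ids) // ROWS_PER_GROUP)  # ceiling division
    print("row groups:", total)
    print("unsorted, must read:", groups_to_read(ids, target))
    print("sorted, must read:", groups_to_read(sorted(ids), target))
```

With uniformly distributed IDs in arrival order, nearly every row group's
min/max range spans the target, so almost nothing can be skipped; after
sorting, the matching rows fall into one group, or two if they straddle a
boundary.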

Regards
Neil

On 10 November 2015 at 19:24, Gopal Vijayaraghavan <[email protected]>
wrote:

> Hi,
>
> >
> >http://mail-archives.apache.org/mod_mbox/orc-user/201509.mbox/%3c560AB8D2.[email protected]%3e
> ...
> > ORC does a lot of seeks inside its files in order to only load the data
> >you need.  S3 doesn't handle seeks well, so ORC does not give you the
> >same improvements that you would see using it on HDFS directly.
>
> ORC changed the way it generates seeks recently in hive-2.0, to get
> connection re-use working (HIVE-11945).
>
> The S3A drivers still need to be fixed to handle seeks via HTTP range
> requests (HADOOP-12444), but the EMR drivers are better at it I think.
>
> > select * from test where subscriber_id = '12345678'
>
> Are the filter columns strings?
>
>
> I think the version you're running doesn't have bloom filter indexes,
> which are somewhat necessary for strings (since a uniformly distributed 1
> byte prefix effectively ruins regular index lookups).
>
>
> You can work around that issue by laying out data in order to get a tight
> grouping:
>
> insert overwrite table test select * from test sort by subscriber_id;
> -- sort by, not order by
>
> Also "select *" is a corner case: because all columns are being read, it
> doesn't get you any benefit of the columnar layout.
>
> > We are using Tez on EMR 4.1 (which uses Hive 1.0, I believe).
>
>
> Wow, I did not know this. I will try this.
>
> Cheers,
> Gopal

