Re: Hive on EMR on S3 : Beginner

Ravi Shetye Mon, 27 Aug 2012 05:58:28 -0700

Thanks to all your help, I have moved ahead with my project.
I create the table as follows:
CREATE TABLE test (...)
PARTITIONED BY (adid STRING, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://logs/'


Then I run *ALTER TABLE results RECOVER PARTITIONS;*

and start querying.

Now the issue is that it fetches the data from S3 to HDFS for every single
query, so if I remove the S3 buckets the results change.

How can I remove this dependency? I would like to store the data on HDFS
and then query it repeatedly.
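
A minimal sketch of what "store the data on HDFS once" could look like in HiveQL. It is an illustration under assumptions, not a confirmed solution: the table name test_hdfs is hypothetical, and it assumes the cluster's default Hive warehouse directory lives on HDFS, so a table created without a LOCATION clause is stored there.

```sql
-- Hypothetical HDFS-backed table with the same schema and partition
-- columns as the S3-backed table "test". No LOCATION is given, so it
-- goes to the default warehouse directory (assumed to be on HDFS).
CREATE TABLE test_hdfs LIKE test;

-- Allow Hive to create the (adid, dt) partitions from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- One-time copy from S3 into HDFS. SELECT * returns the regular columns
-- followed by the partition columns (adid, dt), which is the order a
-- dynamic-partition insert expects.
INSERT OVERWRITE TABLE test_hdfs PARTITION (adid, dt)
SELECT * FROM test;
```

Later queries would then go against test_hdfs instead of test. The copy only lasts as long as the cluster's HDFS does, so it would have to be redone on a new cluster.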

Am I even attempting a valid use case, or am I doing something
fundamentally wrong?
