Thanks to all your help I have moved ahead with my project.
So I created the table as:
CREATE TABLE test (...)
PARTITIONED BY (adid STRING, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://logs/'

Then I run *ALTER TABLE results RECOVER PARTITIONS;*

and then start querying.

Now the issue is that it fetches the data from S3 to HDFS for every single query, so
if I remove the S3 bucket the results change.

How can I remove this dependency? I would like to store the data on HDFS once and then
query it repeatedly.
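
For example, would something along these lines be the right direction? This is only an untested sketch. test_hdfs is a made-up name for a second table with the same columns but no LOCATION clause, so I am assuming Hive keeps it in its default warehouse directory on HDFS. I am also assuming dynamic partition inserts are available in my Hive version.

```sql
-- Hypothetical HDFS-backed copy of the S3 table (name test_hdfs is made up).
-- No LOCATION clause, so the data should land in Hive's warehouse directory.
CREATE TABLE test_hdfs (...)
PARTITIONED BY (adid STRING, dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Allow Hive to create the (adid, dt) partitions from the selected rows.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- One-time copy from S3 into HDFS; SELECT * returns the partition
-- columns last, which is the order a dynamic partition insert expects.
INSERT OVERWRITE TABLE test_hdfs PARTITION (adid, dt)
SELECT * FROM test;
```

After that I would run all further queries against test_hdfs instead of the S3-backed table.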

Am I even trying a valid use case? Or am I doing something fundamentally
wrong?
