Thanks to all your help, I have moved ahead with my project. I create the table as: CREATE TABLE test (...) PARTITIONED BY (adid STRING, dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://logs/'
I then run *ALTER TABLE results RECOVER PARTITIONS;* and start querying. The issue is that every single query fetches the data from S3 into HDFS, so if I remove the S3 bucket, the results change. How can I remove this dependency, i.e. store the data in HDFS once and then query it repeatedly? Is this even a valid use case, or am I doing something fundamentally wrong?
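
To make the "store it in HDFS once, then query repeatedly" idea concrete, here is a rough HiveQL sketch of what I have in mind. It is only a sketch under assumptions: `test_hdfs` is a hypothetical table name, and I am assuming the cluster's default Hive warehouse directory lives on HDFS and that CREATE TABLE ... LIKE does not carry over the S3 LOCATION.

```sql
-- Hypothetical HDFS-backed copy of the S3 table; `test_hdfs` is an illustrative name.
-- Assumption: LIKE copies the columns and partition keys but not the S3 LOCATION,
-- so the new table is created under the default warehouse directory (assumed to be on HDFS).
CREATE TABLE test_hdfs LIKE test;

-- Let Hive take the partition values (adid, dt) from the rows being inserted.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- One-time copy out of S3. SELECT * returns the partition columns last,
-- which is the order a dynamic-partition insert expects.
INSERT OVERWRITE TABLE test_hdfs PARTITION (adid, dt)
SELECT * FROM test;

-- Subsequent queries would read the HDFS copy instead of S3.
SELECT COUNT(*) FROM test_hdfs;
```

The copy would be a snapshot: anything added to the S3 location afterwards would need another insert, and the data would disappear with the cluster's HDFS.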