Thanks Al and Mich. I am familiar with Hive partitioning and bucketing.
I have created a table with files in subdirectories, and setting these properties makes queries work nicely without partitioning on my local VM:

    set mapred.input.dir.recursive=true;
    set hive.mapred.supports.subdirectories=true;

Below are a few posts on the same topic:

http://stackoverflow.com/questions/26767713/can-hive-recursively-descend-into-subdirectories-without-partitions-or-editing-h
http://stackoverflow.com/questions/20756561/how-to-pick-up-all-data-into-hive-from-subdirectories
https://joshuafennessy.com/2015/06/30/configure-apache-hive-to-recursively-search-directories-for-files/

The solution given in these posts is to set the two properties mentioned above, which I am already doing. But these properties are not working on the CDH 5.3.3 cluster; I’m yet to try other CDH versions.

Thanks & Regards,
Abhishek Dubey

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Friday, May 20, 2016 1:22 AM
To: user <user@hive.apache.org>
Subject: Re: Unable to pick data from subdirectories into hive table in CDH 5.3.3

Agreed, but it still needs to know where the Hive top node directory starts from, which is normally under ../../warehouse

Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 19 May 2016 at 20:32, Al Pivonka <alpivo...@gmail.com> wrote:

Read about Hive partitions and bucketing. Since your location has multiple directories, Hive needs to know how to traverse them.

Hope this helps.

On May 19, 2016 5:51 AM, "Abhishek Dubey" <abhishek.du...@xoriant.com> wrote:

Hi,

In HDFS I have a directory structure like this:

    /user/hdfs/Data/Data1/File1
    /user/hdfs/Data/Data2/File2

And I am creating an external table like this:

    CREATE EXTERNAL TABLE db.tablename (
      amt1 STRING,
      amt2 STRING,
      amt3 STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hdfs/Data/';

Also, I have set two properties:

    set mapred.input.dir.recursive=true;
    set hive.mapred.supports.subdirectories=true;

This setup works perfectly on my local single-node VM, which has all vanilla Apache installations and setup. But on a 4-node Cloudera 5.3.3 cluster, the properties above for recursive lookup of subdirectories for an external Hive table are not working. In Cloudera Manager I have added the properties to hive-site.xml, deployed the configuration, and restarted the Hive service, but it is still not working.

    <property>
      <name> mapred.input.dir.recursive</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.mapred.supports.subdirectories</name>
      <value>true</value>
    </property>

When I run select * on CDH, I get zero rows:

    hive> select * from tablename;
    OK
    Time taken: 0.322 seconds
    hive>

On the local VM, the same query gives the desired output.

Is there anything else on CDH that we need to take care of to pick up data from subdirectories into a Hive table?

Thanks in advance.
Abhishek Dubey
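
One way to narrow this down might be to check, from the Hive CLI on the cluster, which values the session actually sees before querying. The sketch below is only a suggestion and not a confirmed fix for CDH 5.3.3. It assumes the table from the message above (db.tablename) and also sets mapreduce.input.fileinputformat.input.dir.recursive, the Hadoop 2 name for which mapred.input.dir.recursive is the deprecated alias. Whether that changes anything on this cluster is an assumption to be tested.

```sql
-- Print the values this session actually sees; "set <name>;" with no value
-- echoes the current setting or reports it as undefined.
set mapred.input.dir.recursive;
set hive.mapred.supports.subdirectories;

-- Assumption: try the Hadoop 2 property name alongside the Hive one.
-- mapred.input.dir.recursive is the deprecated alias of this property.
set mapreduce.input.fileinputformat.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;

-- A plain "select *" is typically served by a fetch task without launching
-- a job, while an aggregate normally runs as a MapReduce job. Comparing the
-- two may show which read path is skipping the subdirectories.
select * from db.tablename limit 5;
select count(*) from db.tablename;
```

If the first two statements report the properties as undefined or false, the hive-site.xml change is probably not reaching the client session, which would point at configuration deployment rather than at the properties themselves.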