Hi, I am trying to figure out a partitioning strategy in Hive. I'm considering either monthly or daily partitions. My usage (querying etc.) points me towards a daily partition scheme, but I'm not sure what HDFS / NameNode limitations this would run into.
With daily partitions, each partition would hold a single 3-4 GB file, so after 2 years I would end up with 700-odd directories containing one file each. With monthly partitions, I would instead have 24 directories, each holding 30 or 31 files of about 4 GB. Most of my queries filter on a date range, so I think daily partitions would be more efficient: a query would not have to scan every file for the month, as it would with monthly partitions.

What other considerations should I think about before making a decision?
1) NameNode / HDFS limitations
2) Archiving files
3) Compression
and maybe more.

I would really appreciate any input on this.

Thanks,
Kishore
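
P.S. To make the NameNode part of the question concrete, here is my rough back-of-the-envelope count of the objects (directories, files, blocks) each scheme would create. It is only a sketch: the 128 MB block size, the flat 4 GB per daily file, and the 730 days are my assumptions, and the function name is just something I made up for illustration. It ignores the table's root directory and any small or temporary files.

```python
import math

BLOCK_SIZE_MB = 128       # assumption: HDFS block size of 128 MB
FILE_SIZE_MB = 4 * 1024   # assumption: one 4 GB file per day
DAYS = 730                # assumption: two years of daily files
MONTHS = 24               # two years of monthly partitions


def namenode_objects(directories, files,
                     file_size_mb=FILE_SIZE_MB,
                     block_size_mb=BLOCK_SIZE_MB):
    """Rough count of directories + files + blocks for one layout."""
    blocks_per_file = math.ceil(file_size_mb / block_size_mb)
    return directories + files + files * blocks_per_file


# Daily: one directory per day, one file in each.
daily = namenode_objects(directories=DAYS, files=DAYS)

# Monthly: one directory per month, still one file per day overall.
monthly = namenode_objects(directories=MONTHS, files=DAYS)

print("daily partitions:  ", daily)
print("monthly partitions:", monthly)
print("difference:        ", daily - monthly)
```

If these assumptions are roughly right, the two layouts differ only in the number of directories, since the file and block counts are the same either way. Please correct me if I am counting this wrong.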
