On Wed, Dec 15, 2010 at 4:52 PM, Mark <[email protected]> wrote:
> Can someone explain what partitioning is and why it would be used.. example?
> Thanks
>

A partition is both a logical and a physical division of a table's data: each partition is identified by the value of one or more partition columns, and its data is stored separately from the data of the other partitions. When a partition column appears in the WHERE clause, the query planner can prune the partitions that Hive does not need to process. For example, if you partition your table by day, you can write queries such as SELECT count(1) FROM table WHERE day=20100101. Hive will read only that single partition as input, rather than the entire table. Generally, you want neither too many small partitions nor too few large ones.

http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Partitions
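
To make the day example concrete, here is a minimal HiveQL sketch. The table name page_views and its columns user_id and url are made up for illustration; only the day partition column comes from the example above.

```sql
-- Hypothetical table: page_views, user_id and url are illustrative names.
-- The partition column (day) is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE page_views (user_id BIGINT, url STRING)
PARTITIONED BY (day INT);

-- Register one partition per day. Each partition gets its own
-- directory under the table's location, e.g. .../page_views/day=20100101/
ALTER TABLE page_views ADD PARTITION (day=20100101);
ALTER TABLE page_views ADD PARTITION (day=20100102);

-- The filter is on the partition column, so only the day=20100101
-- partition is used as input; day=20100102 is never read.
SELECT count(1) FROM page_views WHERE day = 20100101;
```

A query that does not filter on day would still work, but it would have to read every partition of the table.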
