Thanks for the help with this. Thanks, Ranjith
From: Bejoy Ks [mailto:[email protected]] Sent: Friday, December 16, 2011 04:41 AM To: [email protected] <[email protected]> Subject: Re: bucketing in hive Ranjith You can definitely change the number of buckets in a hive table even after its creation. You need to issue an alter table command that contains the CLUSTERED BY and/or SORTED BY clauses used by your table. For example if I have a table whose DDL looks like this CREATE EXTERNAL TABLE employee ( emp_id STRING, emp_name STRING, dept STRING, location STRING, ) CLUSTERED BY(dept,location) SORTED BY(dept,location) INTO 15 BUCKETS ; You can ALTER the number of BUCKETS using the ALTER TABLE command as ALTER TABLE employee CLUSTERED BY(dept,location) SORTED BY(dept,location) INTO 20 BUCKETS ; The one major factor you need to consider here is that if you are using sampling queries on a partitioned - bucketed tables, you need to keep in mind that the older partitions may have different number of buckets where as the new partitions after the ALTER statement would have a different number of buckets. Hope it helps!... Regards Bejoy.K.S ________________________________ From: "Raghunath, Ranjith" <[email protected]> To: "'[email protected]'" <[email protected]>; "'[email protected]'" <[email protected]> Sent: Friday, December 16, 2011 10:48 AM Subject: Re: bucketing in hive Thanks Bejoy. Appreciate the insight. Do you know of altering the number of buckets once a table has been set up? Thanks, Ranjith From: Bejoy Ks [mailto:[email protected]] Sent: Thursday, December 15, 2011 06:13 AM To: [email protected] <[email protected]>; hive dev list <[email protected]> Subject: Re: bucketing in hive Hi Ranjith I'm not aware of any Dynamic Bucketing in hive where as there is definitely Dynamic Partitions available. Your partitions/sub partitions would be generated on the fly/dynamically based on the value of a particular column .The records with same values for that column would go into the same partition. But Dynamic Partition load can't happen with a LOAD DATA statement as it requires running mapreduce job, You can utilize dynamic partitions in 2 steps for delimited files - Load delimited file into a non partitioned table in hive using LOAD DATA - Load data into destination table from the source table using INSERT OVERWRITE - here a MR job would be triggered that would do the job for you. I have scribbled something down on the same, check whether it'd be useful for you. http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html Regards Bejoy.K.S ________________________________ From: "Raghunath, Ranjith" <[email protected]> To: "[email protected]" <[email protected]>; hive dev list <[email protected]> Sent: Thursday, December 15, 2011 7:53 AM Subject: bucketing in hive Can one use bucketing in hive to emulate hash partitions on a database? Is there also a way to segment data into buckets dynamically based on values in the column. For example, Col1 Col2 Apple 1 Orange 2 Apple 2 Banana 1 If the file above were inserted into a table with Col1 as the bucket column, can we dynamically allow all of the rows with “Apple” in one file and “Orange” in one file and so on. Is there a way to do this without specifying the bucket size to be 3. Thank you, Ranjith
