Hi Mich,

I want to know if we can drop the data of a particular bucket in Hive.
On Friday, August 19, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hash partitioning (bucketing) does not make much sense for YYYY/MM/DD/32,
> as pointed out.
>
> So it is clear that with (mod 32) the maximum number of offsets is going
> to be 32, i.e. in the range 0-31. With YYYY/MM/DD you have to account
> for hash collisions as well. The set of inputs is potentially large
> (definitely not known until we encounter them all), and if you want to
> spread them evenly (after all, that is what hash partitioning is all
> about), then I think day of the month makes more sense.
>
> HTH
>
> Dr Mich Talebzadeh
>
> On 19 August 2016 at 23:15, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
>> > We are bucketing by date so we will have max 32 buckets
>>
>> If you do want to look up specifically by date, you could just create day
>> partitions and never partition by month.
>>
>> FYI, in a modern version of Hive
>>
>>   select count(1) from table where YEAR(dt) = 2016 and MONTH(dt) = 12
>>
>> does prune it on the client side.
>>
>> On a different note, 31 buckets is a bad idea (32 is OK), because for
>> String hashes (32-1) is the magic number, which hurts "yyyymmdd", and 50% of
>> your buckets have 0 data.
>>
>> http://www.slideshare.net/t3rmin4t0r/data-organization-hive-meetup/6
>>
>> If you use that as a number, you'll get the same number back as the
>> hashcode, so it won't be stable as months change (20160816 % 32 == 16 and
>> 20160716 % 32 == 12).
>>
>> The only way to have buckets correspond to the day of month is to store
>> day_of_month as an int and bucket on it with 32 - then bucket0 == 31,
>> bucket1=1, bucket2=2 etc.
>>
>> Cheers,
>> Gopal
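A small Python sketch of the bucket arithmetic discussed in this thread. It assumes the older Hive bucket assignment, bucket = (hash & Integer.MAX_VALUE) % numBuckets, where an int hashes to itself and a string is hashed with Java's String.hashCode (31-multiplier, 32-bit overflow). Later Hive releases moved bucketing to a Murmur3-based hash, and the exact string hash can depend on how the column is read, so treat this as illustrative only. The helper names (java_string_hash, bucket_of, empty_buckets) are hypothetical. Under this formula, day 31 lands in bucket 0 only when there are 31 buckets; with 32 buckets, day 31 goes to bucket 31 and bucket 0 stays empty.

```python
from datetime import date, timedelta

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def to_java_int(x: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit Java int."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def java_string_hash(s: str) -> int:
    """Java's String.hashCode: h = 31*h + c over each char, with 32-bit overflow."""
    h = 0
    for ch in s:
        h = to_java_int(31 * h + ord(ch))
    return h


def bucket_of(hash_code: int, num_buckets: int) -> int:
    """Assumed (older Hive) bucket assignment: (hash & Integer.MAX_VALUE) % n."""
    return (hash_code & INT_MAX) % num_buckets


def empty_buckets(keys, num_buckets: int, hash_fn) -> int:
    """Count buckets that receive no keys at all."""
    used = {bucket_of(hash_fn(k), num_buckets) for k in keys}
    return num_buckets - len(used)


# Every day of 2016 (a leap year) as a "yyyymmdd" string.
start = date(2016, 1, 1)
days = [start + timedelta(days=i) for i in range(366)]
ymd_strings = [d.strftime("%Y%m%d") for d in days]

# Compare how many buckets stay empty for 31 vs 32 buckets on string keys.
for n in (31, 32):
    print(f"{n} buckets, yyyymmdd strings: "
          f"{empty_buckets(ymd_strings, n, java_string_hash)} empty")

# yyyymmdd as an int hashes to itself, so the bucket drifts month to month.
print(bucket_of(20160816, 32), bucket_of(20160716, 32))  # 16 12

# day_of_month as an int (1..31) maps day d to bucket d % n.
print([bucket_of(d, 32) for d in (1, 2, 31)])  # [1, 2, 31]
print([bucket_of(d, 31) for d in (1, 2, 31)])  # [1, 2, 0]
```

Running it prints the empty-bucket counts for string keys at both bucket sizes; compare them with the 50% figure from the slides rather than treating either as definitive, since the real hash depends on the Hive version and table settings.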