Partitioning strings for bucketed table

Fabian Alenius Sat, 11 Aug 2012 10:16:51 -0700

Hi,

I'm trying to create an external bucketed table but I'm having trouble
recreating the behavior of the hive partitioner used to create
internal bucketed tables.


My bucket key is a String s. Currently in my partitioner I'm using the
following code, which is based on my findings in the Hive codebase:

  (s.hashCode() & Integer.MAX_VALUE) % numPartitions;

Unfortunately, when I run a select count(*) with TABLESAMPLE, about 1%
of the rows are missing compared with those coming into the mapper.

I suspect that I might need to wrap my String in a Writable before
calling hashCode(). Does anyone know exactly how to partition the data
so that it becomes compatible with hive bucketing?
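
To make the suspicion concrete, here is a minimal sketch of how the two
hashes could diverge. It assumes (this is an assumption to check against
the Hive version in use, not a confirmed answer) that Hive hashes the
UTF-8 bytes of the Text value with the same 31-multiplier scheme, rather
than the UTF-16 chars that String.hashCode() uses. Under that assumption
the two agree for pure-ASCII keys and differ for keys containing
non-ASCII characters, which could account for a small fraction of rows
landing in the wrong bucket. The names BucketSketch, utf8Hash,
bucketFromChars and bucketFromUtf8 are hypothetical and exist only for
this illustration.

```java
import java.nio.charset.StandardCharsets;

class BucketSketch {
    // Bucket index as computed in the post: String.hashCode() runs over UTF-16 chars.
    static int bucketFromChars(String s, int numPartitions) {
        return (s.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // ASSUMPTION: a 31-multiplier hash over the signed UTF-8 bytes of the key.
    // Whether this matches Hive's bucketing must be verified against its source.
    static int utf8Hash(String s) {
        int h = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b; // b is a signed byte, so bytes >= 0x80 contribute negative values
        }
        return h;
    }

    // Same masking and modulo as in the post, applied to the byte-based hash.
    static int bucketFromUtf8(String s, int numPartitions) {
        return (utf8Hash(s) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 32; // arbitrary value chosen for the demonstration
        String[] keys = { "hello", "caf\u00e9" };
        for (String key : keys) {
            System.out.println(key
                + " chars=" + bucketFromChars(key, numPartitions)
                + " utf8=" + bucketFromUtf8(key, numPartitions));
        }
    }
}
```

If the two columns printed by main differ only for keys with non-ASCII
characters, comparing the share of such keys in the data against the
roughly 1% of missing rows would be a quick way to test this theory.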


Regards,

Fabian
