Hi,

I'm trying to create an external bucketed table, but I'm having trouble
reproducing the behavior of the partitioner Hive uses when it creates
internal bucketed tables.

My bucket key is a String s. Currently my partitioner uses the
following code, which is based on my findings in the Hive codebase:

  (s.hashCode() & Integer.MAX_VALUE) % numPartitions;

Unfortunately, when I run a select count(*) with TABLESAMPLE, about 1%
of the rows are missing from those coming into the mapper.

I suspect that I might need to wrap my String in a Writable before
calling hashCode(). Does anyone know exactly how to partition the data
so that it is compatible with Hive bucketing?
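
To make the suspicion concrete, here is a minimal sketch of the variant I
have in mind. It assumes that Hive hashes a string bucket column over the
UTF-8 bytes of its Text form (multiplying by 31 per signed byte) instead of
over the UTF-16 chars that String.hashCode() uses. That is an assumption to
verify against the Hive source for the version in use, not confirmed
behavior. The class and method names (HiveBucketSketch, utf8Hash, bucketFor)
are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class HiveBucketSketch {

    // ASSUMPTION: Hive hashes the UTF-8 bytes of the string, treating each
    // byte as a signed value, with the same 31 multiplier as String.hashCode().
    static int utf8Hash(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int r = 0;
        for (byte b : bytes) {
            r = r * 31 + b; // b is sign-extended, so bytes >= 0x80 are negative
        }
        return r;
    }

    // Same masking and modulo as the partitioner in this post,
    // only with the byte-based hash swapped in.
    static int bucketFor(String s, int numBuckets) {
        return (utf8Hash(s) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 32; // hypothetical bucket count
        for (String key : new String[] {"abc", "caf\u00e9"}) {
            int viaString = (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
            System.out.println(key
                + ": String.hashCode bucket=" + viaString
                + ", UTF-8 byte bucket=" + bucketFor(key, numBuckets));
        }
    }
}
```

For keys that contain only ASCII characters the two hashes are identical, so
under this assumption only keys with non-ASCII characters could land in a
different bucket.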


Regards,

Fabian
