Hi, I'm trying to create an external bucketed table, but I'm having trouble recreating the behavior of the Hive partitioner that is used to create internal bucketed tables.
My bucket key is a String s. My partitioner currently uses the following code, which is based on what I found in the Hive codebase:

    (s.hashCode() & Integer.MAX_VALUE) % numPartitions;

Unfortunately, when I run a select count(*) with TABLESAMPLE, about 1% of the rows are missing compared with those coming into the mapper. I suspect I might need to wrap my String in a Writable before calling hashCode().

Does anyone know exactly how to partition the data so that it is compatible with Hive bucketing?

Regards,
Fabian
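A minimal Java sketch can make the suspected mismatch concrete. It compares the String.hashCode() formula from the post with a byte-based variant. The byte-based variant is an assumption: some older Hive releases appear to hash STRING columns (in ObjectInspectorUtils.hashCode) as a 31-based polynomial over the UTF-8 bytes, starting from 0 and using Java's signed byte values. That polynomial matches String.hashCode() only when every character is ASCII. Newer releases that use bucketing version 2 switch to Murmur3 instead. The class and method names are hypothetical, so check the hash against the source of the Hive version actually in use.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch comparing two ways of mapping a String key to a bucket.
 * ASSUMPTION: utf8ByteHash mirrors how some older Hive releases hash STRING
 * columns; verify against the Hive version you run before relying on it.
 */
final class BucketSketch {

    // The formula from the post: String.hashCode() works on UTF-16 chars.
    static int stringHashBucket(String s, int numBuckets) {
        return (s.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumed Hive-style hash: 31-based polynomial over the UTF-8 bytes,
    // starting from 0, with Java's signed byte values (-128..127).
    static int utf8ByteHash(String s) {
        int r = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            r = r * 31 + b;
        }
        return r;
    }

    // Same bucket formula as the post, but applied to the byte-based hash.
    static int utf8ByteBucket(String s, int numBuckets) {
        return (utf8ByteHash(s) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 32;
        // One ASCII key (both hashes agree) and one non-ASCII key (they can differ).
        for (String key : new String[] {"abc", "caf\u00e9"}) {
            System.out.printf("%s: String.hashCode bucket=%d, UTF-8 byte bucket=%d%n",
                    key, stringHashBucket(key, numBuckets), utf8ByteBucket(key, numBuckets));
        }
    }
}
```

Running it on a few non-ASCII keys from the actual data shows whether the two formulas disagree for them. That would fit the roughly 1% gap, but this sketch does not confirm it.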