Just noticed that the missing rows are accounted for under the counter: org.apache.hadoop.hive.ql.exec.FilterOperator$Counter - FILTERED
Is there any way to print these rows or get more information about why they are being filtered?

Fabian

On Sat, Aug 11, 2012 at 7:16 PM, Fabian Alenius <fabian.alen...@gmail.com> wrote:
> Hi,
>
> I'm trying to create an external bucketed table, but I'm having trouble
> recreating the behavior of the Hive partitioner used to create
> internal bucketed tables.
>
> My bucket key is a String s. Currently in my partitioner I'm using the
> following code, which is based on my findings in the Hive codebase:
>
> (s.hashCode() & Integer.MAX_VALUE) % numPartitions;
>
> Unfortunately, when I do a select count(*) with TABLESAMPLE, about 1%
> of the rows are missing from those coming into the mapper.
>
> I suspect that I might need to wrap my String in a Writable before
> calling hashCode(). Does anyone know exactly how to partition the data
> so that it becomes compatible with Hive bucketing?
>
>
> Regards,
>
> Fabian
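
One thing that might be worth checking, as a sketch rather than a confirmed answer: String.hashCode() runs over UTF-16 chars, whereas a hash computed from the Text/Writable form of the key would run over its UTF-8 bytes. The two agree for pure-ASCII keys and diverge for anything else, which could produce a small fraction of rows landing in a different bucket. The code below assumes Hive hashes string keys as a 31-based hash over the signed UTF-8 bytes starting from 0; that is an assumption to verify against ObjectInspectorUtils.hashCode in the Hive version in use. The names BucketHashSketch, utf8ByteHash and bucketFor are hypothetical helpers, not Hive APIs.

```java
import java.nio.charset.StandardCharsets;

class BucketHashSketch {

    // Hypothetical helper: 31-based hash over the key's UTF-8 bytes, seed 0.
    // ASSUMPTION: this mirrors how Hive hashes a string bucketing column;
    // check it against the Hive source before relying on it.
    static int utf8ByteHash(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int h = 0;
        for (byte b : bytes) {
            // b is a signed byte, so bytes >= 0x80 contribute negative values.
            h = 31 * h + b;
        }
        return h;
    }

    // Same masking and modulo as in the original partitioner,
    // but applied to the byte-based hash instead of String.hashCode().
    static int bucketFor(String s, int numBuckets) {
        return (utf8ByteHash(s) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Print both hashes side by side: an ASCII key and a non-ASCII key.
        for (String key : new String[] {"hive", "Malm\u00f6"}) {
            System.out.println(key
                    + " String.hashCode=" + key.hashCode()
                    + " utf8ByteHash=" + utf8ByteHash(key));
        }
    }
}
```

If the two hashes only differ for keys containing non-ASCII characters, comparing them over the rows counted as FILTERED should show quickly whether this is the cause.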