Just noticed that the missing rows are accounted for under the counter:

org.apache.hadoop.hive.ql.exec.FilterOperator$Counter - FILTERED

Is there any way to print these rows or get more information about why
they are being filtered?
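A minimal Java sketch for finding which keys land in a different bucket than Hive expects, working from the partitioner formula quoted below. It rests on an assumption I have not confirmed against the Hive source for our version: that Hive hashes a STRING bucket column over its UTF-8 bytes (r = r * 31 + byte, starting from 0, roughly what ObjectInspectorUtils.hashCode appears to do) instead of calling java.lang.String.hashCode(). If that holds, the two hashes agree for pure-ASCII keys and diverge only for keys containing non-ASCII characters, which would be consistent with losing about 1% of the rows. The class name, method names, bucket count, and sample keys are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class HiveBucketHash {
    // ASSUMPTION: Hive's string hash is a 31-based polynomial over the
    // UTF-8 bytes (signed), starting from 0. This equals String.hashCode()
    // for pure-ASCII strings but differs once non-ASCII characters appear.
    static int hiveStringHash(String s) {
        int r = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            r = r * 31 + b; // byte is sign-extended to int
        }
        return r;
    }

    // Same bucket formula as the partitioner quoted below.
    static int bucketOf(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    // Bucket Hive would expect, under the hashing assumption above.
    static int bucketFor(String s, int numBuckets) {
        return bucketOf(hiveStringHash(s), numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 32; // hypothetical bucket count
        List<String> keys = Arrays.asList("abc", "caf\u00e9", "\u00e9");
        for (String s : keys) {
            int written = bucketOf(s.hashCode(), numBuckets); // current partitioner
            int expected = bucketFor(s, numBuckets);          // assumed Hive bucket
            if (written != expected) {
                // Rows like these would be dropped by the TABLESAMPLE filter.
                System.out.printf("%s: written to bucket %d, Hive expects %d%n",
                        s, written, expected);
            }
        }
    }
}
```

Running the same comparison over the real bucket keys would list the candidates behind the FILTERED count; swapping s.hashCode() for hiveStringHash(s) in the partitioner would be the corresponding fix, if the assumption is right.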

Fabian

On Sat, Aug 11, 2012 at 7:16 PM, Fabian Alenius
<fabian.alen...@gmail.com> wrote:
> Hi,
>
> I'm trying to create an external bucketed table, but I'm having trouble
> recreating the behavior of the hive partitioner used to create
> internal bucketed tables.
>
> My bucket key is a String s. Currently my partitioner uses the
> following code, which is based on my findings in the Hive codebase:
>
>   (s.hashCode() & Integer.MAX_VALUE) % numPartitions;
>
> Unfortunately, when I run a select count(*) with TABLESAMPLE, about 1%
> of the rows coming into the mapper are missing from the result.
>
> I suspect that I might need to wrap my String in a Writable before
> calling hashCode(). Does anyone know exactly how to partition the data
> so that it is compatible with Hive bucketing?
>
>
> Regards,
>
> Fabian
