If you calculate the size of the bag, you can use that value as a
scalar, divide it by the number of bags you want, and round the result
to get the size of each smaller bag.

Don't ask me to write that code though :)
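
That said, the arithmetic itself is simple enough to sketch. This is
plain Python on an ordinary list, not a Pig UDF, and the names
chunk_size and split_by_size are made up for illustration. It assumes
rounding up, so no element is left over; the last bag comes out smaller
when the total does not divide evenly.

```python
import math


def chunk_size(bag_size, num_bags):
    """Size of each smaller bag: total size / number of bags, rounded up."""
    if num_bags <= 0:
        raise ValueError("num_bags must be positive")
    # max(1, ...) keeps the step valid even for an empty bag.
    return max(1, math.ceil(bag_size / num_bags))


def split_by_size(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical input: a "bag" of 95 items to divide into 10 smaller bags.
items = list(range(95))
size = chunk_size(len(items), 10)
bags = split_by_size(items, size)
```

Splitting by a fixed size is the part a UDF like BagSplit already
handles, so in Pig only the size calculation would be new. Note that
rounding up can give fewer bags than requested when the bag is small
compared to the number of bags wanted.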

Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com

On Apr 11, 2012, at 9:11 AM, Dan Feldman <[email protected]> wrote:

> Hey James,
>
> Have you looked at LinkedIn's collection of UDFs, DataFu (
> http://engineering.linkedin.com/open-source/introducing-datafu-open-source-collection-useful-apache-pig-udfs
> )?
>
> In particular, they have a UDF called BagSplit (
> https://github.com/linkedin/datafu/blob/master/src/java/datafu/pig/bags/BagSplit.java).
> It might not do exactly what you want since it splits a bag into bags of
> size n, not into 10 equal-sized bags, but it shouldn't be too hard to write
> your own UDF using BagSplit.java as a reference.
>
> Dan F.
>
>
>
> On Wed, Apr 11, 2012 at 8:53 AM, James Newhaven 
> <[email protected]> wrote:
>
>> Hi,
>>
>> I need to divide a large bag into 10 smaller bags of equal size. Does
>> anyone know of a function that can do this easily? I've had a look at the
>> standard functions and the PiggyBank and can't find anything appropriate.
>>
>> Thanks,
>> James
>>
