Correction to my earlier comment. The following statement that I wrote was wrong: 'Won't SPLIT always give you 2 relations?'
It is basically what Praveenesh himself mentioned, i.e. SPLIT produces a pre-defined/known number of relations/splits.

Regards,
Shahab

On Sun, Sep 15, 2013 at 7:41 PM, praveenesh kumar <[email protected]> wrote:

> I can use SPLIT only when I am aware of the values by which I need to
> split. Here the customer_ids are unknown to me. I don't know how many of
> them exist in my data. Hence SPLIT is not the answer to my problem.
>
> Anyway, I have found piggybank's MultiStorage method much closer to what I
> am looking for. I was just wondering whether there is a better or different
> way to do the same.
>
> Regards
> Praveenesh
>
> On Mon, Sep 16, 2013 at 12:36 AM, Ruslan Al-Fakikh <[email protected]> wrote:
>
> > Hi!
> >
> > Have you tried the SPLIT operator?
> > http://pig.apache.org/docs/r0.11.1/basic.html#SPLIT
> > After splitting the relation into two separate relations you can STORE
> > them into different locations.
> >
> > Best Regards,
> > Ruslan Al-Fakikh
> > https://www.odesk.com/users/~015b7b5f617eb89923
> >
> > On Sun, Sep 15, 2013 at 11:03 PM, praveenesh kumar <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I have a relation A with (customer_id, data).
> > > I want to get the unique customer_ids, and split them into new
> > > files/relations. What is the most efficient way to do that?
> > >
> > > I can get the distinct customer_ids in a relation. But I am not able to
> > > understand how I can use it in splitting the data by customer_id.
> > >
> > > Regards
> > > Praveenesh
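
For reference, here is a minimal sketch of the two approaches discussed in this thread. It is an illustration under assumptions, not a tested job: the input and output paths, the location of piggybank.jar, the tab delimiter, and the customer ids 'c1' and 'c2' are all hypothetical. The point it tries to show is that SPLIT needs every condition written into the script, whereas piggybank's MultiStorage picks the output location from the value of a field at store time.

```pig
-- Assumption: piggybank.jar is available at this (hypothetical) path.
REGISTER piggybank.jar;

-- Hypothetical input path and schema matching "A with (customer_id, data)".
A = LOAD 'input/customers' USING PigStorage('\t')
    AS (customer_id:chararray, data:chararray);

-- Approach 1: SPLIT. Each output relation needs its own condition in the
-- script, so the set of customer ids must be known in advance.
SPLIT A INTO cust_c1 IF customer_id == 'c1',
             cust_c2 IF customer_id == 'c2';
STORE cust_c1 INTO 'output/split/c1';
STORE cust_c2 INTO 'output/split/c2';

-- Approach 2: MultiStorage. The second argument is the index of the field
-- to split on (0 = customer_id); records are written under the parent path
-- grouped by that field's value, so the ids need not be known up front.
STORE A INTO 'output/by_customer'
    USING org.apache.pig.piggybank.storage.MultiStorage('output/by_customer', '0');
```

With MultiStorage the number of outputs follows the data, so a very large number of distinct customer_ids would presumably mean a correspondingly large number of output directories and files.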
