Sorry, I didn't realize that your values were unknown in advance. Yes, in your case MultiStorage is a good way to split the data according to the values of a column. It has worked for me in similar cases.
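For anyone unfamiliar with MultiStorage, here is a rough plain-Python sketch of the idea (not piggybank's actual code). It makes a single pass that routes each record to an output file chosen by the value of its key column, so the set of customer_ids never has to be known beforehand. The function name `split_by_key` and the `<out_dir>/<customer_id>.tsv` layout are hypothetical and do not match MultiStorage's real file naming. It also assumes that customer_ids are safe to use as file names.

```python
import csv
import os
import tempfile
from typing import Dict, Iterable, Tuple


def split_by_key(records: Iterable[Tuple[str, str]], out_dir: str) -> Dict[str, str]:
    """Write each (customer_id, data) record to out_dir/<customer_id>.tsv.

    Hypothetical layout, for illustration only. The customer_ids are
    discovered while reading, so nothing has to be known in advance
    (unlike a SPLIT with hard-coded conditions).
    Returns a mapping customer_id -> output file path.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}  # customer_id -> open file handle
    writers = {}  # customer_id -> csv writer on that handle
    try:
        for customer_id, data in records:
            if customer_id not in writers:
                # First time we see this id: open a new output file for it.
                path = os.path.join(out_dir, f"{customer_id}.tsv")
                handles[customer_id] = open(path, "w", newline="")
                writers[customer_id] = csv.writer(handles[customer_id], delimiter="\t")
            writers[customer_id].writerow([customer_id, data])
    finally:
        # Close every file, even if a write failed part-way through.
        for fh in handles.values():
            fh.close()
    return {cid: fh.name for cid, fh in handles.items()}


if __name__ == "__main__":
    rows = [("c1", "a"), ("c2", "b"), ("c1", "c")]
    with tempfile.TemporaryDirectory() as tmp:
        paths = split_by_key(rows, tmp)
        for cid, path in sorted(paths.items()):
            with open(path) as fh:
                print(cid, fh.read().splitlines())
```

The sketch keeps one open file per distinct id. That is fine for a moderate number of customers, but it would need batching if there were very many.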
Thanks

On Mon, Sep 16, 2013 at 4:06 AM, praveenesh kumar <[email protected]> wrote:
> Okay, I may not have explained the scenario properly. Apologies if I was
> not clear enough about my problem.
>
> My scenario:
>
> I have a relation A that contains an unknown number of unique customer_ids. I
> want to create N output files, one per customer_id. I was thinking of finding
> the unique customer_ids first, but then I was confused about how to go ahead,
> which is why I posted the question.
>
> Through some further googling, I found piggybank's MultiStorage UDF, which
> does this kind of operation and would do the job in my case.
> Anyway, I was just wondering: if I had to do some other operation, e.g.
> filtering by unique customer ids, how would you achieve that in Pig?
>
> SPLIT would need some known criteria to split into relations. Please
> correct me if I am wrong there. When the values are unknown, how can we
> achieve the same?
>
> Regards
> Praveenesh
>
>
> On Mon, Sep 16, 2013 at 12:44 AM, Shahab Yunus <[email protected]> wrote:
>
> > Correction to my earlier comment. The following statement that I wrote
> > was wrong:
> > 'Won't SPLIT always give you 2 relations?'
> >
> > It is basically what Praveenesh himself mentioned, i.e. a pre-defined/known
> > number of relations/splits.
> >
> > Regards,
> > Shahab
> >
> >
> > On Sun, Sep 15, 2013 at 7:41 PM, praveenesh kumar <[email protected]> wrote:
> >
> > > I can use SPLIT only when I know the values I need to split by. Here the
> > > customer_ids are unknown to me; I don't know how many of them exist in
> > > my data. Hence SPLIT is not the answer to my problem.
> > >
> > > Anyway, I have found that piggybank's MultiStorage is much closer to what
> > > I am looking for. I was just wondering whether there is a better or
> > > different way to do the same.
> > >
> > > Regards
> > > Praveenesh
> > >
> > >
> > > On Mon, Sep 16, 2013 at 12:36 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> > >
> > > > Hi!
> > > >
> > > > Have you tried the SPLIT operator?
> > > > http://pig.apache.org/docs/r0.11.1/basic.html#SPLIT
> > > > After splitting the relation into two separate relations, you can STORE
> > > > them into different locations.
> > > >
> > > > Best Regards,
> > > > Ruslan Al-Fakikh
> > > > https://www.odesk.com/users/~015b7b5f617eb89923
> > > >
> > > >
> > > > On Sun, Sep 15, 2013 at 11:03 PM, praveenesh kumar <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a relation A with (customer_id, data).
> > > > > I want to get the unique customer_ids and split them into new
> > > > > files/relations. What is the most efficient way to do that?
> > > > >
> > > > > I can get the distinct customer_ids in a relation, but I can't work
> > > > > out how to use it to split the data by customer_id.
> > > > >
> > > > > Regards
> > > > > Praveenesh
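This relates to Praveenesh's follow-up question in the thread: how to handle an operation other than storing, such as per-customer filtering, when SPLIT needs its conditions known up front. The plain-Python sketch below (not Pig) contrasts the two shapes. A SPLIT-like router needs one hand-written predicate per output, while grouping discovers the keys from the data. In Pig itself, the data-driven counterpart would be `GROUP A BY customer_id` followed by a `FOREACH` over the groups. The function names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record shape: (customer_id, data), as in relation A.
Record = Tuple[str, str]


def split_with_predicates(
    records: Iterable[Record], predicates: Dict[str, Callable[[Record], bool]]
) -> Dict[str, List[Record]]:
    """SPLIT-like routing: one output per named predicate.

    The predicates, and therefore the number of outputs, must be written
    before the data is seen. In this sketch a record goes to every output
    whose predicate it satisfies, and a record matching none is dropped.
    """
    outputs: Dict[str, List[Record]] = {name: [] for name in predicates}
    for rec in records:
        for name, pred in predicates.items():
            if pred(rec):
                outputs[name].append(rec)
    return outputs


def group_by_key(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """GROUP BY-like: one group per distinct customer_id found in the data."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return dict(groups)


if __name__ == "__main__":
    rows = [("c1", "a"), ("c2", "b"), ("c1", "c"), ("c3", "d")]
    # Only c1 and c2 were anticipated, so the c3 record is silently lost.
    print(split_with_predicates(rows, {
        "c1": lambda r: r[0] == "c1",
        "c2": lambda r: r[0] == "c2",
    }))
    # Grouping needs no prior knowledge of the ids; per-customer work
    # (here just a record count) then runs once per group.
    groups = group_by_key(rows)
    print({cid: len(recs) for cid, recs in groups.items()})
```

Any per-customer filter or aggregate then becomes a function applied to each group, which is why grouping, rather than SPLIT, fits the case where the ids are unknown.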
