You could try a custom partitioner:
http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#partitionby
https://issues.apache.org/jira/browse/PIG-282
Daniel
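
For reference, a minimal sketch of the PARTITION BY syntax described in
the first link. The jar name, the class com.example.F1Partitioner, the
input path, and the three-field schema are all hypothetical placeholders;
as far as I know the class would need to extend
org.apache.hadoop.mapreduce.Partitioner. Keep in mind that a partitioner
only decides which reducer (and so which part file) each key goes to. It
does not create per-key directories by itself.

```pig
-- Hypothetical jar and class names, for illustration only.
REGISTER mypartitioner.jar;

-- Input path and schema are assumptions; the original A has more fields.
A = LOAD 'input' AS (f1:chararray, f2:chararray, f3:chararray);

-- PARTITION BY hands each f1 key to the custom class to pick a reducer.
-- PARALLEL sets the number of reducers, i.e. the number of part files.
B = GROUP A BY f1 PARTITION BY com.example.F1Partitioner PARALLEL 100;

-- Flatten the groups back into rows before storing.
C = FOREACH B GENERATE FLATTEN(A);
STORE C INTO '/result';
```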
On 03/08/2011 02:04 PM, Dexin Wang wrote:
Unfortunately, it doesn't work.
It seems to be the same problem as
https://issues.apache.org/jira/browse/PIG-1547
On Tue, Mar 8, 2011 at 1:22 PM, Dexin Wang<[email protected]> wrote:
Awesome. Thanks, Shawn.
On Tue, Mar 8, 2011 at 12:34 PM, Xiaomeng Wan<[email protected]> wrote:
You can use the MultiStorage UDF in piggybank.
Shawn
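
A minimal sketch of how that might look, assuming piggybank.jar is
available to register and that f1 is the first field (index 0). The input
path and the three-field schema are placeholders, not taken from the
original script.

```pig
-- Path to piggybank.jar is an assumption; adjust it to your install.
REGISTER piggybank.jar;

-- Input path and schema are assumptions; the original A has more fields.
A = LOAD 'input' AS (f1:chararray, f2:chararray, f3:chararray);

-- MultiStorage arguments: parent output directory, then the index of
-- the field whose value names each subdirectory (0 = f1).
STORE A INTO '/result'
    USING org.apache.pig.piggybank.storage.MultiStorage('/result', '0');
```

If it behaves as documented, each distinct f1 value should get its own
subdirectory under /result, with no need to list the values in the script.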
On Tue, Mar 8, 2011 at 1:29 PM, Dexin Wang<[email protected]> wrote:
Is there a way to use STORE with a variable, or some other way to achieve
what I need?
I have something like this:
grunt> DESCRIBE A;
A: {f1, f2, f3, ...}
grunt> DUMP A;
(v1, x2, x3, ...)
(v2, x4, x5, ...)
(v1, x6, x6, ...)
...
I do some processing, then group by f1, and would like to save the result
in different directories for different f1 values, like this:
/result/f1/result_for_v1
/result/f2/result_for_v2
/result/f2/result_for_v2
...
I know I could use SPLIT, but I have 100+ unique values for f1, and the
number of unique values varies each time I process. It would be nice if I
didn't have to list 100 BY lines with SPLIT, and I certainly do not want
to maintain the list of possible values for f1 in my Pig script.
Thanks!
Dexin