I'm not aware of any native Pig commands that can do this, so you'll have
to implement a UDF. My implementation would look as follows:
    A = load 'data' as (id: int, seg_num: int, text: chararray);
    B = group A by id;
    C = foreach B {
        -- sort inside each group, assuming the data is not sorted by seg_num
        D = order A by seg_num;
        -- after a group, the key field is named 'group', so rename it to id
        generate group as id, CONCAT_UDF(D);
    };
    dump C;
Within the CONCAT_UDF implementation, the input is a DataBag whose tuples
are sorted by seg_num, so you can iterate over it, append each text field
to a StringBuilder, and return the resulting string.
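A minimal sketch of that StringBuilder loop, written against plain Java collections so it runs without the Pig jars. The class name ConcatSketch, the method name concatSegments, the single-space separator, and the null handling are all my assumptions for illustration. In a real UDF this loop would presumably sit inside exec(Tuple) of a class extending org.apache.pig.EvalFunc<String>, reading the bag with (DataBag) input.get(0) and each text field with get(2) on the inner tuples.

```java
import java.util.Arrays;
import java.util.List;

public class ConcatSketch {

    /**
     * Joins text segments that are already ordered by seg_num.
     * Assumption: segments are separated by one space, as in the
     * expected output of the original question.
     */
    public static String concatSegments(Iterable<String> segments) {
        StringBuilder sb = new StringBuilder();
        for (String segment : segments) {
            if (segment == null) {
                continue; // assumption: skip null text fields
            }
            if (sb.length() > 0) {
                sb.append(' '); // separator between non-empty prefixes
            }
            sb.append(segment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Segments of document 1 from the example table, in seg_num order.
        List<String> doc1 = Arrays.asList("This is", "a", "test for", "Hello", "World!");
        System.out.println(concatSegments(doc1));
    }
}
```

The ordering itself is left to the nested "order A by seg_num" in the script, so the Java side only has to append in the order it receives the tuples.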
Hope this helps.
On Sun, Jul 14, 2013 at 10:39 AM, Shahab Yunus <[email protected]> wrote:
> At least I am not aware of a PIG command which can do this. You can start
> by grouping on 'id', and then try flattening the 'text' field. But then
> you run into the issue that you have lost the sorting order ('seg_no')
> which is required to construct a meaningful sentence. Here I think you need
> a UDF where you pass both 'seg_no' and 'text' and do the work.
>
> I can think of doing some convoluted processing like concatenating the
> 'seg_no' and 'text' fields as one and then grouping on 'id' and then
> sorting on the new concatenated field within the group. But then, once
> you've done that, you will have to split the combined field back apart. And
> doing all this might not help either. The main thing here is that, as far
> as I know, you cannot impose sort order in a bag or while flattening a
> group in one row. I would be interested to know if this is possible through
> native Pig.
>
> Regards,
> Shahab
>
>
> On Sat, Jul 13, 2013 at 9:45 PM, Karthik Natarajan <
> [email protected]> wrote:
>
> > Hi,
> >
> > I'm new to Pig. I have a file that contains the contents of documents.
> > The problem is that the contents are not in one line of the file. The
> > file is actually an export of a database table. Below is an example of
> > the table:
> >
> > id  seg_no  text
> > --  ------  --------
> > 1   0       This is
> > 1   1       a
> > 1   2       test for
> > 1   3       Hello
> > 1   4       World!
> > 2   0       Test
> > 2   1       number
> > 2   2       two.
> >
> >
> > How do I get an output like this:
> >
> > id  text
> > --  ----
> > 1   This is a test for Hello World!
> > 2   Test number two.
> >
> >
> > I can do this in SQL, but I want to try it using Hadoop and Pig. I'm not
> > sure how to concatenate values of a column within a group. I'm wondering
> > if Pig's built-in functions can handle this or if I have to create a UDF.
> > I'm thinking I need to create a UDF, but am not sure how to go about
> > this. Any help/advice would be appreciated.
> >
> > Thanks.
> >
>