Hi, I'm new to Pig. I have a file that contains the contents of documents. The problem is that each document's contents are not on a single line of the file; they are split across several rows. The file is actually an export of a database table. Below is an example of the table:

    id  seg_no  text
    --  ------  --------
    1   0       This is
    1   1       a
    1   2       test for
    1   3       Hello
    1   4       World!
    2   0       Test
    2   1       number
    2   2       two.

How do I get output like this?

    id  text
    --  -------------------------------
    1   This is a test for Hello World!
    2   Test number two.

I can do this in SQL, but I want to try it using Hadoop and Pig. I'm not sure how to concatenate the values of a column within a group, in seg_no order. I'm wondering whether Pig's built-in functions can handle this or whether I have to write a UDF. I suspect I need a UDF, but I'm not sure how to go about it. Any help or advice would be appreciated. Thanks.
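For reference, the per-group logic being asked about (group rows by id, order each group by seg_no, join the fragments) can be sketched in plain Java. This is a minimal sketch under stated assumptions, not a Pig UDF: it does not use any Pig classes, the names `SegmentConcat`, `Segment` and `concatenate` are hypothetical, and it assumes fragments should be joined with a single space.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SegmentConcat {

    // One exported row (hypothetical type): document id, segment number, text fragment.
    static final class Segment {
        final int id;
        final int segNo;
        final String text;

        Segment(int id, int segNo, String text) {
            this.id = id;
            this.segNo = segNo;
            this.text = text;
        }
    }

    // Groups rows by id, orders each group by segNo, and joins fragments with a space
    // (the single-space separator is an assumption).
    static Map<Integer, String> concatenate(List<Segment> rows) {
        // Similar in spirit to GROUP ... BY id; TreeMap keeps the ids sorted.
        Map<Integer, List<Segment>> groups = new TreeMap<>();
        for (Segment s : rows) {
            groups.computeIfAbsent(s.id, k -> new ArrayList<>()).add(s);
        }

        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Segment>> entry : groups.entrySet()) {
            // Rows may arrive in any order, so sort by segNo before joining.
            String joined = entry.getValue().stream()
                    .sorted(Comparator.comparingInt((Segment s) -> s.segNo))
                    .map(s -> s.text)
                    .collect(Collectors.joining(" "));
            result.put(entry.getKey(), joined);
        }
        return result;
    }

    public static void main(String[] args) {
        // The sample rows from the table above.
        List<Segment> rows = List.of(
                new Segment(1, 0, "This is"),
                new Segment(1, 1, "a"),
                new Segment(1, 2, "test for"),
                new Segment(1, 3, "Hello"),
                new Segment(1, 4, "World!"),
                new Segment(2, 0, "Test"),
                new Segment(2, 1, "number"),
                new Segment(2, 2, "two."));
        // Print one tab-separated line per document id.
        concatenate(rows).forEach((id, text) -> System.out.println(id + "\t" + text));
    }
}
```

In an actual Pig UDF, this per-group work would go in the `exec` method of a class extending `org.apache.pig.EvalFunc<String>`, receiving the grouped bag as input. Newer Pig releases (0.10 and later, if I recall correctly) also include a built-in `BagToString` function that, combined with a nested `ORDER ... BY seg_no` inside `FOREACH`, may make a UDF unnecessary; check the documentation for your Pig version.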
