Hi Siddhi,
Please take a look at CSVExcelStorage in trunk:
https://issues.apache.org/jira/browse/PIG-3141.
You can write the header using the WRITE_OUTPUT_HEADER option. Despite its
name, you can also specify a non-comma delimiter. Here is the syntax:
STORE x INTO '<destFileName>'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        [DELIMITER[,
            {YES_MULTILINE | NO_MULTILINE}[,
                {UNIX | WINDOWS | NOCHANGE}[,
                    {READ_INPUT_HEADER | SKIP_INPUT_HEADER |
                     WRITE_OUTPUT_HEADER | SKIP_OUTPUT_HEADER}]]]]
    );
Since this is only in trunk, you will need to backport it yourself to the
version of Pig that you're using.
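Applied to your script, it might look roughly like the sketch below. This is
untested and rests on a few assumptions: that a piggybank.jar containing the
PIG-3141 change is registered (the jar path is a placeholder), that the jar
providing your Parser UDF is registered as well, and that you name the fields
Id, Form and Query in an AS clause. As far as I know, the header row is built
from the schema field names, so the relation needs a schema.

```pig
-- Assumption: this piggybank.jar includes PIG-3141 (trunk or a backport).
REGISTER piggybank.jar;

A = LOAD 'data'
    USING org.apache.pig.piggybank.storage.XMLLoader('response') AS (line:chararray);

-- Parser is the UDF from your script; the field names here are taken from
-- the header you asked for and are otherwise an assumption.
B = FOREACH A GENERATE FLATTEN(Parser(line)) AS (Id, Form, Query);

-- Tab-delimited output with a header row written from B's schema.
STORE B INTO 'my_data'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        '\t', 'NO_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
```

If the job writes several part files, I believe each one gets its own header
row, so keep that in mind if you concatenate them afterwards.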
Thanks,
Cheolsoo
On Tue, Jun 4, 2013 at 8:45 PM, Siddhi Borkar <
[email protected]> wrote:
> I'm writing a pig script similar to:
>
> A = load 'data' using
> org.apache.pig.piggybank.storage.XMLLoader('response') as (line:chararray);
> B = foreach A GENERATE FLATTEN(Parser(line));
> store B into 'my_data' using PigStorage('\t');
>
> This script reads a file that contains XML documents. The
> second line of the script calls a Java UDF that parses the XML.
>
> The Parser UDF returns a data bag with multiple tuples.
> This outputs:
>
> (1 91705 rondo music guitar)
> (3 96629 award music guitar)
>
> I'd like to add a header row to the output file:
>
> (Id Form Query)
> (1 91705 rondo music guitar)
> (3 96629 award music guitar)
>
> Any ideas?