Thanks. getmerge saves the output file locally, so I had to move it back to
the HDFS directory. Is there a way to append the date and time to the final
output file name?

fs -cp $OUTPUT/.pig_header  $OUTPUT/part* $OUTPUT/tmp
fs -getmerge $OUTPUT/tmp top10advertisers.csv
fs -rm results/top10advertisers.csv
fs -moveFromLocal top10advertisers.csv results
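
One possible way to get a date and time into the file name, sketched under
assumptions: build the name in the shell that launches Pig and pass it in as a
parameter, the same way $OUTPUT is passed. The helper name stamped_name, the
parameter name OUTFILE, and the script name top10.pig are hypothetical and do
not come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print "<base>_<YYYYMMDD_HHMMSS>.<ext>".
stamped_name() {
    printf '%s_%s.%s\n' "$1" "$(date +%Y%m%d_%H%M%S)" "$2"
}

# The value depends on the clock at the time the function runs.
OUTFILE=$(stamped_name top10advertisers csv)
echo "$OUTFILE"

# Assumed invocation (not run here): hand the name to Pig as a parameter so
# the script can refer to $OUTFILE instead of a fixed file name.
#   pig -param OUTPUT=pigdbck/output/top10advperimpfileh5 \
#       -param OUTFILE="$OUTFILE" top10.pig
```

With that in place, the last lines of the script could read
"fs -getmerge $OUTPUT/tmp $OUTFILE" and "fs -moveFromLocal $OUTFILE results".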


On 26 May 2011 20:07, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Try -getmerge
>
> D
>
> On Thu, May 26, 2011 at 11:20 AM, Subhramanian, Deepak
> <deepak.subhraman...@newsint.co.uk> wrote:
> > I tried using fs -cat and sh cat to combine the header and the output
> > file into a new file, but it is not working. Does Hadoop provide an
> > option to combine two files into a new file from within a Pig script?
> >
> > This is the command I used at the end of the pig script.
> >
> > STORE out3 INTO '$OUTPUT' USING
> > org.apache.pig.piggybank.storage.PigStorageSchema();
> >
> > sh -cat $OUTPUT/.pig_header  $OUTPUT/part* > $OUTPUT/top10adv.csv
> >
> >
> > hadoop fs -ls pigdbck/output/top10advperimpfileh5
> > Found 4 items
> > -rw-r--r--   1 root supergroup         30 2011-05-26 17:52
> > /user/root/pigdbck/output/top10advperimpfileh5/.pig_header
> > -rw-r--r--   1 root supergroup        361 2011-05-26 17:52
> > /user/root/pigdbck/output/top10advperimpfileh5/.pig_schema
> > drwxr-xr-x   - root supergroup          0 2011-05-26 17:51
> > /user/root/pigdbck/output/top10advperimpfileh5/_logs
> > -rw-r--r--   1 root supergroup        117 2011-05-26 17:52
> > /user/root/pigdbck/output/top10advperimpfileh5/part-r-00000
> >
> >
> > On 26 May 2011 12:02, Subhramanian, Deepak <
> > deepak.subhraman...@newsint.co.uk> wrote:
> >
> >> I thought any Java class extension was a UDF. Thanks, Dmitriy, for
> >> clarifying. Yes, I meant extending StoreFunc. I guess I will use
> >> PigStorageSchema for the time being, as I am tight on my deadlines, and
> >> use cat to concatenate the header. I didn't realize that we can use cat
> >> directly in the Pig script, which is why I thought of extending
> >> StoreFunc. Thanks, Alan, for your inputs.
> >>
> >> I will have to read more on how the output part files are created on
> >> HDFS so that I can combine all the part files at the end of the Pig
> >> script into a final output if the file size is very big.
> >>
> >>
> >> On 25 May 2011 21:22, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> >>
> >>> Still not clear on how you expect a UDF to help. Normally when we say
> >>> UDFs, we mean functions that work on individual tuples. They don't have
> >>> anything to do with how you store data.
> >>>
> >>> You probably mean StoreFunc; since in this case you want a StoreFunc
> >>> that messes with the file format, as opposed to writing a side file
> >>> like PigStorageSchema does, you'll need to go pretty deep -- write a
> >>> whole StoreFunc + OutputFormat + RecordWriter stack.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, May 25, 2011 at 12:51 PM, Subhramanian, Deepak
> >>> <deepak.subhraman...@newsint.co.uk> wrote:
> >>> > Thanks for the inputs. I am looking for a UDF which I can use to
> >>> > store the headers in the Pig output file.
> >>> >
> >>> > On 25 May 2011 18:30, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> >>> >
> >>> >> Can you explain what UDF you are looking for?
> >>> >> The intended usage for the .pig_header file is to cat it:
> >>> >>
> >>> >> hadoop fs -cat myresults/.pig_header myresults/part*
> >>> >>
> >>> >> (which drops the header right on top of your data).
> >>> >>
> >>> >> We don't want to put the header inside the data files because that
> >>> >> can break subsequent processing.
> >>> >>
> >>> >> As for names of the fields, that's a pig feature, it's there for
> >>> >> disambiguation. If you don't like it, you can rename the fields:
> >>> >> FLATTEN(aggregated) as (advertiserId, Advertiser, OrderId, ....)
> >>> >>
> >>> >>
> >>> >>
> >>> >> D
> >>> >>
> >>> >> On Wed, May 25, 2011 at 9:00 AM, Subhramanian, Deepak
> >>> >> <deepak.subhraman...@newsint.co.uk> wrote:
> >>> >> > Hi, I just realized that it is creating a .pig_header file in the
> >>> >> > same output directory. I guess I need to create a new UDF. Also, if
> >>> >> > I am grouping, it adds the tag aggregated::group:: to the header
> >>> >> > column. Isn't FLATTEN supposed to remove the group prefix?
> >>> >> >
> >>> >> >  cat .pig_header
> >>> >> > aggregated::group::AdvertiserID null::Advertiser
> >>> >> >  aggregated::group::OrderID      aggregated::group::AdID
> >>> >> > aggregated::group::CreativeID   aggregated::group::CreativeVersion
> >>> >> > aggregated::group::CreativeSizeID       aggregated::group::SiteID
> >>> >> > aggregated::group::PageID       aggregated::group::Keyword
> >>> >> >  aggregated::Impressions
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > On 25 May 2011 16:48, Subhramanian, Deepak <
> >>> >> > deepak.subhraman...@newsint.co.uk> wrote:
> >>> >> >
> >>> >> >> I tried PigStorageSchema. For some reason it doesn't create the
> >>> >> >> headers. Is it because I am loading the data using another UDF?
> >>> >> >>
> >>> >> >> This is the command I used in the Pig script:
> >>> >> >>
> >>> >> >> STORE out INTO '$OUTPUT' USING
> >>> >> >> org.apache.pig.piggybank.storage.PigStorageSchema();
> >>> >> >>
> >>> >> >> Thanks, Deepak
> >>> >> >>
> >>> >> >>
> >>> >> >> On 25 May 2011 16:13, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> >>> >> >>
> >>> >> >>> You can try PigStorageSchema from the piggybank.
> >>> >> >>>
> >>> >> >>> -----Original Message-----
> >>> >> >>> From: "Subhramanian, Deepak" <deepak.subhraman...@newsint.co.uk>
> >>> >> >>> To: user@pig.apache.org
> >>> >> >>> Sent: 5/25/2011 5:28 AM
> >>> >> >>> Subject: Storing Headers in Pig Output File
> >>> >> >>>
> >>> >> >>> Is there a way to store the headers (the title of each column) using
> >>> >> >>> the STORE command in a Pig script (STORE out3 INTO '$OUTPUT' USING
> >>> >> >>> PigStorage();)? Right now it stores only the data. I read somewhere
> >>> >> >>> that Pig 0.8 stores the header with the MapReduce option. Do we have
> >>> >> >>> to supply extra parameters?
> >>> >> >>>
> >>> >> >>> Thanks, Deepak
> >>> >> >>>
> >>> >
> >>>
> >>
> >>
> >
> > --
> > "Please consider the environment before printing this e-mail"
> >
> > The Newspaper Marketing Agency: Opening Up Newspapers:
> > www.nmauk.co.uk
> >
> > This e-mail and any attachments are confidential, may be legally
> privileged and are the property of
> > News International Limited (which is the holding company for the News
> International group, is
> > registered in England under number 81701 and whose registered office is 3
> Thomas More Square,
> > London E98 1XY, VAT number GB 243 8054 69), on whose systems they were
> generated.
> >
> > If you have received this e-mail in error, please notify the sender
> immediately and do not use,
> > distribute, store or copy it in any way. Statements or opinions in this
> e-mail or any attachment are
> > those of the author and are not necessarily agreed or authorised by News
> International Limited or
> > any member of its group. News International Limited may monitor outgoing
> or incoming emails as
> > permitted by law. It accepts no liability for viruses introduced by this
> e-mail or attachments.
> >
>



-- 
Deepak Subhramanian
Data & Analytics
News International, Digital Technology
Email: deepak.subhraman...@newsint.co.uk
