Re: file manipulation

Jagat Sat, 02 Jun 2012 21:06:23 -0700

Hi

Alan has already given you the background on append in Hadoop.


Another suggestion: to merge two files, you can also look at Pig's UNION operator.

http://pig.apache.org/docs/r0.10.0/basic.html#union

The UNION operator merges the contents of two or more relations.
A simple workflow could be:

load A
load B
Store union of A and B
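
The three steps above could be sketched in Pig Latin roughly as follows. The paths, the tab delimiter, and the (id, value) schema are made up for illustration, so adjust them to your data. Note that STORE writes a directory of part files and fails if the output path already exists, so the result has to go to a new location instead of being appended to either input.

```pig
-- Hypothetical input paths and schema; replace with your own.
A = LOAD '/data/existing' USING PigStorage('\t') AS (id:int, value:chararray);
B = LOAD '/data/incoming' USING PigStorage('\t') AS (id:int, value:chararray);

-- UNION concatenates the two relations. It does not remove duplicates
-- and does not preserve the order of tuples.
merged = UNION A, B;

-- The output directory (hypothetical path) must not already exist.
STORE merged INTO '/data/merged' USING PigStorage('\t');
```

If the two inputs have the same fields in a different order, UNION ONSCHEMA may be a better fit, since it matches fields by name; it requires both relations to have a declared schema.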

Have a look at how Pig UNION works.

On Sun, Jun 3, 2012 at 8:28 AM, Alan Gates <[email protected]> wrote:

> MapReduce (and hence Pig) does not support file append.  This is because
> in MapReduce tasks may be run multiple times in the case of failure or due
> to speculative execution.  This would result in duplicate appends.  Also,
> if the job fails, it would not be able to remove the appended data.
>
> As far as updating your data, what kind of updates do you want to do?
>  Stores like HBase (which can be accessed from Pig) support updates.  But
> whether this is a good fit depends on your use case.
>
> Alan.
>
> On Jun 1, 2012, at 11:54 AM, Michael G. wrote:
>
> > Hi all
> > I'm new to Pig and Hadoop.
> > Can you tell me how I can:
> > 1. append to an existing file on HDFS with Pig
> > 2. update a file with Pig, if that is possible.
> >
> > 10x.
> >
> > --
> > -- Michael G. --
>
>
