Hi,

Can't we append in hadoop-0.20.203.0?

Regards,
Jagaran



________________________________
From: Dmitriy Ryaboy <dvrya...@gmail.com>
To: user@pig.apache.org
Sent: Wed, 15 June, 2011 9:57:20 AM
Subject: Re: Running multiple Pig jobs simultaneously on same data

Yong,

You can't. Hence, immutable. It's not a database. It's a write-once file system.

Approaches to solve updates include:
1) rewrite everything
2) write a separate set of "deltas" into other files and join them in
at read time
3) do 2, and occasionally run a "compaction" which does a complete
rewrite based on existing deltas
4) write to something like HBase that handles all of this under the covers
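
As a rough illustration of approaches 2 and 3 above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the base data and the deltas are plain in-memory structures standing in for immutable HDFS files, and the names read_with_deltas and compact are hypothetical, not part of Hadoop, Pig, or HBase.

```python
# Hypothetical illustration only: a dict stands in for an immutable base
# file, and a list of (op, key, value) tuples stands in for delta files.

def read_with_deltas(base, deltas):
    """Approach 2: join the deltas onto the base data at read time."""
    view = dict(base)  # copy, so the base "file" is never modified
    for op, key, value in deltas:  # apply deltas in the order written
        if op == "put":
            view[key] = value
        elif op == "delete":
            view.pop(key, None)
    return view


def compact(base, deltas):
    """Approach 3: rewrite a fresh base from the old base plus deltas.

    Returns the new base and an empty delta list.
    """
    return read_with_deltas(base, deltas), []


base = {"a": 1, "b": 2}
deltas = [("put", "b", 20), ("delete", "a", None), ("put", "c", 3)]

merged = read_with_deltas(base, deltas)
new_base, new_deltas = compact(base, deltas)
```

Readers pay the merge cost on every read until a compaction runs; after compaction the delta list is empty and reads go straight to the rewritten base.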

D

2011/6/15 勇胡 <yongyong...@gmail.com>:
> Jon,
>
> If I want to modify data (insert or delete) in HDFS, how can I do it?
> From the description, I cannot directly modify the data itself (update the
> data), and I cannot append new data to the file! How does HDFS implement
> data modification? I am a little confused.
>
> Yong
> On 15 June 2011 at 3:36 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
>> Yong,
>>
>> Currently, HDFS does not support appending to a file. So once a file is
>> created, it literally cannot be changed (although it can be deleted, I
>> suppose). This lets you avoid issues where I do a SELECT * on the entire
>> database and the DBA can't update a row, or other things like that. There
>> are some append patches in the works, but I am not sure how they handle the
>> concurrency implications.
>>
>> Make sense?
>> Jon
>>
>> 2011/6/15 勇胡 <yongyong...@gmail.com>
>>
>> > I read the link, and my impression is that HDFS is designed for
>> > read-heavy workloads, not write-heavy ones ("A file once created,
>> > written, and closed need not be changed.").
>> >
>> > Regarding your description ("Immutable means that after creation it
>> > cannot be modified."): if I understand correctly, you mean that HDFS
>> > cannot implement "update" semantics the way a database does? A write
>> > cannot be applied directly to a specific tuple or record? The result of
>> > a write is just appended to the end of the file?
>> >
>> > Regards
>> >
>> > Yong
>> >
>> > 2011/6/15 Nathan Bijnens <nat...@nathan.gs>
>> >
>> > > Immutable means that after creation it cannot be modified.
>> > >
>> > > HDFS applications need a write-once-read-many access model for files. A
>> > > file once created, written, and closed need not be changed. This
>> > > assumption simplifies data coherency issues and enables high throughput
>> > > data access. A MapReduce application or a web crawler application fits
>> > > perfectly with this model. There is a plan to support appending-writes
>> > > to files in the future.
>> > >
>> > > http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#Simple+Coherency+Model
>> > >
>> > > Best regards,
>> > >  Nathan
>> > > ---
>> > > nat...@nathan.gs : http://nathan.gs : http://twitter.com/nathan_gs
>> > >
>> > >
>> > > On Wed, Jun 15, 2011 at 12:58 PM, 勇胡 <yongyong...@gmail.com> wrote:
>> > >
>> > > > How should I understand "immutable"? Does HDFS implement a locking
>> > > > mechanism to provide immutable data access when concurrent tasks
>> > > > process the same set of data, or does it use some other strategy to
>> > > > achieve immutability?
>> > > >
>> > > > Thanks
>> > > >
>> > > > Yong
>> > > >
>> > > > 2011/6/14 Bill Graham <billgra...@gmail.com>
>> > > >
>> > > > > Yes, this is possible. Data in HDFS is immutable and MR tasks are
>> > > > > spawned in their own VM so multiple concurrent jobs acting on the
>> > > > > same input data are fine.
>> > > > >
>> > > > > On Tue, Jun 14, 2011 at 11:18 AM, Pradipta Kumar Dutta <
>> > > > > pradipta.du...@me.com> wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > We have a requirement where we have to process the same set of
>> > > > > > data (in a Hadoop cluster) by running multiple Pig jobs
>> > > > > > simultaneously.
>> > > > > >
>> > > > > > Any idea whether this is possible in Pig?
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Pradipta
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
