Jon,

If I want to modify data (insert or delete) in HDFS, how can I do it? From the description, I cannot directly modify the data itself (update the data), and I cannot append new data to the file. How does HDFS implement data modification? I am a little confused.

Yong

On June 15, 2011 at 3:36 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> Yong,
>
> Currently, HDFS does not support appending to a file. So once a file is
> created, it literally cannot be changed (although it can be deleted, I
> suppose). This lets you avoid issues where I do a SELECT * on the entire
> database and the DBA can't update a row, or other things like that. There
> are some append patches in the works, but I am not sure how they handle
> the concurrency implications.
>
> Make sense?
> Jon
>
> 2011/6/15 勇胡 <yongyong...@gmail.com>
>
> > I read the link, and I felt that HDFS is designed for read-frequent
> > operations, not write-frequent ones ("A file once created, written,
> > and closed need not be changed.").
> >
> > For your description ("Immutable means that after creation it cannot
> > be modified."), if I understand correctly, you mean that HDFS cannot
> > implement "update" semantics the way a database does? A write
> > operation cannot be applied directly to a specific tuple or record?
> > The result of a write operation is just appended at the end of the
> > file.
> >
> > Regards
> >
> > Yong
> >
> > 2011/6/15 Nathan Bijnens <nat...@nathan.gs>
> >
> > > Immutable means that after creation it cannot be modified.
> > >
> > > HDFS applications need a write-once-read-many access model for
> > > files. A file once created, written, and closed need not be
> > > changed. This assumption simplifies data coherency issues and
> > > enables high throughput data access. A MapReduce application or a
> > > web crawler application fits perfectly with this model. There is a
> > > plan to support appending-writes to files in the future.
> > >
> > > http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#Simple+Coherency+Model
> > >
> > > Best regards,
> > > Nathan
> > > ---
> > > nat...@nathan.gs : http://nathan.gs : http://twitter.com/nathan_gs
> > >
> > > On Wed, Jun 15, 2011 at 12:58 PM, 勇胡 <yongyong...@gmail.com> wrote:
> > >
> > > > How should I understand "immutable"? I mean, does HDFS implement
> > > > a lock mechanism to provide immutable data access when
> > > > concurrent tasks process the same set of data, or does it use
> > > > some other strategy to implement immutability?
> > > >
> > > > Thanks
> > > >
> > > > Yong
> > > >
> > > > 2011/6/14 Bill Graham <billgra...@gmail.com>
> > > >
> > > > > Yes, this is possible. Data in HDFS is immutable and MR tasks
> > > > > are spawned in their own VM, so multiple concurrent jobs
> > > > > acting on the same input data are fine.
> > > > >
> > > > > On Tue, Jun 14, 2011 at 11:18 AM, Pradipta Kumar Dutta
> > > > > <pradipta.du...@me.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > We have a requirement where we have to process the same set
> > > > > > of data (in a Hadoop cluster) by running multiple Pig jobs
> > > > > > simultaneously.
> > > > > >
> > > > > > Any idea whether this is possible in Pig?
> > > > > >
> > > > > > Thanks,
> > > > > > Pradipta
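
On the question at the top of the thread (how to insert or delete when files are write-once): a common approach with immutable files is to write a new copy with the changes applied and then swap it in for the old one. The sketch below only illustrates that idea. It uses Python against the local filesystem as a stand-in for HDFS, and the names rewrite_file, demo, and part-00000 are hypothetical, not part of Hadoop or Pig. On a real cluster the same steps would go through the Hadoop file system API or a Pig/MapReduce job that writes a new output directory.

```python
import os
import tempfile


def rewrite_file(path, keep, extra_records=()):
    """Emulate an update on a write-once file by writing a new copy.

    keep(record) decides which existing records survive (a "delete");
    extra_records are written after them (an "insert").
    The original file is never modified in place.
    """
    tmp_path = path + ".tmp"  # hypothetical naming convention for the new copy
    with open(path, "r", encoding="utf-8") as src, \
            open(tmp_path, "w", encoding="utf-8") as dst:
        for line in src:
            record = line.rstrip("\n")
            if keep(record):
                dst.write(record + "\n")
        for record in extra_records:
            dst.write(record + "\n")
    # Swap the new copy in for the old file in one step.
    os.replace(tmp_path, path)


def demo():
    """Build a small tab-separated file, drop record 2, add record 4."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "part-00000")
        with open(path, "w", encoding="utf-8") as f:
            f.write("1\talice\n2\tbob\n3\tcarol\n")
        rewrite_file(
            path,
            keep=lambda record: not record.startswith("2\t"),
            extra_records=["4\tdave"],
        )
        with open(path, encoding="utf-8") as f:
            return f.read()
```

Readers that opened the old file before the swap keep seeing a consistent old version, which is the same property that makes concurrent Pig jobs over the same input safe.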