Hello Dongzhe Ma,

Yes, HDFS employs a single-writer, multiple-reader model. This means:

WRITE

•       The HDFS client maintains a lease on each file it has opened for writing
(the lease covers the entire file, not individual blocks).

•       Only one client can hold the lease on a given file at any time.

•       For each block of data, the client sets up a pipeline of DataNodes to
write to.

•       Once written, a file cannot be modified, but data can be appended to
it.

•       The client periodically renews the lease by sending renewal requests
(heartbeats) to the NameNode.

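•       Lease timeout/expiration:

        •       Soft limit (1 minute): until the soft limit expires, the client
        has exclusive access to the file and can extend the lease by renewing it.

        •       After the soft limit expires, if the client has neither renewed
        the lease nor closed the file, any other client can claim (pre-empt) the
        lease.

        •       Hard limit (1 hour): between the soft and hard limits, the client
        continues to have access unless some other client pre-empts it.

        •       After the hard limit expires, HDFS assumes the client has quit,
        closes the file on its behalf, and recovers the lease.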
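The lease rules above can be illustrated with a small, self-contained Java simulation. This is a sketch under simplifying assumptions, not HDFS code: `LeaseDemo`, `ToyLeaseTable`, and their methods are hypothetical names; time is passed in explicitly as milliseconds rather than read from a clock; leases are tracked per file; and a client that pre-empts an expired lease gets it immediately, without modelling the lease recovery a real NameNode performs.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaseDemo {
    public static void main(String[] args) {
        // Assumed limits matching the text: 1-minute soft limit, 1-hour hard limit.
        ToyLeaseTable leases = new ToyLeaseTable(60_000L, 3_600_000L);
        String file = "/logs/app.log"; // hypothetical path

        System.out.println(leases.acquire(file, "clientA", 0L));        // true: file is free
        System.out.println(leases.acquire(file, "clientB", 30_000L));   // false: A is within its soft limit
        leases.renew(file, "clientA", 50_000L);                          // A's periodic renewal
        System.out.println(leases.acquire(file, "clientB", 100_000L));  // false: only 50 s since A renewed
        System.out.println(leases.acquire(file, "clientB", 120_000L));  // true: soft limit expired, B pre-empts
        System.out.println(leases.canWrite(file, "clientA", 120_000L)); // false: A lost the lease
    }
}

/** Toy model of per-file write leases; NOT the NameNode's real lease code. */
final class ToyLeaseTable {
    private static final class Lease {
        String holder;
        long lastRenewedMs;

        Lease(String holder, long lastRenewedMs) {
            this.holder = holder;
            this.lastRenewedMs = lastRenewedMs;
        }
    }

    private final long softLimitMs;
    private final long hardLimitMs;
    private final Map<String, Lease> leases = new HashMap<>();

    ToyLeaseTable(long softLimitMs, long hardLimitMs) {
        this.softLimitMs = softLimitMs;
        this.hardLimitMs = hardLimitMs;
    }

    /** Open path for writing: succeeds if free, already ours, or the holder's soft limit expired. */
    boolean acquire(String path, String client, long nowMs) {
        expireIfPastHardLimit(path, nowMs);
        Lease lease = leases.get(path);
        if (lease == null || lease.holder.equals(client)
                || nowMs - lease.lastRenewedMs > softLimitMs) {
            // Single writer: granting the lease replaces any previous holder.
            leases.put(path, new Lease(client, nowMs));
            return true;
        }
        return false;
    }

    /** Renewal from the client: extends the lease only if the client still holds it. */
    boolean renew(String path, String client, long nowMs) {
        expireIfPastHardLimit(path, nowMs);
        Lease lease = leases.get(path);
        if (lease != null && lease.holder.equals(client)) {
            lease.lastRenewedMs = nowMs;
            return true;
        }
        return false;
    }

    /** Only the current lease holder may write. */
    boolean canWrite(String path, String client, long nowMs) {
        expireIfPastHardLimit(path, nowMs);
        Lease lease = leases.get(path);
        return lease != null && lease.holder.equals(client);
    }

    /** Past the hard limit the file is treated as closed and the lease is dropped. */
    private void expireIfPastHardLimit(String path, long nowMs) {
        Lease lease = leases.get(path);
        if (lease != null && nowMs - lease.lastRenewedMs > hardLimitMs) {
            leases.remove(path);
        }
    }
}
```

Between the soft and hard limits, `canWrite` keeps returning true for the original holder until another client calls `acquire`, which mirrors the "continue to have access unless some other client pre-empts it" rule.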

READ

•       The client gets the list of DataNodes holding each block from the
NameNode, sorted by network-topology distance from the client, and then reads
the data directly from the DataNodes.

•       During a read, the client validates the checksums; if a mismatch is
found, it reports the corrupt replica to the NameNode, which marks it for
deletion.

•       If an error occurs while reading a block, the client reads it from the
next replica in the list.
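The read path above can be sketched the same way. Again, this is a hypothetical, self-contained Java model rather than the HDFS client API: `ReadDemo`, `ToyBlockReader`, `Replica`, and the host names are made up; "network distance" is just an integer supplied by the caller; a `null` payload stands in for a DataNode read error; and a single CRC32 covers the whole block (real HDFS checksums smaller fixed-size chunks). Reporting to the NameNode is modelled as appending the host to a list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.zip.CRC32;

public class ReadDemo {
    public static void main(String[] args) {
        byte[] good = "block-0 contents".getBytes(StandardCharsets.UTF_8);
        long expectedCrc = ToyBlockReader.crc(good);
        byte[] bad = good.clone();
        bad[0] ^= 0x01; // flip one bit to simulate on-disk corruption

        List<ToyBlockReader.Replica> replicas = List.of(
                new ToyBlockReader.Replica("dn-remote-rack", 4, good),
                new ToyBlockReader.Replica("dn-same-rack", 2, bad),
                new ToyBlockReader.Replica("dn-local", 0, null)); // null: read fails

        ToyBlockReader reader = new ToyBlockReader();
        byte[] data = reader.read(replicas, expectedCrc);
        System.out.println(new String(data, StandardCharsets.UTF_8)); // block-0 contents
        System.out.println(reader.reportedCorrupt());                 // [dn-same-rack]
    }
}

/** Toy model of a client reading one block; NOT the real HDFS client. */
final class ToyBlockReader {
    /** One replica: hypothetical host, distance from the client, and its bytes (null if unreachable). */
    static final class Replica {
        final String host;
        final int distance;
        final byte[] data;

        Replica(String host, int distance, byte[] data) {
            this.host = host;
            this.distance = distance;
            this.data = data;
        }
    }

    // Hosts whose replica failed checksum validation (stands in for a report to the NameNode).
    private final List<String> corrupt = new ArrayList<>();

    static long crc(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue();
    }

    byte[] read(List<Replica> replicas, long expectedCrc) {
        // Try replicas closest-first, like the topology-sorted list from the NameNode.
        List<Replica> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(r -> r.distance));
        for (Replica r : sorted) {
            if (r.data == null) {
                continue; // read error: fall back to the next replica
            }
            if (crc(r.data) != expectedCrc) {
                corrupt.add(r.host); // checksum mismatch: report it and try the next replica
                continue;
            }
            return r.data;
        }
        throw new IllegalStateException("no healthy replica available");
    }

    List<String> reportedCorrupt() {
        return corrupt;
    }
}
```

Sorting a copy of the replica list leaves the caller's list untouched; the reader tries the closest replica first and falls back only when a replica fails or its checksum does not match.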

You can refer to the link below for creating test cases.

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

Hope this helps!

Thanks and Regards,
S.RagavendraGanesh

Hadoop Support Team

ViSolve Inc. | www.visolve.com

From: Dongzhe Ma [mailto:[email protected]] 
Sent: Monday, February 02, 2015 10:50 AM
To: [email protected]
Subject: About HDFS's single-writer, multiple-reader model, any use case?

We know that HDFS employs a single-writer, multiple-reader model, which means
that only one process can write to a file at a time, while multiple readers can
work in parallel, and new readers can even observe the newly written content.
The reason for this design is to simplify concurrency control. But is it
necessary to support reading during writing? Can anyone bring up some use cases?
Why not just lock the whole file like other POSIX file systems (in terms of
locking granularity)?
