Ah, I see - thanks for clarifying.
john

From: Chris Nauroth [mailto:[email protected]]
Sent: Tuesday, December 17, 2013 4:32 PM
To: [email protected]
Subject: Re: HDFS short-circuit reads

Both of these methods return the same underlying data type that you're 
ultimately interested in.  This is the BlockLocation object, which contains the 
hosts that have a replica of the block.  Depending on your usage pattern, one 
of these methods might be more convenient than the other.

If your application's input is a single file, then you'll likely find that 
getFileBlockLocations is a good fit.  This will give you the BlockLocation 
information for that one file, and you won't need to write extra code to pull 
it out of the RemoteIterator (which you know is only going to contain one 
result anyway).

If your application's input is a whole directory, and you then process all 
files within that directory, then you'll likely find listLocatedStatus to be 
more convenient.  You'll be able to make a single RPC call to get all of the 
BlockLocation information for all files.  (Like you said, one call instead of 
many.)

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Tue, Dec 17, 2013 at 6:39 AM, John Lilley 
<[email protected]<mailto:[email protected]>> wrote:
Thanks!  I do call FileSystem.getFileBlockLocations() now to map tasks to local 
data blocks; is there any advantage to using listLocatedStatus() instead?  I 
guess one call instead of two...
John


From: Chris Nauroth 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Monday, December 16, 2013 6:07 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: HDFS short-circuit reads

Hello John,

Short-circuit reads are not on by default.  The documentation page you linked 
to at hadoop.apache.org contains all of the information you need to enable 
them, though.
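
For reference, here is a minimal sketch of the kind of hdfs-site.xml settings 
that page describes.  The socket path is only an example value and should be 
adapted to the deployment; the native libhadoop library also has to be 
available, so treat this as a starting point rather than a complete recipe.

```xml
<!-- hdfs-site.xml: a sketch, to be present for both the DataNode and the client -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <!-- example path only; choose one that suits your deployment -->
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```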

Regarding checking the status of short-circuit reads programmatically, here 
are a few thoughts:

Your application could check Configuration for the dfs.client.read.shortcircuit 
key.  This will tell you at a high level if the feature is enabled.  However, 
note that the feature needs to be turned on in configuration for both the 
DataNode and the HDFS client process.  Depending on the details of the 
deployment, the DataNode and the client might be using different configuration 
files.

This tells you if the feature is enabled, but it doesn't necessarily tell you 
if you're really going to get short-circuit reads when you open the file.  
There might not be a local replica for the block, in which case the read would 
fall back to the typical remote read behavior anyway.
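
As a rough illustration of the local-replica question, here is a minimal 
sketch in plain Java.  It assumes you already have the replica host names for 
a block (for example, the array returned by BlockLocation.getHosts()) and the 
name of the local machine.  The class name, method name, and host names are 
hypothetical, and real deployments may need more careful matching (short 
names versus fully qualified names, for instance).

```java
public class LocalReplicaCheck {

    /**
     * Returns true if any replica host equals the local host name,
     * ignoring case. The hosts array is assumed to come from something
     * like BlockLocation.getHosts(); no Hadoop classes are used here.
     */
    public static boolean hasLocalReplica(String[] replicaHosts, String localHost) {
        for (String host : replicaHosts) {
            if (host.equalsIgnoreCase(localHost)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical host names, for illustration only.
        String[] replicaHosts = {"dn1.example.com", "dn2.example.com"};
        String localHost = "dn2.example.com";

        if (hasLocalReplica(replicaHosts, localHost)) {
            System.out.println("Local replica present; a short-circuit read is possible.");
        } else {
            System.out.println("No local replica; expect a normal remote read.");
        }
    }
}
```

Even when this check passes, the read is only short-circuited if the feature 
is enabled on both the DataNode and the client.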

Depending on what your application wants to achieve, you might also be 
interested in looking at the FileSystem.listLocatedStatus API to query 
information about blocks and the corresponding locations of replicas.  
Applications like MapReduce use this information to try to schedule their work 
for optimal locality.  Short-circuit reads then become a further optimization 
on top of the gains already achieved by locality.
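
To make the locality idea a little more concrete, here is a small sketch in 
plain Java of grouping blocks by the hosts that hold a replica, so that work 
can be assigned near the data.  The Block class is a hypothetical stand-in for 
the offset, length, and hosts you would read from each BlockLocation; this is 
not how MapReduce itself is implemented, only an illustration of the general 
approach.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalityPlanner {

    /** Hypothetical stand-in for the data carried by a BlockLocation. */
    public static final class Block {
        public final String file;
        public final long offset;
        public final long length;
        public final List<String> hosts;

        public Block(String file, long offset, long length, List<String> hosts) {
            this.file = file;
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Builds a map from each host to the blocks that have a replica there. */
    public static Map<String, List<Block>> blocksByHost(List<Block> blocks) {
        Map<String, List<Block>> byHost = new LinkedHashMap<>();
        for (Block block : blocks) {
            for (String host : block.hosts) {
                byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(block);
            }
        }
        return byHost;
    }

    public static void main(String[] args) {
        // Hypothetical files, sizes, and hosts, for illustration only.
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("/data/part-0", 0L, 134217728L, List.of("dn1", "dn2")));
        blocks.add(new Block("/data/part-1", 0L, 134217728L, List.of("dn2", "dn3")));

        for (Map.Entry<String, List<Block>> e : blocksByHost(blocks).entrySet()) {
            System.out.println(e.getKey() + " holds " + e.getValue().size() + " block(s)");
        }
    }
}
```

A scheduler could then prefer to run the task for a block on one of the hosts 
listed for it, falling back to any host when none is available.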

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Mon, Dec 16, 2013 at 4:21 PM, John Lilley 
<[email protected]<mailto:[email protected]>> wrote:
Our YARN application would benefit from maximal bandwidth on HDFS reads.
But I'm unclear on how short-circuit reads are enabled.
Are they on by default?
Can our application check programmatically to see if the short-circuit read is 
enabled?
Thanks,
john

RE:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
https://issues.apache.org/jira/browse/HDFS-347



CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader of 
this message is not the intended recipient, you are hereby notified that any 
printing, copying, dissemination, distribution, disclosure or forwarding of 
this communication is strictly prohibited. If you have received this 
communication in error, please contact the sender immediately and delete it 
from your system. Thank You.


