[jira] [Commented] (HDFS-3051) A zero-copy ScatterGatherRead api from FSDataInputStream

2016-07-13 Thread Ravikumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374535#comment-15374535
 ] 

Ravikumar commented on HDFS-3051:
-

How about returning the MappedByteBuffers for all blocks of a file when every 
block is local? If any block is non-local, this method can simply return an 
empty list.

public List<MappedByteBuffer> readFullyScatterGatherLocal(EnumSet<ReadOption> options)
    throws IOException {
  return ((PositionedReadable) in).readFullyScatterGather(options);
}

A quick sample-impl can be like

public List<MappedByteBuffer> readFullyScatterGatherLocal(EnumSet<ReadOption> readOptions)
    throws IOException
{
  List<LocatedBlock> blockRange = getBlockRange(0, getFileLength());
  if (!allBlocksInLocal(blockRange))
  {
    // At least one block is not local: return empty so the caller can fall back.
    return Collections.emptyList();
  }
  List<MappedByteBuffer> retval = new LinkedList<MappedByteBuffer>();
  for (LocatedBlock blk : blockRange)
  {
    BlockReader blkReader = fetchBlockReader(blk, localDNAddrPair);
    ClientMmap mmap = blkReader.getClientMmap(readOptions);
    // Instruction to cache-eviction to avoid unmapping this.
    // Slots, streams & all other resources will be closed.
    mmap.setunmap(false);
    retval.add(mmap.getMappedByteBuffer());
    closeBlockReader(blkReader);
  }
  return retval;
}

Apps that open an InputStream only once (HBase?) can call this method and use 
the zero-copy buffers for reads if the file is local. If the buffers are not 
available, they can fall back to the regular DFSInputStream. Such reads can 
eliminate synchronization overhead and get the same performance as a local 
filesystem.
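
To illustrate the caller side, here is a minimal sketch, assuming the per-block 
buffers arrive as a list in file order. The class BlockBufferReader and its 
methods are hypothetical names for illustration and are not part of HDFS. Each 
read works on a duplicate() of the block buffer, so concurrent readers share no 
position state and need no lock. The copy into a byte[] only keeps the example 
short; a real caller would probably hand out buffer views instead.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: positional reads over per-block buffers given in file order. */
final class BlockBufferReader {
    private final List<ByteBuffer> blocks;
    private final long[] starts;   // file offset at which each block begins
    private final long length;     // total bytes across all blocks

    BlockBufferReader(List<ByteBuffer> blockBuffers) {
        blocks = new ArrayList<>(blockBuffers.size());
        starts = new long[blockBuffers.size()];
        long offset = 0;
        for (int i = 0; i < blockBuffers.size(); i++) {
            // Read-only slice: position 0, capacity = bytes in this block.
            ByteBuffer view = blockBuffers.get(i).asReadOnlyBuffer().slice();
            blocks.add(view);
            starts[i] = offset;
            offset += view.capacity();
        }
        length = offset;
    }

    long length() {
        return length;
    }

    /** Copies up to dst.length bytes starting at file offset pos; returns bytes copied. */
    int read(long pos, byte[] dst) {
        if (pos < 0) {
            throw new IllegalArgumentException("negative position");
        }
        int copied = 0;
        int i = blockIndex(pos);
        while (copied < dst.length && i < blocks.size()) {
            // duplicate() gives this call its own position and limit.
            ByteBuffer b = blocks.get(i).duplicate();
            b.position((int) (pos + copied - starts[i]));
            int n = Math.min(b.remaining(), dst.length - copied);
            b.get(dst, copied, n);
            copied += n;
            i++;
        }
        return copied;
    }

    /** Index of the block containing pos, or blocks.size() if pos is past the end. */
    private int blockIndex(long pos) {
        int i = 0;
        while (i < blocks.size() && pos >= starts[i] + blocks.get(i).capacity()) {
            i++;
        }
        return i;
    }
}
```

A read that starts in one block and ends in the next is served by two 
consecutive buffers, which is the extra bookkeeping a caller takes on when the 
API returns one buffer per block.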

But I don't know if "leaking" MappedByteBuffers to calling code can have nasty 
side-effects. 
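
One side effect that is documented for plain Java NIO: a mapping created with 
FileChannel.map stays valid after the channel is closed and is only released 
when the buffer is garbage-collected. The sketch below shows that behaviour on 
a local temp file. MmapLifetimeDemo and mapAndClose are hypothetical names, and 
nothing here touches HDFS.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MmapLifetimeDemo {
    /** Maps the whole file read-only, closes the channel, and returns the buffer. */
    static MappedByteBuffer mapAndClose(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } // the channel is closed here; the mapping itself is unaffected
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30});
        MappedByteBuffer buf = mapAndClose(tmp);
        // Still readable although the channel is closed.
        System.out.println(buf.get(1));
    }
}
```

So code that holds on to a returned buffer keeps the mapping alive for as long 
as the buffer is reachable, whatever the stream that produced it does.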






> A zero-copy ScatterGatherRead api from FSDataInputStream
> 
>
> Key: HDFS-3051
> URL: https://issues.apache.org/jira/browse/HDFS-3051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> It will be nice if we can get a new API from FSDataInputStream that allows for 
> zero-copy read for hdfs readers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-3051) A zero-copy ScatterGatherRead api from FSDataInputStream

2012-04-20 Thread Tim Broberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258432#comment-13258432
 ] 

Tim Broberg commented on HDFS-3051:
---

This interface adds some complexity to the ZeroCopyCompressor interface, 
HADOOP-8148. Debugging traversal of a list of objects across JNI is likely to 
take some work.

Are we approaching any kind of consensus on whether to incorporate this or not?

Also, how large are the individual buffers in these lists, typically?


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3051) A zero-copy ScatterGatherRead api from FSDataInputStream

2012-03-23 Thread Colin Patrick McCabe (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236848#comment-13236848
 ] 

Colin Patrick McCabe commented on HDFS-3051:


Hi Dhruba,

This sounds interesting.  One thing I don't completely understand about your 
proposed API is whether you will have multiple (position, length) pairs as 
inputs.  Traditionally, scatter-gather implies being able to read multiple 
locations at once, as in preadv(2).  However, I only see one (position, 
length) argument in your readFullyScatterGather function.

Also, it seems to me that by mmapping at a fixed address, you could create a 
single contiguous buffer rather than forcing the user to deal with multiple 
buffers for a single HDFS file.

C.
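
For illustration, the shape of an API that takes multiple (position, length) 
pairs could look like the sketch below. It assumes a plain local FileChannel 
rather than an HDFS stream, and the names VectoredRead, Range and readFully are 
hypothetical. It issues one positional read per range and copies into heap 
buffers, so it shows only the interface shape and is neither zero-copy nor a 
single vectored system call.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

final class VectoredRead {
    /** Hypothetical (position, length) request. */
    static final class Range {
        final long position;
        final int length;

        Range(long position, int length) {
            this.position = position;
            this.length = length;
        }
    }

    /** Reads every range fully; returns one buffer per range, flipped for reading. */
    static List<ByteBuffer> readFully(FileChannel ch, List<Range> ranges) throws IOException {
        List<ByteBuffer> out = new ArrayList<>(ranges.size());
        for (Range r : ranges) {
            ByteBuffer dst = ByteBuffer.allocate(r.length);
            long pos = r.position;
            while (dst.hasRemaining()) {
                // Positional read: does not move the channel's own position.
                int n = ch.read(dst, pos);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                pos += n;
            }
            dst.flip();
            out.add(dst);
        }
        return out;
    }
}
```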





[jira] [Commented] (HDFS-3051) A zero-copy ScatterGatherRead api from FSDataInputStream

2012-03-23 Thread Colin Patrick McCabe (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236868#comment-13236868
 ] 

Colin Patrick McCabe commented on HDFS-3051:


Todd said:

> One interesting thing to take note of: in Linux prior to 2.6.37, the page 
> fault handler for file mappings actually held the mmap semaphore exclusively, 
> preventing other threads from modifying page mappings (or starting threads). So 
> doing mmapped IO may have some downsides as well, especially on older kernels. 
> Not sure if this issue is addressed in RHEL 6 or not. The Linux git hash is 
> d065bd810b6deb67d4897a14bfe21f8eb526ba99, see also 
> http://help.lockergnome.com/linux/PATCH-V2-Reduce-mmap_sem-hold-times-file-backed-page-faults--ftopict527005.html

Good point.

At least in theory, you can create threads on Linux without calling mmap.  You 
just can't create pthreads (note the p).  I wonder what HotSpot does exactly 
to create threads?





[jira] [Commented] (HDFS-3051) A zero-copy ScatterGatherRead api from FSDataInputStream

2012-03-23 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237185#comment-13237185
 ] 

dhruba borthakur commented on HDFS-3051:


Hi Colin, I agree that scatter/gather typically refers to multiple input tuples 
of (position, length). Yes, we can extend the API to include that.

My original proposal did not include that because I was mostly targeting this 
API at reducing the number of buffer copies.





[jira] [Commented] (HDFS-3051) A zero-copy ScatterGatherRead api from FSDataInputStream

2012-03-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223116#comment-13223116
 ] 

Todd Lipcon commented on HDFS-3051:
---

One interesting thing to take note of: in Linux prior to 2.6.37, the page fault 
handler for file mappings actually held the mmap semaphore exclusively, 
preventing other threads from modifying page mappings (or starting threads). So 
doing mmapped IO may have some downsides as well, especially on older kernels. 
Not sure if this issue is addressed in RHEL 6 or not. The Linux git hash is 
d065bd810b6deb67d4897a14bfe21f8eb526ba99, see also 
http://help.lockergnome.com/linux/PATCH-V2-Reduce-mmap_sem-hold-times-file-backed-page-faults--ftopict527005.html
