[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-07 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969719#comment-16969719
 ] 

Duo Zhang commented on HDFS-13613:
--

+1 on removing the log. I believe if this happens, it will always flood the 
log file, as it means we are overloaded, and flooding the log file with this 
message will make things even worse...

A better way is to keep a counter for this event and log it periodically with 
the accumulated numbers. Can we do this in a follow-on issue?
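Such a counter could look roughly like this (a minimal sketch, not the actual 
DFSClient code; the class and method names are illustrative): count every 
rejection and emit a single summary line at most once per interval.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch: count "execution rejected" events and produce a
// summary at most once per interval instead of one line per event.
public class RejectionCounter {
  private final AtomicLong rejections = new AtomicLong();
  private final long logIntervalMs;
  private volatile long lastLogTime = 0;

  public RejectionCounter(long logIntervalMs) {
    this.logIntervalMs = logIntervalMs;
  }

  // Returns a summary line when the interval has elapsed, else null.
  // Best-effort: concurrent callers may race on the interval check,
  // which at worst yields an occasional extra summary line.
  public String onRejection(long nowMs) {
    long count = rejections.incrementAndGet();
    if (nowMs - lastLogTime >= logIntervalMs) {
      lastLogTime = nowMs;
      return "Execution rejected " + count
          + " times so far; executing in current thread";
    }
    return null;
  }

  public long count() {
    return rejections.get();
  }
}
```

The caller would log the returned string (when non-null) at INFO, so an 
overloaded client emits one line per interval instead of thousands per minute.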

> RegionServer log is flooded with "Execution rejected, Executing in current 
> thread"
> --
>
> Key: HDFS-13613
> URL: https://issues.apache.org/jira/browse/HDFS-13613
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: CDH 5.13, HBase RegionServer, Kerberized, hedged read
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: 
> 0001-HDFS-13613-RegionServer-log-is-flooded-with-Executio.patch
>
>
> In the log of a HBase RegionServer with hedged read, we saw the following 
> message flooding the log file.
> {noformat}
> 2018-05-19 17:22:55,691 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,692 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,695 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 
> {noformat}
> Sometimes the RS spits tens of thousands of lines of this message in a 
> minute. We should do something to stop this message flooding the log file. 
> Also, we should make this message more actionable. Discussed with 
> [~huaxiang], this message can appear if there are stale DataNodes.
> I believe this issue existed since HDFS-5776.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14648) DeadNodeDetector basic model

2019-08-23 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914760#comment-16914760
 ] 

Duo Zhang commented on HDFS-14648:
--

We have been using this in our production for a long time. It does solve a big 
problem for us in HBase.

For performance, HBase opens a dfs input stream and never closes it unless the 
file has been compacted away. So when a DN is broken, every dfs input stream 
needs to find out the broken DN on its own, since every dfs input stream 
manages its own live/dead nodes.

If it is just a process crash, HBase will be fine: when we touch the dead DN, 
we receive a connection refused immediately and then go to other DNs. But if 
the machine is completely down, we will hang there for a long time and finally 
receive a connection timeout. Usually the connection timeout is set a bit 
large (15 seconds in our deployment), as there is no way to set the value per 
request, so we have to pick a value greater than most of the timeout values of 
HBase requests.

This is really a big problem for us. For a 300+ node cluster, a single machine 
failure can degrade availability for more than 2 hours!

So I think this is really a useful feature for HBase.

Thanks.

> DeadNodeDetector basic model
> 
>
> Key: HDFS-14648
> URL: https://issues.apache.org/jira/browse/HDFS-14648
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14648.001.patch, HDFS-14648.002.patch, 
> HDFS-14648.003.patch, HDFS-14648.004.patch
>
>
> This Jira constructs the DeadNodeDetector state machine model. It implements 
> the following functions:
>  # After a DFSInputStream detects that a DataNode has died, it puts the node 
> into the DeadNodeDetector and shares this information with the other 
> DFSInputStreams in the same DFSClient, so they will not read from this 
> DataNode.
>  # The DeadNodeDetector also keeps DFSInputStream reference relationships for 
> each DataNode. When a DFSInputStream closes, the DeadNodeDetector removes its 
> references. If a dead node in the DeadNodeDetector is no longer referenced by 
> any DFSInputStream, it is removed from the DeadNodeDetector as well.
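The model described above can be sketched roughly as follows (illustrative 
names only, not the actual HDFS-14648 classes): dead DataNodes are shared 
across all input streams of a client, and a dead node is forgotten once no 
stream references it any more.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the shared dead-node model: one registry per client,
// consulted by all of its input streams.
public class DeadNodeRegistry {
  // datanode id -> ids of the input streams that reference it
  private final Map<String, Set<String>> deadNodeRefs =
      new ConcurrentHashMap<>();

  // A stream reports a DataNode as dead; other streams will see it.
  public void addDeadNode(String datanodeId, String streamId) {
    deadNodeRefs
        .computeIfAbsent(datanodeId, k -> ConcurrentHashMap.newKeySet())
        .add(streamId);
  }

  // Streams consult this before reading from a DataNode.
  public boolean isDead(String datanodeId) {
    return deadNodeRefs.containsKey(datanodeId);
  }

  // On stream close: drop its references, and forget dead nodes that
  // no remaining stream references.
  public void onStreamClose(String streamId) {
    deadNodeRefs.entrySet().removeIf(e -> {
      e.getValue().remove(streamId);
      return e.getValue().isEmpty();
    });
  }
}
```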






[jira] [Commented] (HDFS-14541) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException

2019-06-24 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871916#comment-16871916
 ] 

Duo Zhang commented on HDFS-14541:
--

The default option is whatever you selected last time, so after you click 
'squash and merge' once, it will be the default for you in the future :)

>  When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException
> -
>
> Key: HDFS-14541
> URL: https://issues.apache.org/jira/browse/HDFS-14541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, performance
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14541.000.patch, HDFS-14541.001.patch, 
> HDFS-14541.002.patch, after-QPS.png, after-cpu-flame-graph.svg, 
> after-heap-flame-graph.svg, async-prof-pid-94152-alloc-2.svg, 
> async-prof-pid-94152-cpu-1.svg, before-QPS.png, before-cpu-flame-graph.svg, 
> before-heap-flame-graph.svg
>
>
> Our XiaoMi HBase team is evaluating the performance improvement of 
> HBASE-21879. We collected a few CPU and heap flame graphs with 
> async-profiler and found some performance issues in DFSClient.
> From the two attached flame graphs, we can conclude that the try/catch block 
> in ShortCircuitCache#trimEvictionMaps has a serious perf problem; we should 
> remove the try/catch from DFSClient.
> {code}
>   /**
>* Trim the eviction lists.
>*/
>   private void trimEvictionMaps() {
> long now = Time.monotonicNow();
> demoteOldEvictableMmaped(now);
> while (true) {
>   long evictableSize = evictable.size();
>   long evictableMmappedSize = evictableMmapped.size();
>   if (evictableSize + evictableMmappedSize <= maxTotalSize) {
> return;
>   }
>   ShortCircuitReplica replica;
>   try {
> if (evictableSize == 0) {
>   replica = (ShortCircuitReplica)evictableMmapped.get(evictableMmapped
>   .firstKey());
> } else {
>   replica = (ShortCircuitReplica)evictable.get(evictable.firstKey());
> }
>   } catch (NoSuchElementException e) {
> break;
>   }
>   if (LOG.isTraceEnabled()) {
> LOG.trace(this + ": trimEvictionMaps is purging " + replica +
> StringUtils.getStackTrace(Thread.currentThread()));
>   }
>   purge(replica);
> }
>   }
> {code}
> Our Xiaomi HDFS team member [~leosun08] will prepare a patch for this issue.
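One possible shape of such a fix (a sketch with simplified types, not the 
committed patch): since the sizes are already computed, the loop can pick from 
the non-empty map directly, so firstKey() is never called on an empty map and 
no NoSuchElementException has to be caught at all.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified sketch of trimEvictionMaps without the try/catch. The
// real maps hold ShortCircuitReplica values; Object stands in here,
// and pollFirstEntry() stands in for firstKey()+purge-side removal.
public class TrimSketch {
  final NavigableMap<Long, Object> evictable = new TreeMap<>();
  final NavigableMap<Long, Object> evictableMmapped = new TreeMap<>();
  long maxTotalSize = 2;
  int purged = 0;

  void purge(Object replica) {
    purged++; // stands in for ShortCircuitCache#purge
  }

  void trimEvictionMaps() {
    while (true) {
      long evictableSize = evictable.size();
      long evictableMmappedSize = evictableMmapped.size();
      if (evictableSize + evictableMmappedSize <= maxTotalSize) {
        return;
      }
      // The sum exceeds the limit, so at least one map is non-empty;
      // pick from it directly instead of catching the
      // NoSuchElementException that firstKey() throws when empty.
      Map.Entry<Long, Object> entry = (evictableSize == 0)
          ? evictableMmapped.pollFirstEntry()
          : evictable.pollFirstEntry();
      purge(entry.getValue());
    }
  }
}
```

The invariant that makes this safe: the loop only reaches the selection when 
the combined size is above the limit, so the chosen map cannot be empty.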






[jira] [Updated] (HDFS-14541) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException

2019-06-24 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-14541:
-
Component/s: performance
 hdfs-client

>  When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException
> -
>
> Key: HDFS-14541
> URL: https://issues.apache.org/jira/browse/HDFS-14541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, performance
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14541.000.patch, HDFS-14541.001.patch, 
> HDFS-14541.002.patch, after-QPS.png, after-cpu-flame-graph.svg, 
> after-heap-flame-graph.svg, async-prof-pid-94152-alloc-2.svg, 
> async-prof-pid-94152-cpu-1.svg, before-QPS.png, before-cpu-flame-graph.svg, 
> before-heap-flame-graph.svg
>
>
> Our XiaoMi HBase team is evaluating the performance improvement of 
> HBASE-21879. We collected a few CPU and heap flame graphs with 
> async-profiler and found some performance issues in DFSClient.
> From the two attached flame graphs, we can conclude that the try/catch block 
> in ShortCircuitCache#trimEvictionMaps has a serious perf problem; we should 
> remove the try/catch from DFSClient.
> {code}
>   /**
>* Trim the eviction lists.
>*/
>   private void trimEvictionMaps() {
> long now = Time.monotonicNow();
> demoteOldEvictableMmaped(now);
> while (true) {
>   long evictableSize = evictable.size();
>   long evictableMmappedSize = evictableMmapped.size();
>   if (evictableSize + evictableMmappedSize <= maxTotalSize) {
> return;
>   }
>   ShortCircuitReplica replica;
>   try {
> if (evictableSize == 0) {
>   replica = (ShortCircuitReplica)evictableMmapped.get(evictableMmapped
>   .firstKey());
> } else {
>   replica = (ShortCircuitReplica)evictable.get(evictable.firstKey());
> }
>   } catch (NoSuchElementException e) {
> break;
>   }
>   if (LOG.isTraceEnabled()) {
> LOG.trace(this + ": trimEvictionMaps is purging " + replica +
> StringUtils.getStackTrace(Thread.currentThread()));
>   }
>   purge(replica);
> }
>   }
> {code}
> Our Xiaomi HDFS team member [~leosun08] will prepare a patch for this issue.






[jira] [Commented] (HDFS-14541) When evictableMmapped or evictable size is zero, do not throw NoSuchElementException

2019-06-24 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871911#comment-16871911
 ] 

Duo Zhang commented on HDFS-14541:
--

You should use 'squash and merge' or 'rebase and merge' instead of 'create a 
merge commit'. And I think we should file an issue at INFRA to disable the 
'create a merge commit' button...

And I think this should go into all affected branches, not only trunk?

Thanks [~jojochuang] [~elgoiri].

>  When evictableMmapped or evictable size is zero, do not throw 
> NoSuchElementException
> -
>
> Key: HDFS-14541
> URL: https://issues.apache.org/jira/browse/HDFS-14541
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14541.000.patch, HDFS-14541.001.patch, 
> HDFS-14541.002.patch, after-QPS.png, after-cpu-flame-graph.svg, 
> after-heap-flame-graph.svg, async-prof-pid-94152-alloc-2.svg, 
> async-prof-pid-94152-cpu-1.svg, before-QPS.png, before-cpu-flame-graph.svg, 
> before-heap-flame-graph.svg
>
>
> Our XiaoMi HBase team is evaluating the performance improvement of 
> HBASE-21879. We collected a few CPU and heap flame graphs with 
> async-profiler and found some performance issues in DFSClient.
> From the two attached flame graphs, we can conclude that the try/catch block 
> in ShortCircuitCache#trimEvictionMaps has a serious perf problem; we should 
> remove the try/catch from DFSClient.
> {code}
>   /**
>* Trim the eviction lists.
>*/
>   private void trimEvictionMaps() {
> long now = Time.monotonicNow();
> demoteOldEvictableMmaped(now);
> while (true) {
>   long evictableSize = evictable.size();
>   long evictableMmappedSize = evictableMmapped.size();
>   if (evictableSize + evictableMmappedSize <= maxTotalSize) {
> return;
>   }
>   ShortCircuitReplica replica;
>   try {
> if (evictableSize == 0) {
>   replica = (ShortCircuitReplica)evictableMmapped.get(evictableMmapped
>   .firstKey());
> } else {
>   replica = (ShortCircuitReplica)evictable.get(evictable.firstKey());
> }
>   } catch (NoSuchElementException e) {
> break;
>   }
>   if (LOG.isTraceEnabled()) {
> LOG.trace(this + ": trimEvictionMaps is purging " + replica +
> StringUtils.getStackTrace(Thread.currentThread()));
>   }
>   purge(replica);
> }
>   }
> {code}
> Our Xiaomi HDFS team member [~leosun08] will prepare a patch for this issue.






[jira] [Commented] (HDFS-14535) The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circut read

2019-06-13 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863656#comment-16863656
 ] 

Duo Zhang commented on HDFS-14535:
--

Agree. I think this is not a big new feature, so we should include it in all 
active branches?

> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is 
> causing lots of heap allocation in HBase when using short-circut read
> --
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14535.patch
>
>
> Our HBase team is trying to read blocks from HDFS into pooled offheap 
> ByteBuffers directly (HBASE-21879). In a recent benchmark we found that 
> almost 45% of the heap allocation comes from the DFS client. The heap 
> allocation flame graph can be seen here: 
> https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors 
> from a DomainPeer, we allocate a huge 8KB buffer for the BufferedOutputStream, 
> though the protocol content is quite small, just a few bytes.
> This put heavy GC pressure on HBase when cacheHitRatio < 60%, which 
> increased the HBase P999 latency. Instead, we can pre-allocate a small buffer 
> for the BufferedOutputStream, such as 512 bytes; it is enough to read the 
> short-circuit fd protocol content. We have created a patch like that, and the 
> allocation flame graph shows that after the patch, the heap allocation from 
> the DFS client dropped from 45% to 27%, which is a very good improvement. 
> See: 
> https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> Hope the attached patch can be merged into HDFS trunk and also Hadoop-2.8.x; 
> HBase will benefit a lot from it.
> Thanks.
> For more details, see: 
> https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639
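The essence of the change described above (a sketch; the constant name is 
illustrative, not the one in the patch) is simply to pass an explicit small 
capacity to the BufferedOutputStream constructor instead of relying on the 
8 KB default:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class SmallBufferExample {
  // Illustrative constant: big enough for the few-byte short-circuit
  // fd request, far smaller than the 8192-byte default.
  static final int SMALL_BUFFER_SIZE = 512;

  static OutputStream wrap(OutputStream raw) {
    // new BufferedOutputStream(raw) would allocate an 8 KB byte[] per
    // request; the two-argument constructor keeps the allocation tiny.
    return new BufferedOutputStream(raw, SMALL_BUFFER_SIZE);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    OutputStream out = wrap(sink);
    out.write("request".getBytes(StandardCharsets.UTF_8));
    out.flush(); // the small buffer behaves exactly like the big one
    System.out.println(sink.size()); // 7
  }
}
```

Behavior is unchanged for writes smaller than the buffer; only the size of the 
per-request allocation shrinks, which is what reduces the GC pressure.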






[jira] [Updated] (HDFS-14535) The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circut read

2019-06-13 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-14535:
-
Component/s: hdfs-client

> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is 
> causing lots of heap allocation in HBase when using short-circut read
> --
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14535.patch
>
>
> Our HBase team is trying to read blocks from HDFS into pooled offheap 
> ByteBuffers directly (HBASE-21879). In a recent benchmark we found that 
> almost 45% of the heap allocation comes from the DFS client. The heap 
> allocation flame graph can be seen here: 
> https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors 
> from a DomainPeer, we allocate a huge 8KB buffer for the BufferedOutputStream, 
> though the protocol content is quite small, just a few bytes.
> This put heavy GC pressure on HBase when cacheHitRatio < 60%, which 
> increased the HBase P999 latency. Instead, we can pre-allocate a small buffer 
> for the BufferedOutputStream, such as 512 bytes; it is enough to read the 
> short-circuit fd protocol content. We have created a patch like that, and the 
> allocation flame graph shows that after the patch, the heap allocation from 
> the DFS client dropped from 45% to 27%, which is a very good improvement. 
> See: 
> https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> Hope the attached patch can be merged into HDFS trunk and also Hadoop-2.8.x; 
> HBase will benefit a lot from it.
> Thanks.
> For more details, see: 
> https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-07-20 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13643:
-
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch HDFS-13572.

Thanks all for reviewing.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643-v2.patch, HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Created] (HDFS-13756) Implement the FileSystemLinkResolver in an async way

2018-07-20 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-13756:


 Summary: Implement the FileSystemLinkResolver in an async way
 Key: HDFS-13756
 URL: https://issues.apache.org/jira/browse/HDFS-13756
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang









[jira] [Created] (HDFS-13755) Add retry support for async rpc

2018-07-20 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-13755:


 Summary: Add retry support for async rpc
 Key: HDFS-13755
 URL: https://issues.apache.org/jira/browse/HDFS-13755
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang









[jira] [Created] (HDFS-13754) Add authentication support for async rpc

2018-07-20 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-13754:


 Summary: Add authentication support for async rpc
 Key: HDFS-13754
 URL: https://issues.apache.org/jira/browse/HDFS-13754
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang









[jira] [Created] (HDFS-13753) Implement the mkdir API for AsyncDistributedFileSystem

2018-07-20 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-13753:


 Summary: Implement the mkdir API for AsyncDistributedFileSystem
 Key: HDFS-13753
 URL: https://issues.apache.org/jira/browse/HDFS-13753
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang


As it only needs to connect to the NN, it is easy to write a UT to verify it.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-07-20 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550625#comment-16550625
 ] 

Duo Zhang commented on HDFS-13643:
--

The checkstyle issues are gone? A bit strange...

Anyway, let me commit the patch to branch HDFS-13572 so that we can start the 
follow-on work.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643-v2.patch, HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-07-20 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13643:
-
Attachment: HDFS-13643-v2.patch

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643-v2.patch, HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-06-26 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523411#comment-16523411
 ] 

Duo Zhang commented on HDFS-13643:
--

Seems there is no response...

Anyway, the architecture of the async dfs client is layered; the rpc layer 
could still be switched back to the one in hadoop-common later. I will commit 
the patch here to branch HDFS-13572 tomorrow if there are no other objections, 
so we can start the follow-on work.

Thanks.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-06-11 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509121#comment-16509121
 ] 

Duo Zhang commented on HDFS-13643:
--

[~daryn] Any updates here boss?

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-06-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13643:
-
Attachment: HDFS-13643-v2.patch

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497413#comment-16497413
 ] 

Duo Zhang commented on HDFS-13643:
--

Fixed the compile issue; it was because I changed a name in the proto file but 
forgot to recompile it... Let's see the pre-commit result first. Will fix the 
checkstyle issue in the next patch.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13643:
-
Attachment: HDFS-13643-v1.patch

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Comment Edited] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497380#comment-16497380
 ] 

Duo Zhang edited comment on HDFS-13643 at 6/1/18 12:41 AM:
---

Yes, this will be committed to a feature branch first. The intention here is 
to provide only the basic support, so we can finish it soon and the follow-on 
work on the async dfs implementation can start ASAP. We can add security 
support later, as it is not a blocker for the dfs implementation.

Modifying the rpc implementation in hadoop-common is exactly what we want to 
avoid here... In HDFS-9924 some others tried this approach and obviously they 
ran into trouble... And since hadoop-common is the core part of hadoop, with 
almost every module in hadoop depending on it, I do not want to modify its 
code unless we have very strong evidence that the modified version is better. 
So I plan to implement an async rpc client in the HDFS project only; if the 
async dfs project later proves to be a good one, we can then think about 
merging the code into hadoop-common.

Thanks.


was (Author: apache9):
Yes, this will be committed to a feature branch first, so the intention here is 
to provide the basic support only, so we can finish it soon and then the follow 
on work on async dfs implementation can start ASAP. We can add security support 
later as it is not a blocked for the dfs implementation.

And for modifying the rpc implementation in hadoop-common is what we want to 
avoid here... In HDFS-9924 some other guys tried this approach and obviously 
they failed... And for me, as hadoop-common is the core part of hadoop, almost 
every module in hadoop depend on it, I do not want to modify its code unless we 
have a very strong evidence that the modified version is better. So I plan to 
implement an async rpc client in HDFS project only and if later the async dfs 
project is proven to be a good one, then we can think of merge the code with 
hadoop-common.

Thanks.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497380#comment-16497380
 ] 

Duo Zhang commented on HDFS-13643:
--

Yes, this will be committed to a feature branch first. The intention here is 
to provide only the basic support, so we can finish it soon and the follow-on 
work on the async dfs implementation can start ASAP. We can add security 
support later, as it is not a blocker for the dfs implementation.

Modifying the rpc implementation in hadoop-common is exactly what we want to 
avoid here... In HDFS-9924 some others tried this approach and obviously they 
failed... And since hadoop-common is the core part of hadoop, with almost 
every module in hadoop depending on it, I do not want to modify its code 
unless we have very strong evidence that the modified version is better. So I 
plan to implement an async rpc client in the HDFS project only; if the async 
dfs project later proves to be a good one, we can then think about merging 
the code into hadoop-common.

Thanks.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496476#comment-16496476
 ] 

Duo Zhang commented on HDFS-13643:
--

Oh, it seems the newly added proto file for testing does not work. Let me check.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13643:
-
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13643:
-
Attachment: HDFS-13643.patch

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Created] (HDFS-13643) Implement basic async rpc client

2018-05-30 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-13643:


 Summary: Implement basic async rpc client
 Key: HDFS-13643
 URL: https://issues.apache.org/jira/browse/HDFS-13643
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ipc
Reporter: Duo Zhang


Implement the basic async rpc client so we can start working on the DFSClient 
implementation ASAP.






[jira] [Commented] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-30 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495926#comment-16495926
 ] 

Duo Zhang commented on HDFS-13572:
--

Good. Let's start working on this.

> [umbrella] Non-blocking HDFS Access for H3
> --
>
> Key: HDFS-13572
> URL: https://issues.apache.org/jira/browse/HDFS-13572
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs async
>Affects Versions: 3.0.0
>Reporter: stack
>Priority: Major
> Attachments: Nonblocking HDFS Access.pdf
>
>
> An umbrella JIRA for supporting non-blocking HDFS access in h3.
> This issue has provenance in the stalled HDFS-9924 but would like to vault 
> over what was going on over there, in particular, focus on an async API for 
> hadoop3+ unencumbered by worries about how to make it work in hadoop2.
> Let me post a WIP design. Would love input/feedback (We make mention of the 
> HADOOP-12910 call for spec but as future work -- hopefully that's ok). Was 
> thinking of cutting a feature branch if all good after a bit of chat.






[jira] [Reopened] (HDFS-13640) enable ShortCircuit Read on UC block

2018-05-30 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HDFS-13640:
--

> enable ShortCircuit Read on UC block
> 
>
> Key: HDFS-13640
> URL: https://issues.apache.org/jira/browse/HDFS-13640
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.2
>Reporter: Gang Xie
>Priority: Major
>
> Short-circuit read (SCR) is disabled for under-construction blocks by 
> HDFS-2757 due to the inconsistency of the block states. With this 
> limitation, some streaming/messaging applications cannot benefit from the 
> performance improvement of SCR. Our streaming system, whose storage is HDFS, 
> has around 90% of its reads on the last blocks, so it is necessary to enable 
> SCR on the last block, especially when the application can ensure reads 
> happen after the flush.
> After looking into the original issue in HDFS-2757, it can only happen when 
> a read goes beyond the flush or the local datanode is kicked out of the 
> pipeline. But if the data is present and the visible length of the block 
> covers the read length, we should still be able to read the data from the 
> block, right?  
>  
> I don't have a complete solution here. Any suggestion would be helpful.






[jira] [Resolved] (HDFS-13640) s

2018-05-30 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HDFS-13640.
--
Resolution: Invalid

Accident?

> s
> -
>
> Key: HDFS-13640
> URL: https://issues.apache.org/jira/browse/HDFS-13640
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.2
>Reporter: Gang Xie
>Priority: Major
>







[jira] [Updated] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-16 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-13572:
-
Attachment: Nonblocking HDFS Access.pdf

> [umbrella] Non-blocking HDFS Access for H3
> --
>
> Key: HDFS-13572
> URL: https://issues.apache.org/jira/browse/HDFS-13572
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs async
>Affects Versions: 3.0.0
>Reporter: stack
>Priority: Major
> Attachments: Nonblocking HDFS Access.pdf
>
>
> An umbrella JIRA for supporting non-blocking HDFS access in h3.
> This issue has provenance in the stalled HDFS-9924 but would like to vault 
> over what was going on over there, in particular, focus on an async API for 
> hadoop3+ unencumbered by worries about how to make it work in hadoop2.
> Let me post a WIP design. Would love input/feedback (We make mention of the 
> HADOOP-12910 call for spec but as future work -- hopefully that's ok). Was 
> thinking of cutting a feature branch if all good after a bit of chat.






[jira] [Commented] (HDFS-13571) Dead datanode detector

2018-05-15 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476775#comment-16476775
 ] 

Duo Zhang commented on HDFS-13571:
--

This is useful for HBase, since an HBase RegionServer keeps a large number of 
DFSInputStreams open for a long time.

> Dead datanode detector
> --
>
> Key: HDFS-13571
> URL: https://issues.apache.org/jira/browse/HDFS-13571
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.6.0, 3.0.2
>Reporter: Gang Xie
>Priority: Minor
> Fix For: 3.0.2
>
>
> Currently, the information about dead datanodes in a DFSInputStream is 
> stored locally, so it cannot be shared among the input streams of the same 
> DFSClient. In our production environment, some datanodes die every day for 
> various reasons. Even after the first input stream has blocked and detected 
> a dead node, it cannot share this information with the others in the same 
> DFSClient; thus, the other input streams are still blocked by the dead node 
> for some time, which can cause bad service latency.
> To eliminate this impact of dead datanodes, we designed a dead datanode 
> detector, which detects dead nodes in advance and shares this information 
> among all the input streams in the same client. This improvement has been 
> online for some months and works fine, so we decided to port it to 3.0 (the 
> versions used in our production environment are 2.4 and 2.6).
> I will do the porting work and upload the code later.
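As a rough illustration of the sharing scheme described above, a client-wide registry consulted by every input stream might look like the following sketch. All names here (`DeadNodeRegistry`, `reportDead`, etc.) are hypothetical and not taken from the actual patch:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a client-wide dead-datanode registry shared by all
// input streams of one DFSClient. Names are illustrative, not from the patch.
public class DeadNodeRegistry {
    // Dead datanode addresses, shared by every input stream of one client.
    private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();

    // The first stream that detects a failure records it for everyone.
    public void reportDead(String datanode) {
        deadNodes.add(datanode);
    }

    // Other streams can skip the node without blocking on it themselves.
    public boolean isDead(String datanode) {
        return deadNodes.contains(datanode);
    }

    // A background detector would re-probe nodes and clear recovered ones.
    public void reportAlive(String datanode) {
        deadNodes.remove(datanode);
    }
}
```

Any thread-safe shared set gives the key property: one stream's detection immediately benefits all other streams of the same client.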






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2018-05-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465872#comment-16465872
 ] 

Duo Zhang commented on HDFS-9924:
-

Linked a design doc. Not finished yet but I think we can discuss the rpc client 
first.

Thanks.

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Duo Zhang
>Priority: Major
> Attachments: Async-HDFS-Performance-Report.pdf, 
> AsyncHdfs20160510.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2018-05-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462238#comment-16462238
 ] 

Duo Zhang commented on HDFS-9924:
-

HBase 2.0.0 has been released, and AsyncFSWAL (HBASE-14790) has shipped with 
it. We use lots of internal HDFS APIs to implement AsyncFSWAL, so it is 
expected that things like HBASE-20244 will happen again and again.

To make life easier, we need to move the async-output-related code into HDFS. 
The POC above shows that option 3 can work, so I plan to create a feature 
branch to implement the async dfs client. In general I think there are 4 steps:

1. Implement an async rpc client with option 3 described above.
2. Implement the filesystem APIs which only need to connect to the NN, such as 
'mkdirs'.
3. Implement async file read. The problem is the API: for pread I think a 
CompletableFuture is enough, but the API for streaming read needs more 
discussion later.
4. Implement async file write. The API will also be a problem, but a more 
important one is that, if we want to support fan-out, the current logic at the 
DN side breaks the semantics, as uncommitted data can be read very easily. In 
HBase this is solved by HBASE-14004, but I do not think we should keep the 
broken behavior in HDFS. We need to find a way to deal with it.

Thanks.
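As a hedged illustration of the API shape step 2 implies, here is a minimal sketch in which a NameNode-only call such as mkdirs returns a CompletableFuture instead of blocking. `AsyncDfs` and its simulated single-thread "event loop" are invented for illustration; this is not the real HDFS-13643 client:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative shape of step 2: each NN-only call returns a future.
// AsyncDfs and its executor are hypothetical stand-ins.
public class AsyncDfs {
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();

    public CompletableFuture<Boolean> mkdirs(String path) {
        // A real client would issue a non-blocking RPC to the NN; here the
        // asynchronous completion is simulated on the event-loop thread.
        return CompletableFuture.supplyAsync(() -> path.startsWith("/"), eventLoop);
    }

    public void close() {
        eventLoop.shutdown();
    }
}
```

With this shape, callers chain callbacks (e.g. `mkdirs("/a").thenAccept(ok -> ...)`) instead of blocking a thread per call.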


> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Duo Zhang
>Priority: Major
> Attachments: Async-HDFS-Performance-Report.pdf, 
> AsyncHdfs20160510.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Assigned] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-06-09 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reassigned HDFS-9924:
---

Assignee: Duo Zhang  (was: Xiaobing Zhou)

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Duo Zhang
> Attachments: AsyncHdfs20160510.pdf, 
> Async-HDFS-Performance-Report.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-06-09 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-9924:

Attachment: HDFS-9924-POC.patch

A simple rpc client built on netty. It tests mkdirs and getFileInfo in 
TestAsyncDFSClient, and is used to prove that option 3 can work.

https://github.com/Apache9/hadoop/commit/aa1623c426ee97398089ba15496651119a1672a9

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf, 
> Async-HDFS-Performance-Report.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-02-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851248#comment-15851248
 ] 

Duo Zhang commented on HDFS-9924:
-

[~stack] It seems the performance problem is in grpc-go, not grpc-java.

Anyway, I found that we already have the generated protobuf interfaces, but we 
only implement the blocking part, via reflection...

So let me implement a POC with option 3.

Will be back later.

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-01-27 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843802#comment-15843802
 ] 

Duo Zhang commented on HDFS-9924:
-

{quote}
It'd be in-process listening on the DN port reading a few bytes to figure which 
RPC?
{quote}
Yes, just like the service introduced in HDFS-8377.

Let me take a look at the cockroach approach. Thanks [~stack].

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-01-25 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837416#comment-15837416
 ] 

Duo Zhang commented on HDFS-9924:
-

Any updates here? There seem to have been no commits on the HDFS-9924 branch 
for a long time...

I can help a bit here, as I still want to move 
FanOutOneBlockAsyncDFSOutputStream into HDFS rather than maintain it in 
HBase...

I think the problem here is that our interface is blocking. It is really 
awkward to implement async stuff on top of a blocking interface, so I do not 
like the current approach. I think we can either:

1. Use grpc instead of the current rpc. Add a port unification service in 
front of the grpc server and the old rpc server to support both grpc clients 
and old clients. Yes, we would need to write lots of code this way, but I 
think most of it is just boilerplate. Another benefit is that multi-language 
support will be much easier if we use standard grpc.

2. Use grpc but not the HTTP/2 transport; implement our own transport. I 
haven't tried this yet, but grpc-java does support customized transports, so I 
think it is possible. The benefit is that we need neither a port unification 
service nor two implementations at the server side.

3. Keep the old protobuf rpc interface and implement a new rpc framework. 
Again we need neither a port unification service nor two implementations at 
the server side, and in addition we do not need to upgrade protobuf to 3.x.

4. As described in the design doc above, generate new interfaces which return 
a CompletableFuture based on the old blocking interface, and add a new feature 
to the current rpc implementation to support the new interface.

I'm OK with any of the approaches above. I can start working on the HDFS-9924 
branch after we decide which one to use.

Thanks.
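To make option 4 concrete: every blocking method gets a CompletableFuture-returning twin. The generic adapter below is a sketch only; `FutureAdapter` is an invented name, and it offloads to a thread pool purely for illustration, whereas the actual proposal is to support the future-returning interfaces natively in the rpc layer rather than burn a thread per in-flight call:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Rough sketch of option 4's surface: wrap a blocking call so the caller
// gets a CompletableFuture immediately. Illustrative only; a real
// implementation would complete the future from the rpc layer itself.
public class FutureAdapter {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public <T> CompletableFuture<T> submit(Supplier<T> blockingCall) {
        return CompletableFuture.supplyAsync(blockingCall, pool);
    }

    public void close() {
        pool.shutdown();
    }
}
```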

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2016-06-16 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1533#comment-1533
 ] 

Duo Zhang commented on HDFS-9924:
-

{quote}
I think this statement is not yet confirmed. It seems that some people prefer 
CompletableFuture in trunk. We need to verify it.
{quote}

For me, if only H3 is supported, I prefer CompletableFuture. If both H2 and H3, 
I prefer Deferred.

Thanks.

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-14 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330890#comment-15330890
 ] 

Duo Zhang commented on HDFS-9924:
-

My concern is that you cannot tell people that hive is only compatible with 
hadoop-2.8.x, right?
For example, we set hbase to be compatible with hadoop-2.4+, so we usually 
optimize for all hadoop-2.4+ versions if possible instead of using a new 
feature only introduced in a newer version.

Here, a thread pool solution works for all hadoop-2.x versions. And it is not 
that terrible to have a 1MB stack size per thread... It is off-heap and only 
increases VSZ by 1MB, not RSS; RSS grows on demand. And you can set a smaller 
stack size if you like to reduce the overhead.

For the implementation, what [~stack] said above is the experience we got from 
our write-ahead-log implementation. For the hive case here, yes, you have a 
different pattern, but it is not a good idea to wait on Futures sequentially. 
For example, suppose you have requests 0-99, request 1 is blocked for a long 
time, and requests 2-99 all fail quickly. If you block on request 1 first, you 
will wait a long time before resubmitting the failed requests 2-99. This is an 
inherent defect of lacking callback support. A better solution is, sorry, but 
again, using multiple threads: with a thread pool and {{CompletionService}}, 
you can (sometimes) get the failed requests first.

Hope this helps. Thanks.
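The ordering argument above can be demonstrated with a plain `ExecutorCompletionService`, which hands back results in completion order rather than submission order, so a fast result (or failure) on a later request is seen before a slow earlier one:

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Shows completion-order retrieval: the quick "request 2" is available from
// take() before the slow "request 1" finishes.
public class CompletionOrderDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CompletionService<String> done = new ExecutorCompletionService<>(pool);

        done.submit(() -> { Thread.sleep(500); return "slow request 1"; });
        done.submit(() -> { Thread.sleep(50); return "fast request 2"; });

        // take() blocks for whichever task finishes first, not submit order.
        System.out.println(done.take().get()); // fast request 2
        System.out.println(done.take().get()); // slow request 1
        pool.shutdown();
    }
}
```

A caller can therefore react to (and resubmit) failed requests as soon as they complete, instead of blocking on an earlier slow one.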

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-10345) [umbrella] Implement an asynchronous DistributedFileSystem

2016-04-29 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263623#comment-15263623
 ] 

Duo Zhang commented on HDFS-10345:
--

And I think this will not affect performance, since the rpc is asynchronous by 
nature...

Thanks.

> [umbrella] Implement an asynchronous DistributedFileSystem
> --
>
> Key: HDFS-10345
> URL: https://issues.apache.org/jira/browse/HDFS-10345
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910.






[jira] [Commented] (HDFS-10345) [umbrella] Implement an asynchronous DistributedFileSystem

2016-04-29 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263614#comment-15263614
 ] 

Duo Zhang commented on HDFS-10345:
--

I think it is more reasonable and cleaner to implement the synchronous DFS on 
top of the asynchronous DFS: get a Future object by calling the same method on 
the asynchronous DFS, then call its get method...
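The idea can be sketched in a few lines; `SyncFacade` and `asyncMkdirs` are hypothetical names for illustration only:

```java
import java.util.concurrent.CompletableFuture;

// Sketch of the idea above: the synchronous call is just the asynchronous
// call plus a blocking wait on the returned future.
public class SyncFacade {
    // The async primitive returns immediately with a future.
    static CompletableFuture<Boolean> asyncMkdirs(String path) {
        return CompletableFuture.supplyAsync(() -> path.startsWith("/"));
    }

    // The synchronous API degenerates to "call async, then wait".
    static boolean mkdirs(String path) {
        return asyncMkdirs(path).join();
    }
}
```

Only the asynchronous path needs a real implementation; the blocking API is a thin wrapper.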

> [umbrella] Implement an asynchronous DistributedFileSystem
> --
>
> Key: HDFS-10345
> URL: https://issues.apache.org/jira/browse/HDFS-10345
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs, hdfs-client
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
>
> This is proposed to implement an asynchronous DistributedFileSystem based on 
> AsyncFileSystem APIs in HADOOP-12910.






[jira] [Commented] (HDFS-223) Asynchronous IO Handling in Hadoop and HDFS

2016-04-29 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263601#comment-15263601
 ] 

Duo Zhang commented on HDFS-223:


In HBASE-14790 we have implemented a {{FanOutOneBlockAsyncDFSOutput}} based on 
netty. It performs much better than the default {{DFSOutputStream}} for the 
WAL in HBase.

We plan to move this into HDFS, since it belongs there. Of course this is not 
simple copy-paste work; the implementation in HBase is not suitable for 
general use.

I will come back later with a proposal for the asynchronous API first.

Thanks.

> Asynchronous IO Handling in Hadoop and HDFS
> ---
>
> Key: HDFS-223
> URL: https://issues.apache.org/jira/browse/HDFS-223
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Raghu Angadi
> Attachments: GrizzlyEchoServer.patch, MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or framework to make it simpler to deal with 
> generic asynchronous IO in  Hadoop.
> Example use case :
> Its been a long standing problem that DataNode takes too many threads for 
> data transfers. Each write operation takes up 2 threads at each of the 
> datanodes and each read operation takes one irrespective of how much activity 
> is on the sockets. The kinds of load that HDFS serves has been expanding 
> quite fast and HDFS should handle these varied loads better. If there is a 
> framework for non-blocking IO, read and write pipeline state machines could 
> be implemented with async events on a fixed number of threads. 
> A generic utility is better since it could be used in other places like 
> DFSClient. DFSClient currently creates 2 extra threads for each file it has 
> open for writing.
> Initially I started writing a primitive "selector", then tried to see if such 
> facility already exists. [Apache MINA|http://mina.apache.org] seemed to do 
> exactly this. My impression after looking the the interface and examples is 
> that it does not give kind control we might prefer or need.  First use case I 
> was thinking of implementing using MINA was to replace "response handlers" in 
> DataNode. The response handlers are simpler since they don't involve disk 
> I/O. I [asked on MINA user 
> list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
>  but looks like it can not be done, I think mainly because the sockets are 
> already created.
> Essentially what I have in mind is similar to MINA, except that read and 
> write of the sockets is done by the event handlers. The lowest layer 
> essentially invokes selectors, invokes event handlers on single or on 
> multiple threads. Each event handler is is expected to do some non-blocking 
> work. We would of course have utility handler implementations that do  read, 
> write, accept etc, that are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more 
> flexible. It is under GPL.
> Are there other such implementations we should look at?







[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2016-03-31 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219664#comment-15219664
 ] 

Duo Zhang commented on HDFS-8782:
-

+1

> Upgrade to block ID-based DN storage layout delays DN registration
> --
>
> Key: HDFS-8782
> URL: https://issues.apache.org/jira/browse/HDFS-8782
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Priority: Critical
>
> We have seen multiple incidents at production sites that there are long 
> delays for DNs to register to the NN when upgrading to post 2.6 release.
> Further investigation shows that the DN is blocked when upgrading the storage 
> layout introduced in HDFS-6482. The new storage layout requires making up to 
> 64k directories in the underlying file system. Unfortunately the current 
> implementation calls {{mkdirs()}} sequentially and upgrades each volume in 
> sequential order.
> As a result, upgrading a DN with a lot of disks or with blocks that have 
> random block ID takes a long time (usually in hours), and the DN won't 
> register to the NN unless it finishes upgrading all the storage directory. 
> The excessive delays confuse operations and break the assumption of rolling 
> upgrades.





[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-11-05 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992965#comment-14992965
 ] 

Duo Zhang commented on HDFS-8578:
-

Could we assign these values earlier, at the time you load the properties from 
all the StorageDirectories to get the datanodeUuid? That would make things 
clearer.
Using multiple threads to write a shared field without locking is not a good 
choice, although we do not have any problem right now. It implies that the 
method is thread-safe, so later people may add dangerous code to the method...
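An illustrative sketch of the concern (the names below are hypothetical, not actual DataStorage code): if concurrent writers to a shared field really are unavoidable, an explicit first-writer-wins publication is safer than racing plain assignments from a thread pool.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class SharedFieldExample {

    // Safely published shared field (stand-in for something like datanodeUuid).
    static final AtomicReference<String> datanodeUuid = new AtomicReference<>();

    static String resolveUuid() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            // compareAndSet makes the "first writer wins" rule explicit,
            // instead of racing plain assignments to a shared field.
            pool.execute(() -> datanodeUuid.compareAndSet(null, "uuid-from-dir"));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return datanodeUuid.get();
    }

    public static void main(String[] args) {
        System.out.println(resolveUuid()); // prints uuid-from-dir
    }
}
```

Better still, as the comment suggests, is to assign the value once, before any worker threads are started, so no concurrent write exists at all.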

Thanks.




> On upgrade, Datanode should process all storage/data dirs in parallel
> -
>
> Key: HDFS-8578
> URL: https://issues.apache.org/jira/browse/HDFS-8578
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Raju Bairishetti
>Assignee: Vinayakumar B
>Priority: Critical
> Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch, 
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch, 
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch, 
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-branch-2.6.0.patch
>
>
> Right now, during upgrades datanode is processing all the storage dirs 
> sequentially. Assume it takes ~20 mins to process a single storage dir then  
> datanode which has ~10 disks will take around 3hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>   doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>   assert getCTime() == nsInfo.getCTime() 
>   : "Data-node and name-node CTimes must be the same.";
> }
> {code}
> It would save lots of time during major upgrades if datanode process all 
> storagedirs/disks parallelly.
> Can we make datanode to process all storage dirs parallelly?





[jira] [Updated] (HDFS-9372) Typo in DataStorage.recoverTransitionRead

2015-11-04 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-9372:

Attachment: HDFS-9372-v1.patch

Just remove the dead code.

> Typo in DataStorage.recoverTransitionRead
> -
>
> Key: HDFS-9372
> URL: https://issues.apache.org/jira/browse/HDFS-9372
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Attachments: HDFS-9372-v0.patch, HDFS-9372-v1.patch
>
>
> {code:title=DataStorage.java}
> if (this.initialized) {
>   LOG.info("DataNode version: " + 
> HdfsServerConstants.DATANODE_LAYOUT_VERSION
>   + " and NameNode layout version: " + nsInfo.getLayoutVersion());
>   this.storageDirs = new ArrayList(dataDirs.size());
>   // mark DN storage is initialized
>   this.initialized = true;
> }
> {code}
> The first if should be {{!this.initialized}} I think?





[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2015-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989130#comment-14989130
 ] 

Duo Zhang commented on HDFS-8782:
-

I tried to make {{DataStorage.addStorageLocations}} run in parallel, but found 
it difficult.

There are some properties in {{DataStorage}} (inherited from {{StorageInfo}}) 
that are updated when loading a {{StorageDirectory}}, such as 
{{layoutVersion}}, so changing the code from sequential to parallel may have 
side effects even if I use locks everywhere to protect these properties.

I do not see why we need a {{layoutVersion}} in {{DataStorage}}. As far as I 
know, {{DataStorage}} is only a container of {{StorageDirectory}} (or 
{{BlockPoolSliceStorage}} if federation is enabled). So what does the 
{{layoutVersion}} in {{DataStorage}} mean? Is there a historical reason for 
keeping it?

Thanks.






[jira] [Commented] (HDFS-8578) On upgrade, Datanode should process all storage/data dirs in parallel

2015-11-04 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989371#comment-14989371
 ] 

Duo Zhang commented on HDFS-8578:
-

Oh, great, there is already an in-progress patch.

I skimmed patch v10; it seems you do not modify the {{format}} method, so how 
do you deal with concurrent modification of {{layoutVersion}} and the other 
properties? {{layoutVersion}} can also be changed elsewhere. Also, the other 
properties are always assigned the same values, so could we move those 
assignments to a place that executes only once? The code is a little confusing 
right now...

{code}
  private void format(StorageDirectory sd, NamespaceInfo nsInfo,
  String datanodeUuid) throws IOException {
sd.clearDirectory(); // create directory
this.layoutVersion = HdfsServerConstants.DATANODE_LAYOUT_VERSION;
this.clusterID = nsInfo.getClusterID();
this.namespaceID = nsInfo.getNamespaceID();
this.cTime = 0;
this.datanodeUuid = datanodeUuid;

if (sd.getStorageUuid() == null) {
  // Assign a new Storage UUID.
  sd.setStorageUuid(DatanodeStorage.generateUuid());
}

writeProperties(sd);
  }
{code}

Thanks.






[jira] [Created] (HDFS-9372) Typo in DataStorage.recoverTransitionRead

2015-11-03 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-9372:
---

 Summary: Typo in DataStorage.recoverTransitionRead
 Key: HDFS-9372
 URL: https://issues.apache.org/jira/browse/HDFS-9372
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Duo Zhang


{code:title=DataStorage.java}
if (this.initialized) {
  LOG.info("DataNode version: " + 
HdfsServerConstants.DATANODE_LAYOUT_VERSION
  + " and NameNode layout version: " + nsInfo.getLayoutVersion());
  this.storageDirs = new ArrayList(dataDirs.size());
  // mark DN storage is initialized
  this.initialized = true;
}
{code}

The first if should be {{!this.initialized}} I think?
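A minimal sketch of the guard as presumably intended (illustrative only, not the actual DataStorage patch): the initialization body must run when storage is NOT yet initialized.

```java
public class InitGuardSketch {

    boolean initialized = false;

    // Lazy-init pattern: the negated check makes the body run exactly once.
    void recoverTransitionRead() {
        if (!this.initialized) {          // note the negation
            // ... one-time initialization work would go here ...
            this.initialized = true;      // mark DN storage as initialized
        }
    }

    public static void main(String[] args) {
        InitGuardSketch s = new InitGuardSketch();
        s.recoverTransitionRead();
        System.out.println(s.initialized); // prints true
    }
}
```

With the original (un-negated) condition, the body is dead code: {{initialized}} starts false, so it can never become true.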





[jira] [Updated] (HDFS-9372) Typo in DataStorage.recoverTransitionRead

2015-11-03 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-9372:

Assignee: Duo Zhang
  Status: Patch Available  (was: Open)






[jira] [Updated] (HDFS-9372) Typo in DataStorage.recoverTransitionRead

2015-11-03 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-9372:

Attachment: HDFS-9372-v0.patch






[jira] [Updated] (HDFS-8471) Add read block support for DataNode HTTP/2 server

2015-10-17 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471-v8.patch

Rebased and added OS cache management.

> Add read block support for DataNode HTTP/2 server
> -
>
> Key: HDFS-8471
> URL: https://issues.apache.org/jira/browse/HDFS-8471
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: HDFS-7966
>
> Attachments: HDFS-8471-v6.patch, HDFS-8471-v7.patch, 
> HDFS-8471-v8.patch, HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
> HDFS-8471.4.patch, HDFS-8471.5.patch, HDFS-8471.patch
>
>
> Based on the streamed channel introduced in HDFS-8515.





[jira] [Updated] (HDFS-8671) Add client support for HTTP/2 stream channels

2015-10-16 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8671:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to HDFS-7966 branch. Thanks [~wheat9] for reviewing.

> Add client support for HTTP/2 stream channels
> -
>
> Key: HDFS-8671
> URL: https://issues.apache.org/jira/browse/HDFS-8671
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
> Fix For: HDFS-7966
>
> Attachments: HDFS-8671-v0.patch, HDFS-8671-v1.patch
>
>
> {{Http2StreamChannel}} is introduced in HDFS-8515 but can only be used at 
> server side.
> Now we implement Http2BlockReader using jetty http2-client in the POC branch, 
> but the final version of jetty 9.3.0 only accepts java8.
> So here we plan to extend the functions of {{Http2StreamChannel}} to support 
> client side usage and then implement Http2BlockReader based on it. And we 
> still use jetty http2-client to write testcases to ensure that our http2 
> implementation is valid.





[jira] [Updated] (HDFS-8671) Add client support for HTTP/2 stream channels

2015-09-23 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8671:

Attachment: HDFS-8671-v1.patch

Picked code from the POC branch.

The plan is to run YCSB on the HDFS-7966 branch, which is more convincing, so 
we need to pick the primary code from the POC branch first.

Thanks.






[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-09-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741935#comment-14741935
 ] 

Duo Zhang commented on HDFS-7966:
-

No, the testcase uses multiple connections... But yes, this is not typical 
usage in the real world. Let me try to deploy HBase on top of HDFS and run 
YCSB to collect some performance data. Thanks.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span across multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by implementation. As a 
> result it requires non-trivial amount of work to implement HDFS clients and 
> servers.
> This jira explores to delegate the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging the existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.





[jira] [Commented] (HDFS-8782) Upgrade to block ID-based DN storage layout delays DN registration

2015-09-10 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740128#comment-14740128
 ] 

Duo Zhang commented on HDFS-8782:
-

I think at least we could upgrade the volumes in parallel?

I tried upgrading from 2.5.0 to 2.7.1. It took more than 20 minutes on a 
datanode with 11 x 3T disks... If done in parallel, the halt time could be 
reduced to about 2 minutes, I think?

Thanks.
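The parallel idea can be sketched as follows (a hedged sketch only: the volume paths and {{upgradeVolume}} helper are stand-ins, not the actual DataNode code, where the per-directory work is {{doTransition()}} and the {{mkdirs()}} calls): submit one task per storage directory and wait for all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUpgradeSketch {

    // Stand-in for the real per-directory layout transition.
    static String upgradeVolume(String dir) {
        return dir + ": upgraded";
    }

    static List<String> upgradeAll(List<String> volumes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        List<Callable<String>> tasks = new ArrayList<>();
        for (String v : volumes) {
            tasks.add(() -> upgradeVolume(v));
        }
        List<String> results = new ArrayList<>();
        // invokeAll blocks until every volume has finished upgrading,
        // so registration still starts only after all dirs are done.
        for (Future<String> f : pool.invokeAll(tasks)) {
            results.add(f.get());
        }
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(upgradeAll(List.of("/data1", "/data2", "/data3")));
    }
}
```

With 11 independent disks, wall-clock upgrade time would then be bounded by the slowest single volume rather than the sum of all of them.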






[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-09-09 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736658#comment-14736658
 ] 

Duo Zhang commented on HDFS-7966:
-

Netty 4.1.0.Beta6 is out, so I'm back. I have added a simple {{asyncRead}} 
method (not fully asynchronous, since this is only a POC) to 
{{DFSInputStream}} and written a performance test for it. Here are the test 
results (two runs for each test):

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest async /test 100 
5 4096 // 100 here means max concurrency which used to prevent OOM.
*** time based on http2 230946
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest async /test 100 
5 4096
*** time based on http2 231066

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
5 4096 pread
*** time based on tcp 231410
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
5 4096 pread
*** time based on tcp 231038

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
5 4096 pread
*** time based on http2 236069
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
5 4096 pread
*** time based on http2 231773
{noformat}

The performance difference is within ~4%, and async is slightly better than tcp.

Thanks.






[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652698#comment-14652698
 ] 

Duo Zhang commented on HDFS-7966:
-

I do not have enough machines to test the scenario... What I see when I create 
lots of threads to read from the datanode concurrently is that HTTP/2 starts 
the requests almost at the same time, while TCP starts them one by one (maybe 
tens at a time, where the number is the CPU count). So the DN never really 
handles lots of concurrent reads from the client, and the context-switch cost 
may be smaller than in the HTTP/2 implementation, since we also have a 
ThreadPool besides the EventLoopGroup in the HTTP/2 connection. What makes 
things worse is that our client is not event-driven, so we cannot reduce the 
client's thread count...
Let me see if I can construct a scenario where HTTP/2 is faster than TCP...
Thanks.






[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651021#comment-14651021
 ] 

Duo Zhang commented on HDFS-7966:
-

I modified {{Http2ConnectionPool}} to allow creating multiple HTTP/2 
connections to one datanode (the default is 10). Here is the test result:

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 10 
10 1024 pread
*** time based on tcp 38343

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 10 
10 1024 pread
*** time based on http2 45799

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
1 1024 pread
*** time based on tcp 20206

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
1 1024 pread
*** time based on http2 21980

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 500 
2000 1024 pread
*** time based on tcp 20146

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 500 
2000 1024 pread
*** time based on http2 22461
{noformat}

HTTP/2 is 19%, 9% and 11% slower than TCP in the three runs. Note that the 
{{DFSClient}} is not event-driven, so we have more threads on the client side 
when using HTTP/2; given that, I think the performance here is acceptable? We 
could later introduce a new event-driven {{FileSystem}} (maybe like 
HDFS-8707?) to improve client performance.

The performance testing is almost done here. Next I will begin to pick code 
from the POC branch into the HDFS-7966 branch. Thanks.






[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650242#comment-14650242
 ] 

Duo Zhang commented on HDFS-7966:
-

OK, I'm back. Here is the HTTP/2 test result with the context-switch overhead 
removed. See the 'noswitch' part in

https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/http2/PerformanceTest.java

I also removed the thread pool in ReadBlockHandler.

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 
100 1024 pread
*** time based on tcp 260776

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 
100 1024 pread
*** time based on http2 301257

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest noswitch /test 
100 1024
*** time based on http2 264012
{noformat}

(264012 - 260776) / 260776 ≈ 0.012, so with the context switching removed, 
HTTP/2 is only about 1.2% slower than TCP.

Of course it is not acceptable to write code like this in real production; it 
is only used to prove that context switching is the primary overhead.

And in fact, although HTTP/2 is about 30% slower than TCP in this case, it is 
still fast enough, I think? It is only about 0.32ms per read, so maybe that is 
acceptable?

Thanks.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-20 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-7966:

Attachment: TestHttp2ReadBlockInsideEventLoop.svg

The flame graph of a {{TestHttp2ReadBlockInsideEventLoop}} run.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg







[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-20 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633089#comment-14633089
 ] 

Duo Zhang commented on HDFS-7966:
-

I wrote a single-threaded testcase that does all the test work inside the 
event loop.

https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/web/dtp/TestHttp2ReadBlockInsideEventLoop.java

And on the server side, I removed the thread pool in {{ReadBlockHandler}}.

The result is
{noformat}
*** time based on tcp 17734ms
*** time based on http2 20019ms

*** time based on tcp 18878ms
*** time based on http2 21422ms

*** time based on tcp 17562ms
*** time based on http2 20568ms

*** time based on tcp 18726ms
*** time based on http2 20251ms

*** time based on tcp 18632ms
*** time based on http2 21227ms
{noformat}

The average time for the original TCP is 18306.4ms, and for HTTP/2 it is 
20697.4ms.

20697.4 / 18306.4 = 1.13, so HTTP/2 is 13% slower than TCP. In the test above 
it was 30% slower, so I think context switching may be one of the reasons why 
HTTP/2 is so much slower than TCP. I will run this test on a real cluster to 
get more data.

As for the one-{{EventLoop}}-per-datanode problem, I think it matters on a 
small cluster, so we should allow creating multiple HTTP/2 connections to one 
datanode. I will modify {{Http2ConnectionPool}} and run the test again.
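A minimal sketch of what such a multi-connection pool could look like (the 
class and method names below are hypothetical; the real 
{{Http2ConnectionPool}} lives in the POC branch): hand out several connections 
to the same datanode in round-robin order, so more than one EventLoop carries 
traffic.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: round-robin over several HTTP/2 connections to the
// same datanode, so the work spreads across more than one EventLoop.
public class RoundRobinPool<C> {
    private final List<C> connections;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinPool(List<C> connections) {
        this.connections = List.copyOf(connections);
    }

    public C next() {
        // floorMod keeps the index non-negative even after int overflow
        int i = Math.floorMod(counter.getAndIncrement(), connections.size());
        return connections.get(i);
    }
}
```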

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg







[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-15 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627827#comment-14627827
 ] 

Duo Zhang commented on HDFS-7966:
-

Small reads using {{PerformanceTest}}. The unit is milliseconds.

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 
1(thread number) 100(read count per thread) 1024(bytes per read) pread(use 
pread)
{noformat}

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 
100 1024 pread
*** time based on tcp 242730

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 
100 1024 pread
*** time based on http2 324491

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 10 
10 1024 pread
*** time based on tcp 40688

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 10 
10 1024 pread
*** time based on http2 82819

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
1 1024 pread
*** time based on tcp 21612

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
1 1024 pread
*** time based on http2 69658

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 500 
2000 1024 pread
*** time based on tcp 19931

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 500 
2000 1024 pread
*** time based on http2 151727

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1000 
1000 1024 pread
*** time based on http2 251735
{noformat}

For the single-threaded test, 324491 / 242730 = 1.34, so HTTP/2 is about 34% 
slower than TCP. I will try to find the overhead later.

And in the multi-threaded tests, HTTP/2 is much slower than TCP, and TCP 
failed the 1000-thread test.

I think the problem is that I only use one connection for HTTP/2, so there is 
only one EventLoop (which means only one thread) sending and receiving data, 
while for TCP the thread count equals the connection count. The {{%CPU}} of 
the datanode when using HTTP/2 stays around 100% whether the thread count is 
10, 100 or 1000, but when using TCP the {{%CPU}} can climb above 1500% as the 
number of threads increases. Next I will write a new test which can use 
multiple HTTP/2 connections.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg







[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-14 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627512#comment-14627512
 ] 

Duo Zhang commented on HDFS-7966:
-

I'd say sorry... The performance results above are useless, since the flow 
control part of my code did not work at that time. I found it when I tried to 
transfer a 512MB block and got an OOM.

I have rewritten the flow control part and set up a cluster with 1 NN and 1 DN 
to evaluate performance. There is a netty bug 
(https://github.com/netty/netty/pull/3929), so I need to modify my code when 
running different tests.

The performance test code is here
https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/http2/PerformanceTest.java

First I ran a large-read test with 1 file with a 1GB block, running each 
command 5 times:
{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 1 
1073741824
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 1 
1073741824
{noformat}

Note that I set {{dfs.datanode.transferTo.allowed}} to {{false}} since the 
HTTP/2 implementation cannot use transferTo (I'm currently working on 
implementing {{FileRegion}} support in netty-http2-codec, see 
https://github.com/netty/netty/issues/3927)
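For reference, the override can be sketched as the usual hdfs-site.xml entry 
({{dfs.datanode.transferTo.allowed}} is a real datanode property whose default 
is true; the description text below is illustrative, not from hdfs-default.xml):

```xml
<property>
  <name>dfs.datanode.transferTo.allowed</name>
  <value>false</value>
  <!-- Illustrative: disable sendfile/transferTo so the TCP and HTTP/2
       read paths copy data through user space the same way. -->
</property>
```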

The result is
{noformat}
*** time based on http2 9953
*** time based on http2 9967
*** time based on http2 9954
*** time based on http2 9985
*** time based on http2 9976

*** time based on tcp 9383
*** time based on tcp 9375
*** time based on tcp 9377
*** time based on tcp 9373
*** time based on tcp 9376
{noformat}

The average latency of HTTP/2 is 9967ms, and for TCP it is 9376.8ms.

9967 / 9376.8 = 1.063, so HTTP/2 is about 6% slower than TCP. I think this is 
an acceptable result?

Let me test small read later and post the result here. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg







[jira] [Updated] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-12 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-7966:

Attachment: TestHttp2LargeReadPerformance.svg

Wrote a testcase which reads a 128MB block. The result is
{quote}
*** time based on http2 2776ms
*** time based on tcp 482ms
{quote}

This time the client side is basically the same; the overhead is on the server 
side.

readWindowUpdateFrame is 14.27%, and the actual ChunkedBlockInput.readChunk is 
8.54%, so the HTTP/2 overhead is still about 6% (of course the denominator may 
be different). And within ChunkedBlockInput.readChunk, only 2.91% of the time 
is spent on file reads/writes and protobuf message building; most of the time 
is spent on ByteBuf management.

I will run a test that reads data from a different machine to see whether 
ByteBuf management and HTTP/2 are still costly.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg







[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623778#comment-14623778
 ] 

Duo Zhang commented on HDFS-7966:
-

And there is good news: in {{TestHttp2RandomReadPerformance}}, the HTTP/2 
implementation beats the old one. We create 4000 connections to one datanode 
and use one thread to pick a connection and read a small chunk of data, 
sequentially.

The result is
{noformat}
*** time based on http2 124ms
*** time based on tcp 274ms
{noformat}

And 5000 connections cause an OOM (cannot create threads) in the TCP test. I 
think this is reasonable since an NIO-based framework uses far fewer threads 
than OIO.

I'm busy these days and only have some time at weekends. Will run these tests 
on a cluster ASAP. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg







[jira] [Updated] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-07 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-7966:

Attachment: TestHttp2Performance.svg

Ran this testcase (with readPerThread changed to 20).

https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/web/dtp/TestHttp2Performance.java

This is a single-threaded testcase that seeks back and reads small amounts of 
data repeatedly. Notice that I disabled transferTo since we cannot use 
transferTo with netty HTTP/2 right now.

The output result is
{noformat}
*** time based on http2 45145ms
*** time based on tcp 29871ms
{noformat}

HTTP/2 is about 50% slower.

The attachment is the flame graph of this test run. I collected some numbers:
{noformat}
ReadBlockHandler: 24.42 = 9.60 + 8.82 + 4.14 + 1.86
DataXceiver : 23.64   

Http2BlockReader  : 12.81 = 5.67 + 1.26 + 5.13 + 0.35 + 0.40
RemoteBlockReader2: 10.41 = 6.90 + 3.51


ThreadPoolExecutor.execute: 0.98
ThreadPoolExecutor.getTask: 1.69
Other netty overhead  : 10.46 = 12.19 - 0.35 - 0.40 - 0.98
{noformat}

According to the graph, HTTP/2 should be (50.36 - 34.05) / 34.05 = 47.9% 
slower, basically the same as the testcase output.

And the actual HTTP/2 overhead is about 5% (see Http2ConnectionHandler.decode 
in the graph; subtracting the ThreadPoolExecutor and Http2DataReceiver time 
from it gives 6.85 - 0.98 - 0.35 - 0.40 = 5.12). So HTTP/2 itself should only 
make us about 15% slower.

I think we first need to find out why we are 50% rather than 15% slower.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, TestHttp2Performance.svg







[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617985#comment-14617985
 ] 

Duo Zhang commented on HDFS-7966:
-

This is the worst scenario for testing an NIO framework, I think. NIO is 
supposed to use fewer threads than OIO, but in this test NIO needs at least 4 
threads while OIO only needs 2. You can see that context switching costs a lot 
in the flame graph (ThreadPoolExecutor-related operations, EventLoop.execute, 
selector.wakeup, etc.). And the buffer pooling here is also redundant: in OIO, 
there is one buffer for the server and one for the client. Finally, I think 
testing through localhost can make things worse, since network speed and 
latency are no longer the bottleneck.

I plan to test these things next:
1. Read a large block(256MB or more)
2. Simulate the scenario that datanode caches a lot of connections from 
different machine and only a few of them read at the same time.
3. Run all tests on a real cluster(which means read data from other machine).

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, TestHttp2Performance.svg







[jira] [Updated] (HDFS-8671) Add client support for HTTP/2 stream channels

2015-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8671:

Attachment: HDFS-8671-v0.patch

Add client support for {{Http2StreamChannel}}. Introduce an Http2DataReceiver 
to read an HTTP/2 response through an InputStream.
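One plausible shape for such a receiver, sketched with only the JDK (this is 
not the patch's Http2DataReceiver, just an illustration of the idea): DATA 
frames arriving asynchronously on the event loop are queued, and a blocking 
InputStream drains them.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stdlib-only sketch of the Http2DataReceiver idea: frames arrive
// asynchronously on the event loop; readers consume them via InputStream.
public class QueueBackedInput extends InputStream {
    private static final byte[] EOF = new byte[0];  // end-of-stream sentinel
    private final BlockingQueue<byte[]> frames = new LinkedBlockingQueue<>();
    private byte[] current = EOF;  // empty array doubles as "nothing buffered"
    private int pos;
    private boolean ended;

    // Called from the event loop when a DATA frame arrives.
    public void onData(byte[] chunk, boolean endOfStream) {
        frames.add(chunk);
        if (endOfStream) frames.add(EOF);
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            if (ended) return -1;
            try {
                current = frames.take();  // block until the next frame
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException(e);
            }
            if (current == EOF) { ended = true; return -1; }
            pos = 0;
        }
        return current[pos++] & 0xff;
    }
}
```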

 Add client support for HTTP/2 stream channels
 -

 Key: HDFS-8671
 URL: https://issues.apache.org/jira/browse/HDFS-8671
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
 Fix For: HDFS-7966

 Attachments: HDFS-8671-v0.patch


 {{Http2StreamChannel}} is introduced in HDFS-8515 but can only be used on the 
 server side.
 Now we implement Http2BlockReader using the jetty http2-client in the POC 
 branch, but the final version of jetty 9.3.0 requires Java 8.
 So here we plan to extend {{Http2StreamChannel}} to support client-side usage 
 and then implement Http2BlockReader based on it. We will still use the jetty 
 http2-client to write testcases to ensure that our HTTP/2 implementation is 
 valid.





[jira] [Updated] (HDFS-8671) Add client support for HTTP/2 stream channels

2015-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8671:

Status: Patch Available  (was: Open)

 Add client support for HTTP/2 stream channels
 -

 Key: HDFS-8671
 URL: https://issues.apache.org/jira/browse/HDFS-8671
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: HDFS-7966

 Attachments: HDFS-8671-v0.patch







[jira] [Assigned] (HDFS-8671) Add client support for HTTP/2 stream channels

2015-06-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reassigned HDFS-8671:
---

Assignee: Duo Zhang

 Add client support for HTTP/2 stream channels
 -

 Key: HDFS-8671
 URL: https://issues.apache.org/jira/browse/HDFS-8671
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: HDFS-7966

 Attachments: HDFS-8671-v0.patch







[jira] [Created] (HDFS-8683) Implement flow control

2015-06-28 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-8683:
---

 Summary: Implement flow control
 Key: HDFS-8683
 URL: https://issues.apache.org/jira/browse/HDFS-8683
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang


We have implemented read block over HTTP/2 on the POC branch.

https://github.com/Apache9/hadoop/tree/HDFS-7966-POC

The block reader is implemented using jetty. We wrote a testcase to measure 
performance with MiniCluster and found that HTTP/2 flow control has a big 
impact on performance. Window update frames are delayed if we create many 
threads to read, and netty stops sending data when there is no window space 
left, which hurts performance badly.

Flow control is a built-in feature of HTTP/2; ignoring it does not mean we can 
bypass it. So I think we need to support this feature in the first place, or 
at least find a way to bypass it (maybe a very large initial window size?)
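The window accounting described above can be sketched in a few lines 
(illustrative only, not netty's implementation): the sender spends window 
space per DATA frame, and until a WINDOW_UPDATE from the peer replenishes it, 
sending stalls.

```java
// Minimal sketch of HTTP/2-style flow-control accounting. When WINDOW_UPDATE
// frames are delayed, canSend() turns false and the sender stalls -- the
// performance effect described above.
public class FlowControlWindow {
    private long window;

    public FlowControlWindow(long initialWindowSize) {
        this.window = initialWindowSize;
    }

    public synchronized boolean canSend(int bytes) {
        return window >= bytes;
    }

    // Called when a DATA frame of the given size is written.
    public synchronized void onDataSent(int bytes) {
        if (window < bytes) throw new IllegalStateException("window exhausted");
        window -= bytes;
    }

    // Called when the peer sends a WINDOW_UPDATE frame.
    public synchronized void onWindowUpdate(int increment) {
        window += increment;
    }

    public synchronized long available() {
        return window;
    }
}
```

A very large initial window size, as suggested above, simply keeps canSend 
true for the lifetime of realistic transfers.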






[jira] [Updated] (HDFS-8471) Add read block support for DataNode HTTP/2 server

2015-06-27 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471-v7.patch

Fix a bug when skipping checksumInput.

 Add read block support for DataNode HTTP/2 server
 -

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: HDFS-7966

 Attachments: HDFS-8471-v6.patch, HDFS-8471-v7.patch, 
 HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, HDFS-8471.4.patch, 
 HDFS-8471.5.patch, HDFS-8471.patch


 Based on the streamed channel introduced in HDFS-8515.





[jira] [Updated] (HDFS-8471) Add read block support for DataNode HTTP/2 server

2015-06-25 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Target Version/s: HDFS-7966
   Fix Version/s: HDFS-7966

 Add read block support for DataNode HTTP/2 server
 -

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Fix For: HDFS-7966

 Attachments: HDFS-8471-v6.patch, HDFS-8471.1.patch, 
 HDFS-8471.2.patch, HDFS-8471.3.patch, HDFS-8471.4.patch, HDFS-8471.5.patch, 
 HDFS-8471.patch


 Based on the streamed channel introduced in HDFS-8515.





[jira] [Created] (HDFS-8671) Add client support for http2 stream channels

2015-06-25 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-8671:
---

 Summary: Add client support for http2 stream channels
 Key: HDFS-8671
 URL: https://issues.apache.org/jira/browse/HDFS-8671
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang


{{Http2StreamChannel}} is introduced in HDFS-8515 but can only be used on the 
server side.
Now we implement Http2BlockReader using the jetty http2-client in the POC 
branch, but the final version of jetty 9.3.0 requires Java 8.

So here we plan to extend {{Http2StreamChannel}} to support client-side usage 
and then implement Http2BlockReader based on it. We will still use the jetty 
http2-client to write testcases to ensure that our HTTP/2 implementation is 
valid.





[jira] [Updated] (HDFS-8671) Add client support for HTTP/2 stream channels

2015-06-25 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8671:

Target Version/s: HDFS-7966
   Fix Version/s: HDFS-7966
 Summary: Add client support for HTTP/2 stream channels  (was: Add 
client support for http2 stream channels)

 Add client support for HTTP/2 stream channels
 -

 Key: HDFS-8671
 URL: https://issues.apache.org/jira/browse/HDFS-8671
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
 Fix For: HDFS-7966







[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-24 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v5.patch

Remove {{LastMessage}} and {{LastChunkedInput}}; introduce a singleton 
{{Http2LastMessage}} to end the stream (with an empty data frame). Remove 
flow-control-related code. Remove the CAS in doBeginRead and doWrite since 
they always run in the event loop.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515-v5.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-24 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v6.patch

 Sorry, I missed the state-changing code when reading.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515-v5.patch, 
 HDFS-8515-v6.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8471) Add read block support for DataNode HTTP/2 server

2015-06-24 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Summary: Add read block support for DataNode HTTP/2 server  (was: Implement 
read block over HTTP/2)

 Add read block support for DataNode HTTP/2 server
 -

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.4.patch, HDFS-8471.5.patch, HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Add read block support for DataNode HTTP/2 server

2015-06-24 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Description: Based on the streamed channel introduced in HDFS-8515.

 Add read block support for DataNode HTTP/2 server
 -

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.4.patch, HDFS-8471.5.patch, HDFS-8471.patch


 Based on the streamed channel introduced in HDFS-8515.





[jira] [Updated] (HDFS-8471) Add read block support for DataNode HTTP/2 server

2015-06-24 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471-v6.patch

Implemented based on the protocol in the design doc.

Use a thread pool in ReadBlockHandler to avoid doing time-consuming tasks in the 
event loop.

I reused the ExceptionHandler in WebHdfs, so this patch also contains some small 
modifications in the webhdfs packages.

Thanks.
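The thread-pool offload described above can be sketched as follows. This is an illustration with hypothetical names, not the patch's actual code: the blocking block read runs on a worker pool, and the result is handed back to the single event-loop thread for the write.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: keep slow disk I/O off the event loop, then resume on the loop
// thread. In the real patch this shape lives inside a Netty handler.
public final class OffloadSketch {
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
    private final ExecutorService blockReaderPool = Executors.newFixedThreadPool(4);

    public CompletableFuture<String> readBlock(String blockId) {
        return CompletableFuture
                // time-consuming read runs on the worker pool, not the loop
                .supplyAsync(() -> "data-of-" + blockId, blockReaderPool)
                // switch back to the event loop to perform the write
                .thenApplyAsync(data -> data, eventLoop);
    }

    public void shutdown() { eventLoop.shutdown(); blockReaderPool.shutdown(); }
}
```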

 Add read block support for DataNode HTTP/2 server
 -

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471-v6.patch, HDFS-8471.1.patch, 
 HDFS-8471.2.patch, HDFS-8471.3.patch, HDFS-8471.4.patch, HDFS-8471.5.patch, 
 HDFS-8471.patch


 Based on the streamed channel introduced in HDFS-8515.





[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-23 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598634#comment-14598634
 ] 

Duo Zhang commented on HDFS-8515:
-

{quote}
o.a.h.web.http2
{quote}
Then in hadoop-hdfs or hadoop-common? It seems all the code in hdfs is under 
o.a.h.fs or o.a.h.hdfs...

{quote}
encoder is not thread-safe. It seems to me the right approach is to run the 
write in the event loop of the parent channel. The read path might have the 
same issue.
{quote}
I think they are already in the event loop? Channel.read, Channel.write and 
Channel.flush call the methods in DefaultChannelPipeline, which then call the 
methods in TailContext, where execution switches to the EventLoop.

{quote}
To me both LastChunkedInput and LastMessage look like more of an optimization 
right now. A simpler approach is to send an empty HEADER with the end-of-stream 
bit on to tell the remote peer that the stream has been closed.
{quote}
This is used to notify {{Http2StreamChannel}} that we need to send an endStream to 
the remote side, so at least something like a {{LastMessage}} is needed (think 
of {{LastHttpContent}}). I'd say that sending an endStream with the last data 
frame is an optimization, but I think it is simple enough to implement now?

{quote}
It can be a utility class instead of asking all HTTP2 test cases to inherit it.
{quote}
Any example? And what is the benefit of using a utility class instead of a 
parent class? Thanks.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-23 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598638#comment-14598638
 ] 

Duo Zhang commented on HDFS-8515:
-

{quote}
I'm yet to be convinced that testing of multi-threading is required right now. 
Maybe having some coverage of the basic functionalities is a higher priority.
{quote}
The basic tests are in {{TestHttp2Server}}. Thanks.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Commented] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-22 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597011#comment-14597011
 ] 

Duo Zhang commented on HDFS-8515:
-

I've introduced a Http2StreamChannel on the POC branch.

https://github.com/Apache9/hadoop/tree/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/http2

Let me extract a patch for it, thanks.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-22 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v4.patch

A solution based on AbstractChannel.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515-v4.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-18 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591434#comment-14591434
 ] 

Duo Zhang commented on HDFS-7966:
-

https://github.com/Apache9/hadoop/tree/HDFS-7966-POC

Will implement read block on this branch and collect some performance results 
first.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-17 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v2.patch

A new patch based on the design doc. Will add more comments and find a standard 
client to test the server. Thanks.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-17 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v3.patch

Add a testcase using jetty's HTTP2Client.

Send 1 requests concurrently on one connection with 20 threads, and reset 
about 20% of the streams.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515-v2.patch, 
 HDFS-8515-v3.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-14 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515-v1.patch

Introduce an EmbeddedStream which is similar to netty's EmbeddedChannel but 
shares the same EventLoop as the Channel.

TestDtpHttp2 is the basic testcase; will add more tests if you think this is the 
right approach [~wheat9].

Thanks.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515-v1.patch, HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Attachment: HDFS-8515.patch

A patch for review. Will add more testcases later.

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Updated] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-08 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8515:

Status: Patch Available  (was: Open)

 Abstract a DTP/2 HTTP/2 server
 --

 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8515.patch


 Discussed in HDFS-8471.
 https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Commented] (HDFS-8471) Implement read block over HTTP/2

2015-06-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568430#comment-14568430
 ] 

Duo Zhang commented on HDFS-8471:
-

Fine. But I'd say this abstraction is only suitable for DTP, not for general 
HTTP/2 streams. And if we want to do the abstraction in a separate issue, I 
think we'd better abstract for both server and client. So I do not think the 
code should be placed in the datanode package; maybe 
org.apache.hadoop.hdfs.protocol.datatransfer.http2?

The new DTP protocol for reading a block is
{noformat}
 request headers
  Client ---> DataNode
 request proto
  Client ---> DataNode
response headers
  Client <--- DataNode
response proto
  Client <--- DataNode
---------------------------
|      packet header      |
|  Client <--- DataNode   |
|      packet data        |  multiple times
|  Client <--- DataNode   |
|-------------------------|
{noformat}

So for DTP, I think we only need to implement the inbound channel handler logic, 
because we need to decode protobuf messages and modify the channel pipeline. For 
outbound, we do not need to modify the channel pipeline, and I think encoding a 
protobuf message is not that hard (just use writeDelimitedTo).

Thanks. [~wheat9]
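For readers unfamiliar with writeDelimitedTo: protobuf prefixes each message with its byte length encoded as a base-128 varint, so consecutive messages can share one stream and still be split apart. A self-contained sketch of that framing (plain Java, no protobuf dependency; {{Delimited}} is a hypothetical helper name):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Sketch of the framing behind writeDelimitedTo/parseDelimitedFrom:
// a varint length prefix followed by the raw message bytes.
public final class Delimited {
    static void writeDelimited(ByteArrayOutputStream out, byte[] msg) {
        int n = msg.length;
        // varint: 7 payload bits per byte, high bit set on all but the last
        while ((n & ~0x7F) != 0) { out.write((n & 0x7F) | 0x80); n >>>= 7; }
        out.write(n);
        out.write(msg, 0, msg.length);
    }

    static byte[] readDelimited(ByteArrayInputStream in) {
        int n = 0, shift = 0, b;
        do { b = in.read(); n |= (b & 0x7F) << shift; shift += 7; } while ((b & 0x80) != 0);
        byte[] msg = new byte[n];
        in.read(msg, 0, n); // ByteArrayInputStream delivers all available bytes
        return msg;
    }
}
```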

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.4.patch, HDFS-8471.5.patch, HDFS-8471.patch








[jira] [Created] (HDFS-8515) Abstract a DTP/2 HTTP/2 server

2015-06-01 Thread Duo Zhang (JIRA)
Duo Zhang created HDFS-8515:
---

 Summary: Abstract a DTP/2 HTTP/2 server
 Key: HDFS-8515
 URL: https://issues.apache.org/jira/browse/HDFS-8515
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Duo Zhang
Assignee: Duo Zhang


Discussed in HDFS-8471.

https://issues.apache.org/jira/browse/HDFS-8471?focusedCommentId=14568196&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14568196





[jira] [Commented] (HDFS-8471) Implement read block over HTTP/2

2015-05-31 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566542#comment-14566542
 ] 

Duo Zhang commented on HDFS-8471:
-

When digging into netty, I found that if flow control is enabled, netty may split 
your data frame into multiple data frames when sending. And I think it is 
reasonable that we should not depend on the transport protocol to delimit the 
actual data we send.

Will prepare a new patch soon.

Thanks.
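One way to stop depending on the transport's framing, as the later dataLength field does, is to announce the payload length up front and have the receiver buffer chunks until that many bytes have arrived, however the transport re-splits them. A minimal sketch with hypothetical names:

```java
import java.io.ByteArrayOutputStream;

// Sketch: the receiver cannot assume one HTTP/2 data frame == one
// application message, because flow control may re-split frames. A
// length field in the message header lets the receiver reassemble.
public final class LengthFieldAccumulator {
    private final int dataLength;            // announced in the message header
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    public LengthFieldAccumulator(int dataLength) { this.dataLength = dataLength; }

    /** Feed one arbitrarily sized chunk; returns the full message once complete, else null. */
    public byte[] offer(byte[] chunk) {
        buf.write(chunk, 0, chunk.length);
        return buf.size() >= dataLength ? buf.toByteArray() : null;
    }
}
```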

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.4.patch, HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2

2015-05-31 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471.5.patch

Introduce a decoder in front of the original handler. Change all protobuf 
messages to use delimited encoding. Add a dataLength field to the block data 
frame header.

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.4.patch, HDFS-8471.5.patch, HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2

2015-05-30 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471.4.patch

Do not create new promise in DtpHttp2Handler.

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.4.patch, HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2

2015-05-30 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471.2.patch

Hide the stream id when calling ReadBlockHandler.

https://github.com/netty/netty/issues/3667 has not been finished yet, so we 
need to work around it ourselves.

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2

2015-05-30 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471.3.patch

Add SuppressWarnings to remove the javac warnings. Clean up some unused imports.

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.2.patch, HDFS-8471.3.patch, 
 HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2

2015-05-29 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Attachment: HDFS-8471.1.patch

Add checksum support. Introduce a ReadBlockHandler. Add a testcase for the 
block-does-not-exist error.

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.1.patch, HDFS-8471.patch








[jira] [Commented] (HDFS-8471) Implement read block over HTTP/2

2015-05-28 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562373#comment-14562373
 ] 

Duo Zhang commented on HDFS-8471:
-

{quote}
It might make sense to move some of the code out of DtpHttp2FrameListener for a 
cleaner separation between the transport layer and the application layer.
{quote}
OK, will do it.

{quote}
How the exception will be handled? It might make sense to send back a protobuf 
to match the spec of the grpc.
{quote}
Methods in Http2FrameListener are only allowed to throw Http2Exception, and 
Http2ConnectionHandler will handle it (reset stream or close connection). By 
design I will catch all other exceptions and write back a proto. I think I 
should add a testcase for exception handling.

Thanks.

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.patch








[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2

2015-05-27 Thread Duo Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HDFS-8471:

Status: Patch Available  (was: Open)

 Implement read block over HTTP/2
 

 Key: HDFS-8471
 URL: https://issues.apache.org/jira/browse/HDFS-8471
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Duo Zhang
Assignee: Duo Zhang
 Attachments: HDFS-8471.patch






