[jira] [Commented] (HDFS-14809) Reduce BlockReaderLocal RPC calls
[ https://issues.apache.org/jira/browse/HDFS-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059627#comment-17059627 ]

Lisheng Sun commented on HDFS-14809:

[~ken_1...@163.com]
{quote}
As far as I know, the shm and slot are used by the client to record a reference count, which the DN reads to determine whether a replica can be uncached by the CacheManager.
{quote}
The shm and slot also keep the validity of the replica synchronized between the client and the DN.

There are two questions:
1. How does the client know whether the replica is still valid in your implementation?
2. Many DFSInputStreams of one DFSClient may do short-circuit reads (SSR) on the same replica, so an RPC is needed for each DFSInputStream every time.

> Reduce BlockReaderLocal RPC calls
> ---------------------------------
>
>                 Key: HDFS-14809
>                 URL: https://issues.apache.org/jira/browse/HDFS-14809
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.6.0
>            Reporter: KenCao
>            Assignee: KenCao
>            Priority: Major
>         Attachments: HADOOP-14809
>
>
> As we know, the HDFS client Java library uses BlockReaderLocal for short-circuit
> reads by default, which first allocates shared memory and creates a slot within
> it. After these steps, it requests the fds from the DataNode.
> However, the slot and shared memory structure are only used by the DataNode when
> uncaching replicas. The client process can work well with just the fds requested
> afterwards, and it is nearly impossible to cache replicas in a production environment.
> The API to release the fds is called by the client only with the slot given, and
> the fds are ultimately closed in the client process.
> So I think we can make a new BlockReader implementation which just requests
> the fds. It will reduce the RPC calls from 3 (allocate shm, request fds,
> release fds) to 1 (request fds).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
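The two roles attributed to the slot in this comment (a client-side reference count that the DN reads, and a validity signal for the replica) can be illustrated with a minimal sketch. It assumes a simplified layout in which one 64-bit word holds both a validity bit and a counter. The class `SlotSketch` and all of its methods are hypothetical names chosen for illustration; this is not the Hadoop implementation or its API.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical, simplified model of one slot shared by a client and a DataNode. */
final class SlotSketch {
    // Assumption: the top bit marks the replica as valid, the low bits count client references.
    private static final long VALID_FLAG = 1L << 63;
    private final AtomicLong word = new AtomicLong(VALID_FLAG);

    boolean isValid() {
        return (word.get() & VALID_FLAG) != 0;
    }

    long refCount() {
        return word.get() & ~VALID_FLAG;
    }

    /** Client side: take a reference only while the replica is still marked valid. */
    boolean addRef() {
        while (true) {
            long prev = word.get();
            if ((prev & VALID_FLAG) == 0) {
                return false; // replica was invalidated, the caller must not rely on it
            }
            if (word.compareAndSet(prev, prev + 1)) {
                return true;
            }
        }
    }

    /** Client side: drop a reference, never going below zero. */
    void removeRef() {
        word.getAndUpdate(v -> (v & ~VALID_FLAG) == 0 ? v : v - 1);
    }

    /** DataNode side: mark the replica invalid without touching the counter. */
    void invalidate() {
        word.getAndUpdate(v -> v & ~VALID_FLAG);
    }

    /** DataNode side: safe to release once invalid and no client holds a reference. */
    boolean canRelease() {
        return !isValid() && refCount() == 0;
    }
}
```

In this sketch a client that only holds fds and never touches such a slot has no way to observe `invalidate()`, which is the gap question 1 asks about.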
[jira] [Commented] (HDFS-14809) Reduce BlockReaderLocal RPC calls
[ https://issues.apache.org/jira/browse/HDFS-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058013#comment-17058013 ]

KenCao commented on HDFS-14809:
-------------------------------

Hello [~leosun08]. I think that, in theory, it is possible to request only the fds from the DN and then execute the read logic. In my implementation, if the client fails to get a BlockReaderLocal2, it falls back to the other readers, which work as before.

As far as I know, the shm and slot are used by the client to record a reference count, which the DN reads to determine whether a replica can be uncached by the CacheManager. However, the CacheManager is never used in my situation, and Alluxio may be a better choice.
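The fallback behaviour described in this comment can be sketched as a chain of reader factories that are tried in order. This is only an illustration under stated assumptions: `ReaderFallbackSketch`, its `Reader` interface and `firstAvailable` are hypothetical names, and the attached patch may be structured quite differently.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch: try the fd-only reader first, then fall back to the existing ones. */
final class ReaderFallbackSketch {
    /** Stand-in for a block reader; only a name is needed for the illustration. */
    interface Reader {
        String name();
    }

    /**
     * Returns the reader from the first factory that succeeds.
     * A factory signals failure by returning an empty Optional.
     */
    static Reader firstAvailable(List<Supplier<Optional<Reader>>> factories) {
        for (Supplier<Optional<Reader>> factory : factories) {
            Optional<Reader> candidate = factory.get();
            if (candidate.isPresent()) {
                return candidate.get();
            }
        }
        throw new IllegalStateException("no block reader could be created");
    }
}
```

With the fd-only factory placed first in the list, a failure there leaves the behaviour of the remaining readers unchanged, which matches the "work as before" claim above.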
[jira] [Commented] (HDFS-14809) Reduce BlockReaderLocal RPC calls
[ https://issues.apache.org/jira/browse/HDFS-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17054313#comment-17054313 ]

Lisheng Sun commented on HDFS-14809:

Thanks [~ken_1...@163.com] for reporting this jira. I don't quite understand some parts of the description.
{quote}
However, the slot and shared memory structure is only used by DataNode when uncaching replicas,
{quote}
Do you mean that the client doesn't use the slot and shared memory structure when replicas are uncached? How does the client then know whether the replica is invalid at that time?

The slot serves to keep replica validity and the replica reference count synchronized between the client and the DN.

Since your implementation only requests the fds, it has to issue an RPC to request the fds before each short-circuit read. Please correct me if I am wrong.

Regarding the performance of short-circuit reads, you may want to look at HDFS-13639 and HDFS-13564.
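The concern about one RPC per short-circuit read can be made concrete with a small counting sketch. It assumes a reader that requests fds on every open versus one that shares fds per block through a client-side map. `FdRequestCountSketch` and its methods are hypothetical, the "RPC" is a local stand-in, and the descriptor values are placeholders; whether the proposed reader shares fds across streams is not stated in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch that counts "request fds" calls with and without sharing. */
final class FdRequestCountSketch {
    final AtomicInteger rpcCount = new AtomicInteger();
    private final Map<Long, int[]> fdsByBlock = new ConcurrentHashMap<>();

    /** Stand-in for the "request fds" RPC to the DataNode; only counts invocations. */
    private int[] requestFds(long blockId) {
        rpcCount.incrementAndGet();
        return new int[] {0, 1}; // placeholder descriptors, not real fds
    }

    /** Every open triggers its own request. */
    int[] openUncached(long blockId) {
        return requestFds(blockId);
    }

    /** Opens of the same block share one request through the map. */
    int[] openCached(long blockId) {
        return fdsByBlock.computeIfAbsent(blockId, this::requestFds);
    }
}
```

Opening the same block from five streams costs five requests through `openUncached` and one through `openCached`.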
[jira] [Commented] (HDFS-14809) Reduce BlockReaderLocal RPC calls
[ https://issues.apache.org/jira/browse/HDFS-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050525#comment-17050525 ]

Hadoop QA commented on HDFS-14809:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue} 0m 6s{color} | {color:blue} The patch file was not named according to hadoop's naming conventions. Please see https://wiki.apache.org/hadoop/HowToContribute for instructions. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s{color} | {color:red} HDFS-14809 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14809 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979848/HADOOP-14809 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28890/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-14809) Reduce BlockReaderLocal RPC calls
[ https://issues.apache.org/jira/browse/HDFS-14809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050521#comment-17050521 ]

Wei-Chiu Chuang commented on HDFS-14809:

Updated the summary based on the suggestion. I'm extremely sorry for missing this one.

[~leosun08] [~openinx] does this make sense to you?