[jira] [Created] (HDFS-16770) [Documentation] RBF: Duplicate statement to be removed for better readability
Renukaprasad C created HDFS-16770: - Summary: [Documentation] RBF: Duplicate statement to be removed for better readability Key: HDFS-16770 URL: https://issues.apache.org/jira/browse/HDFS-16770 Project: Hadoop HDFS Issue Type: Improvement Reporter: Renukaprasad C Assignee: Renukaprasad C Both of the statements below convey the same meaning; the latter can be removed. The Router monitors the local NameNode and its state and heartbeats to the State Store. The Router monitors the local NameNode and heartbeats the state to the State Store. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15067) Optimize heartbeat for large cluster
[ https://issues.apache.org/jira/browse/HDFS-15067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17567819#comment-17567819 ] Renukaprasad C commented on HDFS-15067: --- Thanks [~surendralilhore] for reporting the issue and the patch. Thanks [~ayushtkn] [~umamaheswararao] for the review & feedback. This optimization has been running in our large clusters for a long time, and no related issues have been reported. Shall the patch be considered for merge? We can take up the other improvements separately. > Optimize heartbeat for large cluster > > > Key: HDFS-15067 > URL: https://issues.apache.org/jira/browse/HDFS-15067 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 3.1.1 >Reporter: Surendra Singh Lilhore >Assignee: Surendra Singh Lilhore >Priority: Major > Attachments: HDFS-15067.01.patch, HDFS-15067.02.patch, > HDFS-15067.03.patch, image-2020-01-09-18-00-49-556.png > > > In a large cluster the Namenode spends some time processing heartbeats. For > example, in a 10K-node cluster the Namenode processes 10K heartbeat RPCs every > 3 sec. This impacts the client response time. The heartbeat can be > optimized: a DN can start skipping one heartbeat if no > work (write/replication/delete) has been allocated for a long time, and send > heartbeats every 6 sec instead. Once the DN starts getting work from the NN, it > can resume sending heartbeats normally.
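The skip-heartbeat idea described above can be sketched as follows. This is a minimal illustration with hypothetical class names and thresholds, not the code from the HDFS-15067 patch:

```java
// Sketch: a DataNode backs off to a doubled heartbeat interval when the
// NameNode has issued no work for a while, and resets once work arrives.
// All names and thresholds here are illustrative.
class AdaptiveHeartbeat {
    static final long BASE_INTERVAL_MS = 3_000;   // normal heartbeat period
    static final long IDLE_THRESHOLD_MS = 60_000; // idle time before backing off

    private long lastWorkTimeMs;

    AdaptiveHeartbeat(long nowMs) {
        this.lastWorkTimeMs = nowMs;
    }

    /** Called when the NameNode returns work (write/replication/delete). */
    void onWorkReceived(long nowMs) {
        lastWorkTimeMs = nowMs;
    }

    /** Next heartbeat delay: one beat is skipped while the DN is idle. */
    long nextIntervalMs(long nowMs) {
        boolean idle = (nowMs - lastWorkTimeMs) > IDLE_THRESHOLD_MS;
        return idle ? 2 * BASE_INTERVAL_MS : BASE_INTERVAL_MS;
    }
}
```

In a 10K-node cluster this would roughly halve the heartbeat RPC load on the NameNode while the cluster is idle, at the cost of slightly slower liveness detection for idle DataNodes.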
[jira] [Assigned] (HDFS-16580) Datanode to print the blockID while releasing the SCFds
[ https://issues.apache.org/jira/browse/HDFS-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C reassigned HDFS-16580: - Assignee: (was: Renukaprasad C) > Datanode to print the blockID while releasing the SCFds > --- > > Key: HDFS-16580 > URL: https://issues.apache.org/jira/browse/HDFS-16580 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.1.4, 3.4.0, 3.3.2 >Reporter: Renukaprasad C >Priority: Major > > Method - > org.apache.hadoop.hdfs.server.datanode.DataXceiver#requestShortCircuitFds > prints the block ID passed for a short-circuit read, but the corresponding entry > is missing in > org.apache.hadoop.hdfs.server.datanode.DataXceiver#releaseShortCircuitFds. > It's good to have the corresponding blockID in the release method as well. > We are facing some random file read issues when SCR is enabled. From the > current logs, we cannot map the request & release flows. > It will be helpful in debugging such issues if we log the blockID in the release method > as well.
[jira] [Created] (HDFS-16580) Datanode to print the blockID while releasing the SCFds
Renukaprasad C created HDFS-16580: - Summary: Datanode to print the blockID while releasing the SCFds Key: HDFS-16580 URL: https://issues.apache.org/jira/browse/HDFS-16580 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.3.2, 3.1.4, 3.4.0 Reporter: Renukaprasad C Assignee: Renukaprasad C Method - org.apache.hadoop.hdfs.server.datanode.DataXceiver#requestShortCircuitFds prints the block ID passed for a short-circuit read, but the corresponding entry is missing in org.apache.hadoop.hdfs.server.datanode.DataXceiver#releaseShortCircuitFds. It's good to have the corresponding blockID in the release method as well. We are facing some random file read issues when SCR is enabled. From the current logs, we cannot map the request & release flows. It will be helpful in debugging such issues if we log the blockID in the release method as well.
[jira] [Updated] (HDFS-16563) Namenode WebUI prints sensitive information on Token Expiry
[ https://issues.apache.org/jira/browse/HDFS-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16563: -- Attachment: image-2022-04-27-23-28-40-568.png > Namenode WebUI prints sensitive information on Token Expiry > -- > > Key: HDFS-16563 > URL: https://issues.apache.org/jira/browse/HDFS-16563 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Attachments: image-2022-04-27-23-01-16-033.png, > image-2022-04-27-23-28-40-568.png > > Time Spent: 10m > Remaining Estimate: 0h > > Log in to the Namenode WebUI. > Wait for the token to expire. (Or modify the token refresh time > dfs.namenode.delegation.token.renew/update-interval to a lower value.) > Refresh the WebUI after the token expiry. > The full token information gets printed in the WebUI. > > !image-2022-04-27-23-01-16-033.png!
[jira] [Commented] (HDFS-16563) Namenode WebUI prints sensitive information on Token Expiry
[ https://issues.apache.org/jira/browse/HDFS-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528943#comment-17528943 ] Renukaprasad C commented on HDFS-16563: --- Changes verified in the cluster: !image-2022-04-27-23-28-40-568.png! > Namenode WebUI prints sensitive information on Token Expiry > -- > > Key: HDFS-16563 > URL: https://issues.apache.org/jira/browse/HDFS-16563 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Attachments: image-2022-04-27-23-01-16-033.png, > image-2022-04-27-23-28-40-568.png > > Time Spent: 10m > Remaining Estimate: 0h > > Log in to the Namenode WebUI. > Wait for the token to expire. (Or modify the token refresh time > dfs.namenode.delegation.token.renew/update-interval to a lower value.) > Refresh the WebUI after the token expiry. > The full token information gets printed in the WebUI. > > !image-2022-04-27-23-01-16-033.png!
[jira] [Assigned] (HDFS-16563) Namenode WebUI prints sensitive information on Token Expiry
[ https://issues.apache.org/jira/browse/HDFS-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C reassigned HDFS-16563: - Assignee: Renukaprasad C > Namenode WebUI prints sensitive information on Token Expiry > -- > > Key: HDFS-16563 > URL: https://issues.apache.org/jira/browse/HDFS-16563 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Attachments: image-2022-04-27-23-01-16-033.png > > > Log in to the Namenode WebUI. > Wait for the token to expire. (Or modify the token refresh time > dfs.namenode.delegation.token.renew/update-interval to a lower value.) > Refresh the WebUI after the token expiry. > The full token information gets printed in the WebUI. > > !image-2022-04-27-23-01-16-033.png!
[jira] [Created] (HDFS-16563) Namenode WebUI prints sensitive information on Token Expiry
Renukaprasad C created HDFS-16563: - Summary: Namenode WebUI prints sensitive information on Token Expiry Key: HDFS-16563 URL: https://issues.apache.org/jira/browse/HDFS-16563 Project: Hadoop HDFS Issue Type: Bug Reporter: Renukaprasad C Attachments: image-2022-04-27-23-01-16-033.png Log in to the Namenode WebUI. Wait for the token to expire. (Or modify the token refresh time dfs.namenode.delegation.token.renew/update-interval to a lower value.) Refresh the WebUI after the token expiry. The full token information gets printed in the WebUI. !image-2022-04-27-23-01-16-033.png!
[jira] [Commented] (HDFS-16093) DataNodes under decommission will still be returned to the client via getLocatedBlocks, so the client may request decommissioning datanodes to read, which will cause bad competition on disk IO
[ https://issues.apache.org/jira/browse/HDFS-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528889#comment-17528889 ] Renukaprasad C commented on HDFS-16093: --- [~Daniel Ma] Thanks for reporting the issue. Thanks [~hexiaoqiao] [~sodonnell] [~tomscut] for the review & feedback. I agree with [~hexiaoqiao] / [~tomscut] / [~sodonnell]: instead of excluding the Decommissioning (Decommissioned) nodes, they can be placed last, and the read will succeed with the other normal DNs. Are you still working on this solution? > DataNodes under decommission will still be returned to the client via > getLocatedBlocks, so the client may request decommissioning datanodes to read, > which will cause bad competition on disk IO. > -- > > Key: HDFS-16093 > URL: https://issues.apache.org/jira/browse/HDFS-16093 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Assignee: Daniel Ma >Priority: Critical > > DataNodes under decommission will still be returned to the client via > getLocatedBlocks, so the client may request decommissioning datanodes to read, > which will cause bad competition on disk IO. > Therefore, datanodes under decommission should be removed from the return > list of the getLocatedBlocks API. > !image-2021-06-29-10-50-44-739.png!
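The "place them last" suggestion in this thread can be sketched as below. The Node/State types and the method are hypothetical stand-ins for the NameNode's replica ordering, not the actual HDFS code:

```java
import java.util.Comparator;
import java.util.List;

// Sketch: instead of excluding replicas on decommissioning DataNodes from
// getLocatedBlocks, sort them to the end so clients try healthy nodes first.
class ReplicaOrdering {
    enum State { NORMAL, DECOMMISSIONING, DECOMMISSIONED }

    static class Node {
        final String name;
        final State state;
        Node(String name, State state) { this.name = name; this.state = state; }
    }

    /** Stable sort: NORMAL replicas keep their relative order, others move last. */
    static void placeDecommissioningLast(List<Node> replicas) {
        replicas.sort(Comparator.comparing((Node n) -> n.state != State.NORMAL));
    }
}
```

Keeping the replicas in the list (rather than dropping them) preserves availability when all remaining replicas happen to be on decommissioning nodes.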
[jira] [Commented] (HDFS-15134) Any write calls with REST API on Standby NN print error message with wrong online help URL
[ https://issues.apache.org/jira/browse/HDFS-15134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528881#comment-17528881 ] Renukaprasad C commented on HDFS-15134: --- Thanks [~Sushma_28] for the patch. LGTM. [~Hemanth Boyina] [~hexiaoqiao] Could you please take a look? > Any write calls with REST API on Standby NN print error message with wrong > online help URL > -- > > Key: HDFS-15134 > URL: https://issues.apache.org/jira/browse/HDFS-15134 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15134.001.patch > > > vm2:/opt# curl -k -i --negotiate -u : > "http://IP:PORT/webhdfs/v1/test?op=MKDIRS; > HTTP/1.1 403 Forbidden > Date: Mon, 20 Jan 2020 07:28:19 GMT > Cache-Control: no-cache > Expires: Mon, 20 Jan 2020 07:28:20 GMT > Date: Mon, 20 Jan 2020 07:28:20 GMT > Pragma: no-cache > X-FRAME-OPTIONS: SAMEORIGIN > Content-Type: application/json > Transfer-Encoding: chunked > {"RemoteException":{"exception":"StandbyException","javaClassName":"org.apache.hadoop.ipc.StandbyException","message":"Operation > category WRITE is not supported in state standby. Visit > https://s.apache.org/sbnn-error"}} > The link https://s.apache.org/sbnn-error is invalid and doesn't exist. This needs to be > updated.
[jira] [Commented] (HDFS-16094) HDFS balancer process start failed owing to daemon pid file is not cleared in some exception scenario
[ https://issues.apache.org/jira/browse/HDFS-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528382#comment-17528382 ] Renukaprasad C commented on HDFS-16094: --- A similar issue, HDFS-15932, has been addressed. [~Daniel Ma] please check if any other information needs to be added. > HDFS balancer process start failed owing to daemon pid file is not cleared in > some exception scenario > > > Key: HDFS-16094 > URL: https://issues.apache.org/jira/browse/HDFS-16094 > Project: Hadoop HDFS > Issue Type: Improvement > Components: scripts >Affects Versions: 3.3.1 >Reporter: Daniel Ma >Priority: Major > > The HDFS balancer process fails to start because the daemon pid file is not cleared in > some exception scenario, but there is no useful information in the log to > troubleshoot, as below. > {code:java} > // code placeholder > hadoop_error "${daemonname} is running as process $(cat "${daemon_pidfile}") > {code} > but actually, the process is not running as the error msg above states. > Therefore, some more explicit information should be printed in the error log to > guide users to clear the pid file, including where the pid file is located. >
[jira] [Commented] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.
[ https://issues.apache.org/jira/browse/HDFS-16551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527906#comment-17527906 ] Renukaprasad C commented on HDFS-16551: --- Thanks [~ste...@apache.org] & [~weichiu] for the review & merge. > Backport HADOOP-17588 to 3.3 and other active old branches. > --- > > Key: HDFS-16551 > URL: https://issues.apache.org/jira/browse/HDFS-16551 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 2.10.2, 3.2.4, 3.3.4 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This intermittent issue has been handled in trunk; the same needs to be backported to > the active branches. > org.apache.hadoop.crypto.CryptoInputStream.close() - when two threads try to > close the stream, the second thread fails with an error. > This operation should be synchronized to avoid multiple threads performing > the close operation concurrently. > [~Hemanth Boyina]
[jira] [Resolved] (HDFS-15741) Vulnerability fixes needed for Jackson Hadoop dependency library
[ https://issues.apache.org/jira/browse/HDFS-15741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C resolved HDFS-15741. --- Resolution: Duplicate Upgraded as part of HADOOP-17534; this is no longer valid. [~weichiu] [~SouryakantaDwivedy] > Vulnerability fixes needed for Jackson Hadoop dependency library > - > > Key: HDFS-15741 > URL: https://issues.apache.org/jira/browse/HDFS-15741 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 3.1.1 >Reporter: Souryakanta Dwivedy >Priority: Minor > Attachments: CVEs_found.png > > > Vulnerability fixes needed for the Jackson Hadoop dependency library. > Below are the Jackson library jars used by Hadoop where CVEs are found: > Jackson [version 2.10.3 ] > - jackson-core-2.10.3.jar > CVE details :- [ CVE-2020-25649 ] > == > Jackson-core [version 2.4.0 ] > - htrace-core-3.1.0-incubating.jar > CVE details :- [ CVE-2020-24616 ] > = > > > >
[jira] [Commented] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.
[ https://issues.apache.org/jira/browse/HDFS-16551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526876#comment-17526876 ] Renukaprasad C commented on HDFS-16551: --- Thanks [~ste...@apache.org] for the quick review. I have raised a PR for branch-3.2. Do I need to raise a separate PR for branch-2.10 as well? > Backport HADOOP-17588 to 3.3 and other active old branches. > --- > > Key: HDFS-16551 > URL: https://issues.apache.org/jira/browse/HDFS-16551 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 3.3.4 > > Time Spent: 50m > Remaining Estimate: 0h > > This intermittent issue has been handled in trunk; the same needs to be backported to > the active branches. > org.apache.hadoop.crypto.CryptoInputStream.close() - when two threads try to > close the stream, the second thread fails with an error. > This operation should be synchronized to avoid multiple threads performing > the close operation concurrently. > [~Hemanth Boyina]
[jira] [Commented] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.
[ https://issues.apache.org/jira/browse/HDFS-16551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525762#comment-17525762 ] Renukaprasad C commented on HDFS-16551: --- Thanks [~ste...@apache.org], I have raised a PR for branch-3.3. > Backport HADOOP-17588 to 3.3 and other active old branches. > --- > > Key: HDFS-16551 > URL: https://issues.apache.org/jira/browse/HDFS-16551 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This intermittent issue has been handled in trunk; the same needs to be backported to > the active branches. > org.apache.hadoop.crypto.CryptoInputStream.close() - when two threads try to > close the stream, the second thread fails with an error. > This operation should be synchronized to avoid multiple threads performing > the close operation concurrently. > [~Hemanth Boyina]
[jira] [Created] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.
Renukaprasad C created HDFS-16551: - Summary: Backport HADOOP-17588 to 3.3 and other active old branches. Key: HDFS-16551 URL: https://issues.apache.org/jira/browse/HDFS-16551 Project: Hadoop HDFS Issue Type: Task Reporter: Renukaprasad C Assignee: Renukaprasad C This intermittent issue has been handled in trunk; the same needs to be backported to the active branches. org.apache.hadoop.crypto.CryptoInputStream.close() - when two threads try to close the stream, the second thread fails with an error. This operation should be synchronized to avoid multiple threads performing the close operation concurrently. [~Hemanth Boyina]
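The race being backported here can be sketched as follows. This is a simplified stand-in, not CryptoInputStream itself; the `closed` flag and the synchronized method mirror the idea in HADOOP-17588:

```java
import java.io.Closeable;

// Sketch: two threads calling close() concurrently on an unsynchronized
// stream can both run the teardown, and the second call can fail.
// Synchronizing close() and checking a closed flag makes a repeated
// call a harmless no-op. Names are illustrative.
class SafeCloseStream implements Closeable {
    private boolean closed = false;
    private int teardownCount = 0; // counts how often teardown actually ran

    @Override
    public synchronized void close() {
        if (closed) {
            return; // another closer already tore the stream down
        }
        closed = true;
        teardownCount++; // real code would release buffers/decryptors here
    }

    synchronized int getTeardownCount() {
        return teardownCount;
    }
}
```

With the lock held for the whole close path, a second thread entering close() blocks until the first finishes, then sees `closed == true` and returns without re-running the teardown.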
[jira] [Commented] (HDFS-16545) Provide option to balance rack level in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523806#comment-17523806 ] Renukaprasad C commented on HDFS-16545: --- Thanks [~liuml07] for the quick review & feedback. Once the rack-based balancer has executed, it will have balanced the blocks within the rack. These DNs (balanced with the rack-based balancer) won't participate in the global balancer (without the rack option / cluster-wide). "Also do we plan to allow multiple rack-wide balancers (different racks)?" So far we have considered balancing a single rack. Further analysis is needed to support multiple racks. > Provide option to balance rack level in Balancer > > > Key: HDFS-16545 > URL: https://issues.apache.org/jira/browse/HDFS-16545 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > > Currently the Balancer tool runs on the entire cluster and balances across the racks. > If we need to balance within a rack, then we need to provide an option to > support rack-level balancing. > [~surendralilhore] [~hemanthboyina]
[jira] [Created] (HDFS-16545) Provide option to balance rack level in Balancer
Renukaprasad C created HDFS-16545: - Summary: Provide option to balance rack level in Balancer Key: HDFS-16545 URL: https://issues.apache.org/jira/browse/HDFS-16545 Project: Hadoop HDFS Issue Type: Improvement Reporter: Renukaprasad C Assignee: Renukaprasad C Currently the Balancer tool runs on the entire cluster and balances across the racks. If we need to balance within a rack, then we need to provide an option to support rack-level balancing. [~surendralilhore] [~hemanthboyina]
[jira] [Created] (HDFS-16526) Add metrics for slow DataNode
Renukaprasad C created HDFS-16526: - Summary: Add metrics for slow DataNode Key: HDFS-16526 URL: https://issues.apache.org/jira/browse/HDFS-16526 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C Add some more metrics for slow datanode operations - FlushOrSync, PacketResponder send ACK.
[jira] [Commented] (HDFS-16428) Source path with storagePolicy causes wrong typeConsumed while rename
[ https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499930#comment-17499930 ] Renukaprasad C commented on HDFS-16428: --- [~lei w] Thanks for your contribution. [~hexiaoqiao] Thanks for the review; this can be merged to the other branches as well, right? > Source path with storagePolicy causes wrong typeConsumed while rename > > > Key: HDFS-16428 > URL: https://issues.apache.org/jira/browse/HDFS-16428 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: lei w >Assignee: lei w >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: example.txt > > Time Spent: 2.5h > Remaining Estimate: 0h > > When computing quota in the rename operation, we use the storage policy of the target > directory to compute the src quota usage. This causes a wrong value of > typeConsumed when the source path has a storage policy set. I provided a unit > test to demonstrate this situation.
[jira] [Resolved] (HDFS-16239) XAttr#toString doesn't print the attribute value in readable format
[ https://issues.apache.org/jira/browse/HDFS-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C resolved HDFS-16239. --- Resolution: Invalid For printing, have we considered using the XattrCodec APIs? It's not necessary to print the XAttr. > XAttr#toString doesn't print the attribute value in readable format > -- > > Key: HDFS-16239 > URL: https://issues.apache.org/jira/browse/HDFS-16239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > org.apache.hadoop.fs.XAttr#toString prints the value of the attribute in bytes. > return "XAttr [ns=" + ns + ", name=" + name + ", value=" > + Arrays.toString(value) + "]"; > XAttr [ns=SYSTEM, name=az.expression, value=[82, 69, 80, 91, 50, 93..] > This should be converted to a String rather than printing the array of bytes. >
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422037#comment-17422037 ] Renukaprasad C commented on HDFS-14575: --- Thanks [~weichiu] for bringing it up. [~weichiu] / [~hexiaoqiao] Yes, it should be fine to merge into both branches - 3.3/3.2. How will it be handled? Merged as part of the same Jira, or does a separate MR need to be raised? > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Tao Yang >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, > HDFS-14575.003.patch, HDFS-14575.004.patch > > > Currently a LeaseRenewer (and its daemon thread) without clients should be > terminated after a grace period, which defaults to 60 seconds. A race > condition may happen when a new request comes in just after the LeaseRenewer > expires. > Reproducing this race condition: > # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1 > thread; after a few seconds, File#1 is closed, so there are no clients in > LeaseRenewer#1 now. > # 60 seconds (grace period) later, LeaseRenewer#1 just expires but the Daemon#1 > thread is still asleep; Client#1 creates File#2, leading to the creation of > Daemon#2. > # Daemon#1 wakes up and exits; after that, LeaseRenewer#1 is removed from the > factory. > # File#2 is closed after a few seconds; LeaseRenewer#2 is created since it > can't get the renewer from the factory. > The Daemon#2 thread leaks from now on, since Client#1 in it can never be removed > and it won't have a chance to stop. > To solve this problem, IIUIC, a simple way I think is to make sure that all > clients are cleared when the LeaseRenewer is removed from the factory. Please feel > free to give your suggestions. Thanks!
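The proposed fix in the description ("make sure clients are cleared when the LeaseRenewer is removed from the factory") can be sketched with a toy model of the factory; this is illustrative, not the actual LeaseRenewer code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the daemon's expiry path re-checks, under the factory lock,
// whether a client re-registered during the grace-period sleep before
// removing the renewer from the map. Names are illustrative.
class RenewerFactory {
    static class Renewer {
        int clientCount = 0;
    }

    private final Map<String, Renewer> renewers = new HashMap<>();

    synchronized void addClient(String key) {
        renewers.computeIfAbsent(key, k -> new Renewer()).clientCount++;
    }

    synchronized void removeClient(String key) {
        Renewer r = renewers.get(key);
        if (r != null && r.clientCount > 0) {
            r.clientCount--;
        }
    }

    /** Daemon expiry path: remove only if no client came back meanwhile. */
    synchronized boolean removeIfEmpty(String key) {
        Renewer r = renewers.get(key);
        if (r != null && r.clientCount == 0) {
            renewers.remove(key);
            return true;
        }
        return false; // a client arrived during the grace period; keep it
    }

    synchronized boolean contains(String key) {
        return renewers.containsKey(key);
    }
}
```

Because both the registration path and the expiry path take the same lock, the daemon can never remove a renewer that a newly created file has just re-attached to, which is exactly the window step 2 of the reproduction exploits.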
[jira] [Commented] (HDFS-16243) The available disk space is less than the reserved space, and no log message is displayed
[ https://issues.apache.org/jira/browse/HDFS-16243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421924#comment-17421924 ] Renukaprasad C commented on HDFS-16243: --- Thanks [~zhttylz] for the issue & the patch. LOG.warn("Configured reserved space is higher than Disk capacity"); - Could you print the values here as well? I think you created the patch for a specific version. Is it applicable to trunk as well? Also, you could raise a PR, which would make it easier to review & trace. > The available disk space is less than the reserved space, and no log message > is displayed > - > > Key: HDFS-16243 > URL: https://issues.apache.org/jira/browse/HDFS-16243 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hualong Zhang >Priority: Major > Attachments: HDFS-16243.patch > > > When I submitted a task to the hadoop test cluster, it reported "could only > be replicated to 0 nodes instead of minReplication (=1)". > I checked the namenode and datanode logs and did not find any error logs. It > was not until I used dfsadmin -report and saw the available capacity was 0 > that I realized it may be a configuration problem. > Checking the configuration, I found that the value of the > "dfs.datanode.du.reserved" configuration is greater than the available disk > space of HDFS, which caused this problem. > It seems that there should be some warnings or errors in the log.
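A version of the warning that includes the values, as the comment above requests, might look like this; the method and message text are illustrative, not the actual patch:

```java
// Sketch: build the warning with the configured reserved space and the
// disk capacity, so the misconfiguration is obvious from the log line alone.
class ReservedSpaceCheck {
    /** Returns a warning message when reserved space exceeds capacity, else null. */
    static String check(long reservedBytes, long capacityBytes) {
        if (reservedBytes > capacityBytes) {
            return "Configured reserved space " + reservedBytes
                + " bytes is higher than disk capacity " + capacityBytes + " bytes";
        }
        return null;
    }
}
```

Logging both numbers lets an operator spot immediately that dfs.datanode.du.reserved was set above the volume size, instead of having to correlate dfsadmin -report output with the configuration.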
[jira] [Commented] (HDFS-15893) Logs are flooded when dfs.ha.tail-edits.in-progress set to true or dfs.ha.tail-edits.period to 0ms
[ https://issues.apache.org/jira/browse/HDFS-15893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420971#comment-17420971 ] Renukaprasad C commented on HDFS-15893: --- Thanks [~Sushma_28] for reporting the issue and the detailed clarification. Are you still working on this patch? Thanks [~jianghuazhu] for the quick review & update. > Logs are flooded when dfs.ha.tail-edits.in-progress set to true or > dfs.ha.tail-edits.period to 0ms > -- > > Key: HDFS-15893 > URL: https://issues.apache.org/jira/browse/HDFS-15893 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Assignee: Ravuri Sushma sree >Priority: Major > Attachments: HDFS-15893.001.patch > > > When we set dfs.ha.tail-edits.in-progress to true and dfs.ha.tail-edits.period > to 0ms, the logs on the standby and observer NN are flooded. Such logs > drown out the useful logs. > We can adjust the log level of a few logs to debug while the observer node is in > operation.
[jira] [Work started] (HDFS-16239) XAttr#toString doesn't print the attribute value in readable format
[ https://issues.apache.org/jira/browse/HDFS-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16239 started by Renukaprasad C. - > XAttr#toString doesn't print the attribute value in readable format > -- > > Key: HDFS-16239 > URL: https://issues.apache.org/jira/browse/HDFS-16239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.hadoop.fs.XAttr#toString prints the value of the attribute in bytes. > return "XAttr [ns=" + ns + ", name=" + name + ", value=" > + Arrays.toString(value) + "]"; > XAttr [ns=SYSTEM, name=az.expression, value=[82, 69, 80, 91, 50, 93..] > This should be converted to a String rather than printing the array of bytes. >
[jira] [Updated] (HDFS-16239) XAttr#toString doesn't print the attribute value in readable format
[ https://issues.apache.org/jira/browse/HDFS-16239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16239: -- Summary: XAttr#toString doesn't print the attribute value in readable format (was: XAttr#toString doesn't print the value) > XAttr#toString doesn't print the attribute value in readable format > -- > > Key: HDFS-16239 > URL: https://issues.apache.org/jira/browse/HDFS-16239 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > > org.apache.hadoop.fs.XAttr#toString prints the value of the attribute in bytes. > return "XAttr [ns=" + ns + ", name=" + name + ", value=" > + Arrays.toString(value) + "]"; > XAttr [ns=SYSTEM, name=az.expression, value=[82, 69, 80, 91, 50, 93..] > This should be converted to a String rather than printing the array of bytes. >
[jira] [Created] (HDFS-16239) XAttr#toString doesn't print the value
Renukaprasad C created HDFS-16239: - Summary: XAttr#toString doesn't print the value Key: HDFS-16239 URL: https://issues.apache.org/jira/browse/HDFS-16239 Project: Hadoop HDFS Issue Type: Bug Reporter: Renukaprasad C Assignee: Renukaprasad C org.apache.hadoop.fs.XAttr#toString prints the attribute value in bytes. return "XAttr [ns=" + ns + ", name=" + name + ", value=" + Arrays.toString(value) + "]"; XAttr [ns=SYSTEM, name=az.expression, value=[82, 69, 80, 91, 50, 93..] This should be converted to a String rather than printing the array of bytes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
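The byte values in the example above decode to readable text once a charset is applied ([82, 69, 80, 91, 50, 93] is "REP[2]" in UTF-8). A minimal sketch of the readable formatting the issue asks for, assuming UTF-8 values and a hypothetical helper name; this is not the actual org.apache.hadoop.fs.XAttr code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: builds the XAttr#toString output with the value decoded as a
// UTF-8 string instead of Arrays.toString(byte[]). Names are illustrative.
public class XAttrToStringSketch {
    static String toReadableString(String ns, String name, byte[] value) {
        return "XAttr [ns=" + ns + ", name=" + name + ", value="
            + (value == null ? "null" : new String(value, StandardCharsets.UTF_8))
            + "]";
    }

    public static void main(String[] args) {
        byte[] value = "REP[2]".getBytes(StandardCharsets.UTF_8);
        // Current style, as reported in the issue:
        System.out.println("XAttr [ns=SYSTEM, name=az.expression, value="
            + Arrays.toString(value) + "]");
        // Readable style proposed by the issue:
        System.out.println(toReadableString("SYSTEM", "az.expression", value));
    }
}
```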
[jira] [Commented] (HDFS-16235) Deadlock in LeaseRenewer for static remove method
[ https://issues.apache.org/jira/browse/HDFS-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419920#comment-17419920 ] Renukaprasad C commented on HDFS-16235: --- Thanks [~angerszhuuu] for the clarification. It's clear and the PR changes are fine. PR LGTM. [~ferhui] Thanks for the review; we shall merge the PR. > Deadlock in LeaseRenewer for static remove method > - > > Key: HDFS-16235 > URL: https://issues.apache.org/jira/browse/HDFS-16235 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-16235.001.patch, image-2021-09-23-19-31-57-337.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > !image-2021-09-23-19-31-57-337.png|width=3339,height=1936! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16235) Deadlock in LeaseRenewer for static remove method
[ https://issues.apache.org/jira/browse/HDFS-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419445#comment-17419445 ] Renukaprasad C commented on HDFS-16235: --- Good catch [~angerszhu], and thanks for reporting the issue and the fix. I would like to see the problem. Do you have a test case or scenario to reproduce the issue? > Deadlock in LeaseRenewer for static remove method > - > > Key: HDFS-16235 > URL: https://issues.apache.org/jira/browse/HDFS-16235 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-16235.001.patch, image-2021-09-23-19-31-57-337.png > > Time Spent: 20m > Remaining Estimate: 0h > > !image-2021-09-23-19-31-57-337.png|width=3339,height=1936! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16236) Example command for daemonlog is not correct
[ https://issues.apache.org/jira/browse/HDFS-16236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16236 started by Renukaprasad C. - > Example command for daemonlog is not correct > > > Key: HDFS-16236 > URL: https://issues.apache.org/jira/browse/HDFS-16236 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1 >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The getlevel example command includes the log level, which leads to command failure. > The log level is required only for the setlevel API. > bin/hadoop daemonlog -getlevel 127.0.0.1:9871 > org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG -protocol https -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16236) Example command for daemonlog is not correct
Renukaprasad C created HDFS-16236: - Summary: Example command for daemonlog is not correct Key: HDFS-16236 URL: https://issues.apache.org/jira/browse/HDFS-16236 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 3.3.1 Reporter: Renukaprasad C Assignee: Renukaprasad C The getlevel example command includes the log level, which leads to command failure. The log level is required only for the setlevel API. bin/hadoop daemonlog -getlevel 127.0.0.1:9871 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG -protocol https -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16220) [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC
[ https://issues.apache.org/jira/browse/HDFS-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413768#comment-17413768 ] Renukaprasad C commented on HDFS-16220: --- Thanks [~jianghuazhu] for reporting the issue and the patch. The configuration file has an issue, which results in many test failures. Once you correct it, you should be able to get rid of these unwanted results. Also, there are some static analysis issues reported; please take a look. [~shv] [~xinglin] when you are free, can you please take a look at the PR? > [FGL]Configurable INodeMap#NAMESPACE_KEY_DEPTH_RANGES_STATIC > > > Key: HDFS-16220 > URL: https://issues.apache.org/jira/browse/HDFS-16220 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In INodeMap, NAMESPACE_KEY_DEPTH and NUM_RANGES_STATIC are fixed values; we > should make them configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413087#comment-17413087 ] Renukaprasad C commented on HDFS-14703: --- Thanks [~jianghuazhu] for your interest and attention on this task. Yes, we need to make it configurable. We didn't pay much attention to it in the POC. It will be great if you can track this issue. Also, I suggest making the partition count - INodeMap#NUM_RANGES_STATIC - configurable along with DEPTH. "By the way, in our cluster, there are more than 100 million INodes." – We have tried up to 10M files/dirs. The larger the data set, the better the results we could see. You can share your reports in case you have done benchmarking with the FGL branch. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412801#comment-17412801 ] Renukaprasad C commented on HDFS-14703: --- Thanks [~jianghuazhu] for sharing your thoughts. Hope this will clarify your doubts. INodeMap#NAMESPACE_KEY_DEPTH is designed with flexibility. Yes, by default it is 2, which is a combination of (ParentINodeId, INodeId). When you set it to 3, the GrandParentId is included as well. We have tried up to level 3 with basic functionality, but performance was not measured. We continued to use the default value - 2. I am not sure of any use case for increasing the value to a higher number (at least I haven't done any testing on this part). By default each partition capacity is 117965 (65536 * 1.8); we continue to use the default values in our tests. We also checked the scenarios when dynamic partitions were added. There is no perf degradation on dynamic partitions; in fact this is expected to give higher throughput. We haven't noticed very high CPU usage up to 1M file write ops (resource usage statistics we still need to capture with the base & FGL patch), so this shouldn't have any impact on the other operations (RPC or any other server-side processing tasks). In case you have missed the design, please go through the latest design doc - NameNode Fine-Grained Locking.pdf. [~shv] [~xinglin] Would you like to share your inputs? > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. 
Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
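The namespace key scheme discussed in the comment above - (ParentINodeId, INodeId) at depth 2, with the GrandParentId prepended at depth 3 - can be sketched roughly as follows. The partition count, hashing, and all names are illustrative assumptions, not the FGL branch code:

```java
import java.util.Arrays;

// Hypothetical sketch of a depth-N namespace key: an inode's partition is
// chosen from its ancestor ids plus its own id. Illustrative only.
public class NamespaceKeySketch {
    static final int NUM_RANGES_STATIC = 256;  // illustrative partition count

    // Depth 2 -> key is (parentId, inodeId); depth 3 would pass the
    // grandparent id first as well.
    static int partitionOf(long... ancestorsThenInode) {
        return Math.floorMod(Arrays.hashCode(ancestorsThenInode), NUM_RANGES_STATIC);
    }

    public static void main(String[] args) {
        long parentId = 16385L, inodeId = 139780L;
        System.out.println("depth-2 partition = " + partitionOf(parentId, inodeId));
        System.out.println("depth-3 partition = " + partitionOf(1L, parentId, inodeId));
        // Default per-partition capacity quoted in the thread:
        System.out.println("capacity = " + Math.round(65536 * 1.8)); // 117965
    }
}
```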
[jira] [Commented] (HDFS-16191) [FGL] Fix FSImage loading issues on dynamic partitions
[ https://issues.apache.org/jira/browse/HDFS-16191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412170#comment-17412170 ] Renukaprasad C commented on HDFS-16191: --- Thanks [~xinglin] for the review & feedback. org.apache.hadoop.util.PartitionedGSet#addNewPartitionIfNeeded – Here we check the size of the partition, and create and return a new partition if the size exceeds the threshold; otherwise we return the same partition. private PartitionEntry addNewPartitionIfNeeded( PartitionEntry curPart, K key) { if(curPart.size() < DEFAULT_PARTITION_CAPACITY * DEFAULT_PARTITION_OVERFLOW || curPart.contains(key)) { return curPart; } return addNewPartition(key); } Here we add a new partition whenever the size exceeds the configured threshold. Once a new partition is added and some inodes are added into it, iteration fails (as we iterated only static partitions). With the above patch, I verified the functionality & related UTs, which are working fine. One issue I found here is that static partitions were added as => range key[0, 16385],range key[1, 16385],range key[25, 16385], whereas dynamic partitions were added like inodefile[0, ], inodefile[0, Y InodeId]. When these nodes are compared to get the partition, we get the newly added partition iNodeFile[0, X inodeId] after range key[0, 16385] is full. Let me check this scenario once again; any other issue we will discuss. Meanwhile you can also check the scenario when one partition gets full. > [FGL] Fix FSImage loading issues on dynamic partitions > -- > > Key: HDFS-16191 > URL: https://issues.apache.org/jira/browse/HDFS-16191 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > When new partitions get added into PartitionedGSet, the iterator does not consider > the new partitions; it always iterates over the static partition count. This leads > to a flood of warn messages as below. 
> 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139780 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139781 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139784 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139785 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139786 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139788 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139789 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139790 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139791 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139793 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139795 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139796 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139797 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139800 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139801 when saving the leases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
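The partition-overflow check quoted in the comment above can be modeled with plain collections. A simplified, hypothetical stand-in for PartitionedGSet's logic - reuse the current partition while it is under capacity * overflow (or already holds the key), otherwise allocate a new one - with constants shrunk for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of addNewPartitionIfNeeded; not the Hadoop class itself.
public class PartitionOverflowSketch {
    static final int DEFAULT_PARTITION_CAPACITY = 4;     // tiny, for illustration
    static final float DEFAULT_PARTITION_OVERFLOW = 1.8f;

    static Set<Long> addNewPartitionIfNeeded(Set<Long> curPart, long key) {
        if (curPart.size() < DEFAULT_PARTITION_CAPACITY * DEFAULT_PARTITION_OVERFLOW
            || curPart.contains(key)) {
            return curPart;            // still under threshold: keep the partition
        }
        return new HashSet<>();        // threshold hit: allocate a new partition
    }

    public static void main(String[] args) {
        Set<Long> part = new HashSet<>();
        long key = 0;
        // 4 * 1.8 = 7.2, so the 9th distinct key triggers a new partition.
        for (int i = 0; i < 9; i++) {
            Set<Long> chosen = addNewPartitionIfNeeded(part, key);
            if (chosen != part) {
                System.out.println("new partition after " + part.size() + " keys");
                part = chosen;
            }
            part.add(key++);
        }
    }
}
```

The issue described in the thread is the step after this check: once such a dynamically allocated partition holds inodes, an iterator that only walks the static partitions never visits them, hence the "Fail to find inode ... when saving the leases" warnings.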
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410702#comment-17410702 ] Renukaprasad C commented on HDFS-14703: --- [~jianghuazhu] Initially, there were 2 commits done as part of the POC: INodeMap with PartitionedGSet and per-partition locking (this maps to Jira - HDFS-14734 & HDFS-14732), and [FGL] Introduce INode key (this maps to Jira - HDFS-14733). > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16208) [FGL] Implement Delete API with FGL
[ https://issues.apache.org/jira/browse/HDFS-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410693#comment-17410693 ] Renukaprasad C commented on HDFS-16208: --- Sure [~jianghuazhu]. I missed attaching the report for the DELETE operation. The command used - ./hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs file:/// -op delete -threads 200 -files 100 -filesPerDir 100 ||Itr||Base||Patch|| |1|36886|55126| |2|40783|52029| |3|39698|40950| |4|42247|55157| |5|38197|49285| |Avg|39562|50509| |Imp %| |27%| > [FGL] Implement Delete API with FGL > --- > > Key: HDFS-16208 > URL: https://issues.apache.org/jira/browse/HDFS-16208 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Replace all global locks for file / directory deletion with FGL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
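The averages and the improvement figure in the table above can be re-derived from the per-iteration numbers. A quick arithmetic check (integer averages and a truncated percentage, matching the 39562 / 50509 / 27% reported in the comment):

```java
// Re-derives the Avg and Imp % rows of the delete-benchmark table above
// from the per-iteration ops numbers quoted in the comment.
public class DeleteBenchmarkCheck {
    static long avg(long[] runs) {
        long sum = 0;
        for (long r : runs) sum += r;
        return sum / runs.length;       // integer average, as in the table
    }

    static long improvementPct(long baseAvg, long patchAvg) {
        // truncating division, matching the table's 27%
        return (patchAvg - baseAvg) * 100 / baseAvg;
    }

    public static void main(String[] args) {
        long[] base  = {36886, 40783, 39698, 42247, 38197};
        long[] patch = {55126, 52029, 40950, 55157, 49285};
        long b = avg(base), p = avg(patch);
        System.out.println("base=" + b + " patch=" + p
            + " imp=" + improvementPct(b, p) + "%");
        // base=39562 patch=50509 imp=27%
    }
}
```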
[jira] [Commented] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410229#comment-17410229 ] Renukaprasad C commented on HDFS-16128: --- Understood [~xinglin], thanks for the detailed clarification. Addressed a couple of related issues - https://issues.apache.org/jira/browse/HDFS-16191 Please take a look whenever you get time. > [FGL] Add support for saving/loading an FS Image for PartitionedGSet > > > Key: HDFS-16128 > URL: https://issues.apache.org/jira/browse/HDFS-16128 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 50m > Remaining Estimate: 0h > > Add support to save Inodes stored in PartitionedGSet when saving an FS image > and load Inodes into PartitionedGSet from a saved FS image. > h1. Saving FSImage > *Original HDFS design*: iterate every inode in inodeMap and save them into > the FSImage file. > *FGL*: no change is needed here, since PartitionedGSet also provides an > iterator interface, to iterate over inodes stored in partitions. > h1. Loading an HDFS > *Original HDFS design*: it first loads the FSImage files and then loads edit > logs for recent changes. FSImage files contain different sections, including > INodeSections and INodeDirectorySections. An InodeSection contains serialized > Inodes objects and the INodeDirectorySection contains the parent inode for an > Inode. When loading an FSImage, the system first loads INodeSections and then > load the INodeDirectorySections, to set the parent inode for each inode. > After FSImage files are loaded, edit logs are then loaded. Edit log contains > recent changes to the filesystem, including Inodes creation/deletion. For a > newly created INode, the parent inode is set before it is added to the > inodeMap. 
> *FGL*: when adding an Inode into the partitionedGSet, we need the parent > inode of an inode, in order to determine which partition to store that inode, > when NAMESPACE_KEY_DEPTH = 2. Thus, in FGL, when loading FSImage files, we > used a temporary LightweightGSet (inodeMapTemp), to store inodes. When > LoadFSImage is done, the parent inode for all existing inodes in FSImage > files is set. We can now move the inodes into a partitionedGSet. Loading edit > logs works as usual, as the parent inode for an inode is set before it is > added to the inodeMap. > In theory, PartitionedGSet can store inodes without setting their > parent inodes. All these inodes would be stored in the 0th partition. However, > we decided to use a temporary LightweightGSet (inodeMapTemp) to store these > inodes, to make this case more transparent. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
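The two-phase loading described above (a flat temporary map while parents are unknown, then a move into partitions once every parent id is set) can be sketched with plain maps. All names, ids, and the partition function are illustrative assumptions, not the FGL branch code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the inodeMapTemp -> PartitionedGSet move; illustrative only.
public class TwoPhaseLoadSketch {
    static final int NUM_PARTITIONS = 4;

    // Stand-in for the depth-2 partition choice from (parentId, inodeId).
    static int partitionIndex(long parentId, long inodeId) {
        return (int) Math.floorMod(parentId + inodeId, (long) NUM_PARTITIONS);
    }

    public static void main(String[] args) {
        // Phase 1: INodeSection loads into a flat temporary map; parent ids
        // are unknown at this point (modeled as -1).
        Map<Long, Long> inodeMapTemp = new HashMap<>();   // inodeId -> parentId
        for (long id = 100; id < 108; id++) inodeMapTemp.put(id, -1L);

        // The INodeDirectorySection then sets the parent for each inode.
        for (long id = 100; id < 108; id++) inodeMapTemp.put(id, 16385L);

        // Phase 2: parents known, so inodes can move into per-partition maps.
        List<Map<Long, Long>> partitions = new ArrayList<>();
        for (int i = 0; i < NUM_PARTITIONS; i++) partitions.add(new HashMap<>());
        inodeMapTemp.forEach((id, parent) ->
            partitions.get(partitionIndex(parent, id)).put(id, parent));

        int total = 0;
        for (Map<Long, Long> p : partitions) total += p.size();
        System.out.println("moved " + total + " inodes into partitions");
    }
}
```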
[jira] [Commented] (HDFS-16138) BlockReportProcessingThread exit doesn't print the actual stack
[ https://issues.apache.org/jira/browse/HDFS-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410226#comment-17410226 ] Renukaprasad C commented on HDFS-16138: --- Thank you [~hexiaoqiao] and [~hemanthboyina]. > BlockReportProcessingThread exit doesn't print the actual stack > --- > > Key: HDFS-16138 > URL: https://issues.apache.org/jira/browse/HDFS-16138 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > The BlockReportProcessingThread thread may exit for multiple reasons, but > the current logging prints only the exception message with a different stack, > which makes it difficult to debug the issue. > > Existing logging: > 2021-07-20 10:20:23,104 [Block report processor] INFO util.ExitUtil > (ExitUtil.java:terminate(210)) - Exiting with status 1: Block report > processor encountered fatal exception: java.lang.AssertionError > 2021-07-20 10:20:23,104 [Block report processor] ERROR util.ExitUtil > (ExitUtil.java:terminate(213)) - Terminate called > 1: Block report processor encountered fatal exception: > java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > Exception in thread "Block report processor" 1: Block report processor > encountered fatal exception: java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > > Actual issue found at: > 2021-07-20 10:20:23,101 [Block report processor] ERROR > blockmanagement.BlockManager (BlockManager.java:run(5314)) - > java.lang.AssertionError > java.lang.AssertionError > at > 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3480) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4280) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4202) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4338) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4305) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4853) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$2.run(NameNodeRpcServer.java:1657) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5334) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5312) > > This issue found while working on FGL branch. But, same issue can happen in > Trunk also in any error scenario. > > [~hemanthboyina] [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
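The improvement implied above - log the original Throwable with its full stack before terminating, rather than only the wrapping message - can be sketched as follows. The formatting helper is hypothetical; real Hadoop code would pass the Throwable to the SLF4J logger (LOG.error(msg, t)) before calling ExitUtil.terminate:

```java
// Sketch of "log the real stack first, then terminate"; illustrative only.
public class LogBeforeTerminateSketch {
    // Render a throwable with its full stack, the detail that a
    // terminate-only log loses (it shows only the terminate() frames).
    static String fullStack(Throwable t) {
        StringBuilder sb = new StringBuilder(t.toString());
        for (StackTraceElement e : t.getStackTrace()) {
            sb.append("\n\tat ").append(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        try {
            throw new AssertionError();  // stand-in for the fatal error
        } catch (Throwable t) {
            // Log the original throwable first: preserves the real frames
            // (e.g. addStoredBlock in the trace above)...
            System.err.println(fullStack(t));
            // ...then terminate, e.g.:
            // ExitUtil.terminate(1, "Block report processor encountered fatal exception: " + t);
        }
    }
}
```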
[jira] [Comment Edited] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410106#comment-17410106 ] Renukaprasad C edited comment on HDFS-16195 at 9/5/21, 5:21 AM: Thanks [~vjasani] for the detailed info. If you think this is good idea, I can create a Jira for the same. – yes, we shall go with this. [~hemanthboyina] can you take a look into the latest patch? was (Author: prasad-acit): Thanks [~vjasani] for the detailed info. If you think this is good idea, I can create a Jira for the same. – yes, we shall go with this. [~Hemanth Boyina] can you take a look into the latest patch? > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > Attachments: HADOOP-16195.001.patch, HADOOP-16195.002.patch, > HADOOP-16195.003.patch, hadoop-format.xml > > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17410106#comment-17410106 ] Renukaprasad C commented on HDFS-16195: --- Thanks [~vjasani] for the detailed info. If you think this is good idea, I can create a Jira for the same. – yes, we shall go with this. [~Hemanth Boyina] can you take a look into the latest patch? > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > Attachments: HADOOP-16195.001.patch, HADOOP-16195.002.patch, > HADOOP-16195.003.patch, hadoop-format.xml > > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16077) OIV parsing tool throws NPE for a FSImage with multiple InodeSections
[ https://issues.apache.org/jira/browse/HDFS-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C resolved HDFS-16077. --- Resolution: Not A Bug Private changes caused the bug; it is not applicable to the open-source code. > OIV parsing tool throws NPE for a FSImage with multiple InodeSections > - > > Key: HDFS-16077 > URL: https://issues.apache.org/jira/browse/HDFS-16077 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Priority: Major > > An FSImage with Multiple InodeSections is resulting in NPE when accessed > through OIV Tool with default Parser (WEB) > This issue is reproducible only with multiple InodeSections (Writing more > than 1 Million Files) > On analyzing the code further we found that NPE is caused in > org.apache.hadoop.hdfs.tools.offlineImageViewer.FSImageLoader.fromINodeId(long). > fromINodeId(long) is searching for Inode in an Inodesection which doesn't > have the Inode (but exists in another InodeSection) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16195: -- Attachment: hadoop-format.xml > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > Attachments: HADOOP-16195.001.patch, HADOOP-16195.002.patch, > HADOOP-16195.003.patch, hadoop-format.xml > > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409918#comment-17409918 ] Renukaprasad C commented on HDFS-16195: --- Thanks for the patch; the changes are fine. LGTM for HADOOP-16195.003.patch. For the formatter, you can refer to hadoop-format.xml attached above. [~vjasani] could you share the link if the common formatter is available globally? Thank you. > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > Attachments: HADOOP-16195.001.patch, HADOOP-16195.002.patch, > HADOOP-16195.003.patch, hadoop-format.xml > > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16077) OIV parsing tool throws NPE for a FSImage with multiple InodeSections
[ https://issues.apache.org/jira/browse/HDFS-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409916#comment-17409916 ] Renukaprasad C commented on HDFS-16077: --- Verified the scenario with the trunk version; the issue doesn't exist there. The scenario is specific to some private changes, and the same has been confirmed. Thanks [~sodonnell] for the clarification, and thanks [~Sushma_28] for reporting the issue. > OIV parsing tool throws NPE for a FSImage with multiple InodeSections > - > > Key: HDFS-16077 > URL: https://issues.apache.org/jira/browse/HDFS-16077 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ravuri Sushma sree >Priority: Major > > An FSImage with Multiple InodeSections is resulting in NPE when accessed > through OIV Tool with default Parser (WEB) > This issue is reproducible only with multiple InodeSections (Writing more > than 1 Million Files) > On analyzing the code further we found that NPE is caused in > org.apache.hadoop.hdfs.tools.offlineImageViewer.FSImageLoader.fromINodeId(long). > fromINodeId(long) is searching for Inode in an Inodesection which doesn't > have the Inode (but exists in another InodeSection) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16208) [FGL] Implement Delete API with FGL
[ https://issues.apache.org/jira/browse/HDFS-16208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16208 started by Renukaprasad C. - > [FGL] Implement Delete API with FGL > --- > > Key: HDFS-16208 > URL: https://issues.apache.org/jira/browse/HDFS-16208 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Replace all global locks for file / directory deletion with FGL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409310#comment-17409310 ] Renukaprasad C commented on HDFS-16195: --- Thanks [~preetium] for the patch; some lines still exceed the length threshold, please correct them. Also, the formatting is different; please follow the Hadoop formatter. > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > Attachments: HADOOP-16195.001.patch, HADOOP-16195.002.patch > > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16138) BlockReportProcessingThread exit doesnt print the acutal stack
[ https://issues.apache.org/jira/browse/HDFS-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17409130#comment-17409130 ] Renukaprasad C commented on HDFS-16138: --- Thanks [~hemanthboyina]. The exception is thrown in org.apache.hadoop.util.ExitUtil#terminate(int, java.lang.String) via the BP processing thread - org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.BlockReportProcessingThread#run. The code below creates a new exception while consuming the actual one: public static void terminate(int status, String msg) throws ExitException { terminate(new ExitException(status, msg)); } I couldn't extend the UT since the error comes from a private thread; simulating it would require a lot of mocking. If you still insist, we can look into it further. Also, the other comments have been addressed and the changes pushed. Please review the changes. Thank you. > BlockReportProcessingThread exit doesnt print the acutal stack > -- > > Key: HDFS-16138 > URL: https://issues.apache.org/jira/browse/HDFS-16138 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > BlockReportProcessingThread thread may gets exited with multiple reasons, but > the current logging prints only the exception message with different stack > which is difficult to debug the issue. 
> > Existing logging: > 2021-07-20 10:20:23,104 [Block report processor] INFO util.ExitUtil > (ExitUtil.java:terminate(210)) - Exiting with status 1: Block report > processor encountered fatal exception: java.lang.AssertionError > 2021-07-20 10:20:23,104 [Block report processor] ERROR util.ExitUtil > (ExitUtil.java:terminate(213)) - Terminate called > 1: Block report processor encountered fatal exception: > java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > Exception in thread "Block report processor" 1: Block report processor > encountered fatal exception: java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > > Actual issue found at: > 2021-07-20 10:20:23,101 [Block report processor] ERROR > blockmanagement.BlockManager (BlockManager.java:run(5314)) - > java.lang.AssertionError > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3480) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4280) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4202) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4338) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4305) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4853) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$2.run(NameNodeRpcServer.java:1657) > at > 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5334) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5312) > > This issue found while working on FGL branch. But, same issue can happen in > Trunk also in any error scenario. > > [~hemanthboyina] [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
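The pattern discussed above can be reduced to a small sketch: wrapping a failure into a new exception by message alone drops the original stack trace, while attaching the original throwable as the cause preserves it. The classes and method names below are illustrative stand-ins, not Hadoop's actual ExitUtil/ExitException:

```java
// Hypothetical stand-ins for illustration; not Hadoop's actual classes.
public class StackLossDemo {
    static class ExitException extends RuntimeException {
        ExitException(String msg) { super(msg); }
        ExitException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Mirrors the quoted terminate(int, String) pattern: the original
    // throwable is flattened into a message string, so its stack is lost.
    static ExitException wrapLossy(Throwable t) {
        return new ExitException("Exiting with status 1: " + t);
    }

    // The fix being described: carry the original throwable as the cause,
    // so the real failure point (e.g. addStoredBlock) stays printable.
    static ExitException wrapWithCause(Throwable t) {
        return new ExitException("Exiting with status 1: " + t, t);
    }

    public static void main(String[] args) {
        Throwable original = new AssertionError("boom");
        if (wrapLossy(original).getCause() != null)
            throw new AssertionError("lossy wrapper should have no cause");
        if (wrapWithCause(original).getCause() != original)
            throw new AssertionError("original should be kept as the cause");
        System.out.println("ok");
    }
}
```

Printing the second wrapper with printStackTrace() would include a "Caused by:" section pointing at the actual failure site, which is the behavior the patch aims for.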
[jira] [Created] (HDFS-16208) [FGL] Implement Delete API with FGL
Renukaprasad C created HDFS-16208: - Summary: [FGL] Implement Delete API with FGL Key: HDFS-16208 URL: https://issues.apache.org/jira/browse/HDFS-16208 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C Replace all global locks for file / directory deletion with FGL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16193) [FGL] Implement Append & Rename APIs with FGL
[ https://issues.apache.org/jira/browse/HDFS-16193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16193 started by Renukaprasad C. - > [FGL] Implement Append & Rename APIs with FGL > - > > Key: HDFS-16193 > URL: https://issues.apache.org/jira/browse/HDFS-16193 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Replace globla lock with FGL in Append & Rename APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408980#comment-17408980 ] Renukaprasad C commented on HDFS-16195: --- Thanks [~preetium] for the patch. The messages are more meaningful than before. Would you fix the checkstyle issues and update the patch? > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > Attachments: HADOOP-16195.001.patch > > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16193) [FGL] Implement Append & Rename APIs with FGL
[ https://issues.apache.org/jira/browse/HDFS-16193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408811#comment-17408811 ] Renukaprasad C commented on HDFS-16193: --- [~shv] [~xinglin] Modified 2 APIs to support FGL. Can you please review the changes? Thanks. Attached the performance report for both the APIs for reference. ./hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs file:/// -op rename -threads 200 -files 100 -filesPerDir 100 -keepResults Performance report for Rename API: ||Itr||Base||Patch|| |1|41001|51519| |2|41310|49431| |3|39062|49652| |Avg|40457|50200| |Impr| |24%| ./hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs file:/// -op append -threads 100 -files 10 -filesPerDir 100 Performance report for Append API: ||Itr||Base||Patch|| |1|35523|39478| |2|41390|55096| |3|41425|47014| |4|32829|43649| |5| 36443|55157| |Avg|37522|48078| |Impr| |28%| > [FGL] Implement Append & Rename APIs with FGL > - > > Key: HDFS-16193 > URL: https://issues.apache.org/jira/browse/HDFS-16193 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Replace globla lock with FGL in Append & Rename APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
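For reference, the "Avg" and "Impr" rows in the tables above appear to follow the usual formulas: the per-column mean of the iterations, and the percentage gain of the patched average over the base average. A small sketch, using the rename numbers from the table:

```java
// Sketch of how the Avg/Impr rows above appear to be derived.
public class ImprCalc {
    // Integer average, matching the whole-number figures in the tables.
    static long avg(long... v) {
        long sum = 0;
        for (long x : v) sum += x;
        return sum / v.length;
    }

    // Percentage improvement of the patched average over the base average.
    static long improvementPct(long base, long patched) {
        return Math.round(100.0 * (patched - base) / base);
    }

    public static void main(String[] args) {
        long base = avg(41001, 41310, 39062);     // rename, without patch
        long patched = avg(51519, 49431, 49652);  // rename, with patch
        // prints 40457 50200 24, matching the rename table's Avg/Impr rows
        System.out.println(base + " " + patched + " " + improvementPct(base, patched));
    }
}
```

The append table checks out the same way: averages 37522 and 48078 give a 28% improvement.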
[jira] [Commented] (HDFS-16141) [FGL] Address permission related issues with File / Directory
[ https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408729#comment-17408729 ] Renukaprasad C commented on HDFS-16141: --- Thank you [~shv] for the review of patch and commit. > [FGL] Address permission related issues with File / Directory > - > > Key: HDFS-16141 > URL: https://issues.apache.org/jira/browse/HDFS-16141 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Post FGL implementation (MKDIR & Create File), there are existing UTs got > impacted which needs to be addressed. > Failed Tests: > TestDFSPermission > TestPermission > TestFileCreation > TestDFSMkdirs (Added tests) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16195) Fix log message when choosing storage groups for block movement in balancer
[ https://issues.apache.org/jira/browse/HDFS-16195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406531#comment-17406531 ] Renukaprasad C commented on HDFS-16195: --- Thanks [~preetium] for reporting the issue. True. Though the code is written accordingly, the log message is a little confusing. We can correct the message here. Are you working on a patch? > Fix log message when choosing storage groups for block movement in balancer > --- > > Key: HDFS-16195 > URL: https://issues.apache.org/jira/browse/HDFS-16195 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Reporter: Preeti >Priority: Major > > Correct the log message in line with the logic associated with > moving blocks in chooseStorageGroups() in the balancer. All log lines should > indicate from which storage source the blocks are being moved correctly to > avoid ambiguity. Right now one of the log lines is incorrect: > [https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java#L555] > which indicates that storage blocks are moved from underUtilized to > aboveAvgUtilized nodes, while it is actually the other way around in the code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16191) [FGL] Fix FSImage loading issues on dynamic partitions
[ https://issues.apache.org/jira/browse/HDFS-16191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16191 started by Renukaprasad C. - > [FGL] Fix FSImage loading issues on dynamic partitions > -- > > Key: HDFS-16191 > URL: https://issues.apache.org/jira/browse/HDFS-16191 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When new partitions gets added into PartitionGSet, iterator do not consider > the new partitions. Which always iterate on Static Partition count. This lead > to full of warn messages as below. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139780 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139781 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139784 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139785 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139786 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139788 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139789 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139790 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139791 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139793 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139795 when saving the leases. 
> 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139796 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139797 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139800 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139801 when saving the leases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16193) [FGL] Implement Append & Rename APIs with FGL
Renukaprasad C created HDFS-16193: - Summary: [FGL] Implement Append & Rename APIs with FGL Key: HDFS-16193 URL: https://issues.apache.org/jira/browse/HDFS-16193 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C Replace the global lock with FGL in the Append & Rename APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16191) [FGL] Fix FSImage loading issues on dynamic partitions
[ https://issues.apache.org/jira/browse/HDFS-16191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406160#comment-17406160 ] Renukaprasad C commented on HDFS-16191: --- [~shv] [~xinglin] Some failed scenarios were handled in FSImage loading, can you please help to review the changes? > [FGL] Fix FSImage loading issues on dynamic partitions > -- > > Key: HDFS-16191 > URL: https://issues.apache.org/jira/browse/HDFS-16191 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When new partitions gets added into PartitionGSet, iterator do not consider > the new partitions. Which always iterate on Static Partition count. This lead > to full of warn messages as below. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139780 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139781 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139784 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139785 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139786 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139788 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139789 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139790 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139791 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139793 when saving the leases. 
> 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139795 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139796 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139797 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139800 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139801 when saving the leases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16191) [FGL] Fix FSImage loading issues on dynamic partitions
[ https://issues.apache.org/jira/browse/HDFS-16191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16191: -- Summary: [FGL] Fix FSImage loading issues on dynamic partitions (was: [FGL] Loading FSImage loading with errors) > [FGL] Fix FSImage loading issues on dynamic partitions > -- > > Key: HDFS-16191 > URL: https://issues.apache.org/jira/browse/HDFS-16191 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > > When new partitions gets added into PartitionGSet, iterator do not consider > the new partitions. Which always iterate on Static Partition count. This lead > to full of warn messages as below. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139780 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139781 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139784 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139785 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139786 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139788 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139789 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139790 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139791 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139793 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139795 when saving the leases. 
> 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139796 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139797 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139800 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139801 when saving the leases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16191) [FGL] Loading FSImage loading with errors
[ https://issues.apache.org/jira/browse/HDFS-16191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16191: -- Summary: [FGL] Loading FSImage loading with errors (was: Loading FSImage loading with errors) > [FGL] Loading FSImage loading with errors > - > > Key: HDFS-16191 > URL: https://issues.apache.org/jira/browse/HDFS-16191 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > > When new partitions gets added into PartitionGSet, iterator do not consider > the new partitions. Which always iterate on Static Partition count. This lead > to full of warn messages as below. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139780 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139781 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139784 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139785 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139786 when saving the leases. > 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139788 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139789 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139790 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139791 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139793 when saving the leases. > 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139795 when saving the leases. 
> 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139796 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139797 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139800 when saving the leases. > 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find > inode 139801 when saving the leases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16191) Loading FSImage loading with errors
Renukaprasad C created HDFS-16191: - Summary: Loading FSImage loading with errors Key: HDFS-16191 URL: https://issues.apache.org/jira/browse/HDFS-16191 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C When new partitions get added to PartitionGSet, the iterator does not consider them; it always iterates over the static partition count. This leads to a flood of warn messages like the ones below. 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find inode 139780 when saving the leases. 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find inode 139781 when saving the leases. 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find inode 139784 when saving the leases. 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find inode 139785 when saving the leases. 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find inode 139786 when saving the leases. 2021-08-28 03:23:19,420 WARN namenode.FSImageFormatPBINode: Fail to find inode 139788 when saving the leases. 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find inode 139789 when saving the leases. 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find inode 139790 when saving the leases. 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find inode 139791 when saving the leases. 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find inode 139793 when saving the leases. 2021-08-28 03:23:19,421 WARN namenode.FSImageFormatPBINode: Fail to find inode 139795 when saving the leases. 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find inode 139796 when saving the leases. 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find inode 139797 when saving the leases. 2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find inode 139800 when saving the leases. 
2021-08-28 03:23:19,422 WARN namenode.FSImageFormatPBINode: Fail to find inode 139801 when saving the leases. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
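The failure mode described in this issue can be reduced to a small sketch: if the snapshot path iterates a fixed, static number of partitions, inodes placed in partitions added later are invisible to it, which is exactly why the lease-saving code reports "Fail to find inode". The class below is a hypothetical miniature, not the real PartitionedGSet:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the described bug; not the real PartitionedGSet.
public class PartitionIterDemo {
    static final int NUM_RANGES_STATIC = 2;
    final List<List<Long>> partitions = new ArrayList<>();

    PartitionIterDemo() {
        for (int i = 0; i < NUM_RANGES_STATIC; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Partitions can be added dynamically after construction.
    void addPartition(List<Long> p) { partitions.add(p); }

    // Buggy: bound to the static partition count, so it never sees
    // partitions added after construction.
    List<Long> iterateStatic() {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < NUM_RANGES_STATIC; i++) out.addAll(partitions.get(i));
        return out;
    }

    // Fixed: iterate over whatever partitions exist at snapshot time.
    List<Long> iterateAll() {
        List<Long> out = new ArrayList<>();
        for (List<Long> p : partitions) out.addAll(p);
        return out;
    }

    public static void main(String[] args) {
        PartitionIterDemo gset = new PartitionIterDemo();
        List<Long> dynamic = new ArrayList<>();
        dynamic.add(139780L);          // inode id from the log above
        gset.addPartition(dynamic);    // added after construction
        System.out.println(gset.iterateStatic().contains(139780L)); // misses it
        System.out.println(gset.iterateAll().contains(139780L));    // finds it
    }
}
```

The fix for the FSImage path is the second shape: make the iterator walk the live set of partitions rather than a count captured when the set was created.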
[jira] [Commented] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17404107#comment-17404107 ] Renukaprasad C commented on HDFS-16128: --- In org.apache.hadoop.hdfs.server.namenode.INodeMap#get(long), {code:java}pgs.get(inode);{code} should be able to get the inode from the partitions. But we changed this code to: {code:java}for (int p = 0; p < NUM_RANGES_STATIC; p++) { INodeDirectory key = new INodeDirectory(INodeId.ROOT_INODE_ID, "range key".getBytes(StandardCharsets.UTF_8), perm, 0); key.setParent(new INodeDirectory((long)p, null, perm, 0)); PartitionedGSet.PartitionEntry e = pgs.getPartition(key); if (e.contains(inode)) { return (INode) e.get(inode); } }{code} The new code fails to get the INode when new partitions are added dynamically. Can this part of the code be changed back to pgs.get(inode);? Was any issue found with that code? > [FGL] Add support for saving/loading an FS Image for PartitionedGSet > > > Key: HDFS-16128 > URL: https://issues.apache.org/jira/browse/HDFS-16128 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 50m > Remaining Estimate: 0h > > Add support to save Inodes stored in PartitionedGSet when saving an FS image > and load Inodes into PartitionedGSet from a saved FS image. > h1. Saving FSImage > *Original HDFS design*: iterate every inode in inodeMap and save them into > the FSImage file. > *FGL*: no change is needed here, since PartitionedGSet also provides an > iterator interface, to iterate over inodes stored in partitions. > h1. Loading an HDFS > *Original HDFS design*: it first loads the FSImage files and then loads edit > logs for recent changes. FSImage files contain different sections, including > INodeSections and INodeDirectorySections. 
An InodeSection contains serialized > Inodes objects and the INodeDirectorySection contains the parent inode for an > Inode. When loading an FSImage, the system first loads INodeSections and then > load the INodeDirectorySections, to set the parent inode for each inode. > After FSImage files are loaded, edit logs are then loaded. Edit log contains > recent changes to the filesystem, including Inodes creation/deletion. For a > newly created INode, the parent inode is set before it is added to the > inodeMap. > *FGL*: when adding an Inode into the partitionedGSet, we need the parent > inode of an inode, in order to determine which partition to store that inode, > when NAMESPACE_KEY_DEPTH = 2. Thus, in FGL, when loading FSImage files, we > used a temporary LightweightGSet (inodeMapTemp), to store inodes. When > LoadFSImage is done, the parent inode for all existing inodes in FSImage > files is set. We can now move the inodes into a partitionedGSet. Load edit > logs can work as usual, as the parent inode for an inode is set before it is > added to the inodeMap. > In theory, PartitionedGSet can support to store inodes without setting its > parent inodes. All these inodes will be stored in the 0th partition. However, > we decide to use a temporary LightweightGSet (inodeMapTemp) to store these > inodes, to make this case more transparent. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16138) BlockReportProcessingThread exit doesnt print the acutal stack
[ https://issues.apache.org/jira/browse/HDFS-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16138 started by Renukaprasad C. - > BlockReportProcessingThread exit doesnt print the acutal stack > -- > > Key: HDFS-16138 > URL: https://issues.apache.org/jira/browse/HDFS-16138 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > BlockReportProcessingThread thread may gets exited with multiple reasons, but > the current logging prints only the exception message with different stack > which is difficult to debug the issue. > > Existing logging: > 2021-07-20 10:20:23,104 [Block report processor] INFO util.ExitUtil > (ExitUtil.java:terminate(210)) - Exiting with status 1: Block report > processor encountered fatal exception: java.lang.AssertionError > 2021-07-20 10:20:23,104 [Block report processor] ERROR util.ExitUtil > (ExitUtil.java:terminate(213)) - Terminate called > 1: Block report processor encountered fatal exception: > java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > Exception in thread "Block report processor" 1: Block report processor > encountered fatal exception: java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > > Actual issue found at: > 2021-07-20 10:20:23,101 [Block report processor] ERROR > blockmanagement.BlockManager (BlockManager.java:run(5314)) - > java.lang.AssertionError > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3480) > at > 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4280) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4202) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4338) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4305) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4853) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$2.run(NameNodeRpcServer.java:1657) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5334) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5312) > > This issue found while working on FGL branch. But, same issue can happen in > Trunk also in any error scenario. > > [~hemanthboyina] [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16138) BlockReportProcessingThread exit doesn't print the actual stack
[ https://issues.apache.org/jira/browse/HDFS-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388439#comment-17388439 ] Renukaprasad C commented on HDFS-16138: --- org.apache.hadoop.util.ExitUtil#terminate(int, java.lang.String) creates a new exception that includes the exception message but misses the actual stack trace. The patch now adds the full stack; logging continues as before based on the other parameters. [~hexiaoqiao] [~hemanthboyina] can you please take a look whenever you get time? > BlockReportProcessingThread exit doesnt print the acutal stack > -- > > Key: HDFS-16138 > URL: https://issues.apache.org/jira/browse/HDFS-16138 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > BlockReportProcessingThread thread may gets exited with multiple reasons, but > the current logging prints only the exception message with different stack > which is difficult to debug the issue. 
> > Existing logging: > 2021-07-20 10:20:23,104 [Block report processor] INFO util.ExitUtil > (ExitUtil.java:terminate(210)) - Exiting with status 1: Block report > processor encountered fatal exception: java.lang.AssertionError > 2021-07-20 10:20:23,104 [Block report processor] ERROR util.ExitUtil > (ExitUtil.java:terminate(213)) - Terminate called > 1: Block report processor encountered fatal exception: > java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > Exception in thread "Block report processor" 1: Block report processor > encountered fatal exception: java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > > Actual issue found at: > 2021-07-20 10:20:23,101 [Block report processor] ERROR > blockmanagement.BlockManager (BlockManager.java:run(5314)) - > java.lang.AssertionError > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3480) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4280) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4202) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4338) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4305) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4853) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$2.run(NameNodeRpcServer.java:1657) > at > 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5334) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5312) > > This issue found while working on FGL branch. But, same issue can happen in > Trunk also in any error scenario. > > [~hemanthboyina] [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
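The logging problem discussed in this thread can be illustrated with a small standalone sketch (hypothetical class and method names, not the actual Hadoop code): wrapping an error's message into a fresh exception loses the frame where the error was really thrown, while rendering the original Throwable keeps it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDemo {
    // Render a Throwable's stack trace as a String.
    static String stackOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    // Stand-in for code such as BlockManager.addStoredBlock that fails.
    static void failingOperation() {
        throw new AssertionError("invariant violated");
    }

    public static void main(String[] args) {
        try {
            failingOperation();
        } catch (Throwable t) {
            // What the old logging effectively did: a fresh exception that
            // carries only the message, so its trace points at *this* frame.
            String lossy = stackOf(new RuntimeException(
                "Block report processor encountered fatal exception: " + t));
            // What the fix amounts to: render the original Throwable,
            // keeping the frame that actually threw.
            String full = stackOf(t);

            System.out.println(lossy.contains("failingOperation")); // false
            System.out.println(full.contains("failingOperation"));  // true
        }
    }
}
```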
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17388217#comment-17388217 ] Renukaprasad C commented on HDFS-14703: --- Thanks [Daryn Sharp|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=daryn] for the review & comments. Thanks [Xing Lin|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=xinglin] for the quick update. # Was the entry point for the calls via the rpc server, fsn, fsdir, etc? Relevant since end-to-end benchmarking rarely matches microbenchmarks. We have run the benchmarking tool in standalone mode with the file:// scheme. With this we were able to achieve 50k-60k ops/sec throughput. # What is “30-40%” improvement? How many ops/sec before and after? When testing in standalone mode, we found an average 30% improvement for the mkdir op. https://issues.apache.org/jira/browse/HDFS-14703?focusedCommentId=17346002=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17346002 # What impact did it have on gc/min and gc time? These are often hidden killers of performance when not taken into consideration. We have noticed that there is no CPU bottleneck with the patch. We have yet to capture these metrics; we will check further and publish any GC impact observed with the patch. We would like [~shv] to clarify further. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. 
Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15894) Trace Time-consuming RPC response of certain threshold.
[ https://issues.apache.org/jira/browse/HDFS-15894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-15894: -- Resolution: Duplicate Status: Resolved (was: Patch Available) > Trace Time-consuming RPC response of certain threshold. > --- > > Key: HDFS-15894 > URL: https://issues.apache.org/jira/browse/HDFS-15894 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Attachments: HDFS-15894.001.patch, HDFS-15894.002.patch, > HDFS-15894.003.patch > > > Monitor & Trace Time-consuming RPC requests. > Sometimes RPC Requests gets delayed, which impacts the system performance. > Currently, there is no track for delayed RPC request. > We can log such delayed RPC calls which exceeds certain threshold. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16141) [FGL] Address permission related issues with File / Directory
[ https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16141 started by Renukaprasad C. - > [FGL] Address permission related issues with File / Directory > - > > Key: HDFS-16141 > URL: https://issues.apache.org/jira/browse/HDFS-16141 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Post FGL implementation (MKDIR & Create File), there are existing UTs got > impacted which needs to be addressed. > Failed Tests: > TestDFSPermission > TestPermission > TestFileCreation > TestDFSMkdirs (Added tests) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16141) [FGL] Address permission related issues with File / Directory
[ https://issues.apache.org/jira/browse/HDFS-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386943#comment-17386943 ] Renukaprasad C commented on HDFS-16141: --- [~shv], [~xinglin] can you please have a look? Thank you. I ran the UT pipeline locally and could see many more tests passing with the patch. > [FGL] Address permission related issues with File / Directory > - > > Key: HDFS-16141 > URL: https://issues.apache.org/jira/browse/HDFS-16141 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Post FGL implementation (MKDIR & Create File), there are existing UTs got > impacted which needs to be addressed. > Failed Tests: > TestDFSPermission > TestPermission > TestFileCreation > TestDFSMkdirs (Added tests) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16138) BlockReportProcessingThread exit doesn't print the actual stack
[ https://issues.apache.org/jira/browse/HDFS-16138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386932#comment-17386932 ] Renukaprasad C commented on HDFS-16138: --- Thanks [~hexiaoqiao] for the review. This is the FGL branch (trunk & 3.1.1) and the issue (AssertionError) is specific to the FGL code only; we found the cause and addressed it. However, any kind of exception on trunk leads to the same stack and hides the actual issue, which is difficult to debug, especially in production environments. So it is better to log the complete trace that caused the issue. > BlockReportProcessingThread exit doesnt print the acutal stack > -- > > Key: HDFS-16138 > URL: https://issues.apache.org/jira/browse/HDFS-16138 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > > BlockReportProcessingThread thread may gets exited with multiple reasons, but > the current logging prints only the exception message with different stack > which is difficult to debug the issue. 
> > Existing logging: > 2021-07-20 10:20:23,104 [Block report processor] INFO util.ExitUtil > (ExitUtil.java:terminate(210)) - Exiting with status 1: Block report > processor encountered fatal exception: java.lang.AssertionError > 2021-07-20 10:20:23,104 [Block report processor] ERROR util.ExitUtil > (ExitUtil.java:terminate(213)) - Terminate called > 1: Block report processor encountered fatal exception: > java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > Exception in thread "Block report processor" 1: Block report processor > encountered fatal exception: java.lang.AssertionError > at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) > > Actual issue found at: > 2021-07-20 10:20:23,101 [Block report processor] ERROR > blockmanagement.BlockManager (BlockManager.java:run(5314)) - > java.lang.AssertionError > java.lang.AssertionError > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3480) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4280) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4202) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4338) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4305) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4853) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$2.run(NameNodeRpcServer.java:1657) > at > 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5334) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5312) > > This issue found while working on FGL branch. But, same issue can happen in > Trunk also in any error scenario. > > [~hemanthboyina] [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16141) [FGL] Address permission related issues with File / Directory
Renukaprasad C created HDFS-16141: - Summary: [FGL] Address permission related issues with File / Directory Key: HDFS-16141 URL: https://issues.apache.org/jira/browse/HDFS-16141 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C Post FGL implementation (MKDIR & Create File), there are existing UTs that got impacted, which need to be addressed. Failed Tests: TestDFSPermission TestPermission TestFileCreation TestDFSMkdirs (Added tests) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386603#comment-17386603 ] Renukaprasad C commented on HDFS-16130: --- Thank you [~shv] for review, feedback and corrections. > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: Fine-Grained Locking > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Implement FGL for Create File. > Create API acquire global lock at mulitiple stages. Acquire the respective > partitioned lock and continue the create operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16138) BlockReportProcessingThread exit doesn't print the actual stack
Renukaprasad C created HDFS-16138: - Summary: BlockReportProcessingThread exit doesn't print the actual stack Key: HDFS-16138 URL: https://issues.apache.org/jira/browse/HDFS-16138 Project: Hadoop HDFS Issue Type: Bug Reporter: Renukaprasad C Assignee: Renukaprasad C The BlockReportProcessingThread may exit for multiple reasons, but the current logging prints only the exception message with a different stack, which makes the issue difficult to debug. Existing logging: 2021-07-20 10:20:23,104 [Block report processor] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: Block report processor encountered fatal exception: java.lang.AssertionError 2021-07-20 10:20:23,104 [Block report processor] ERROR util.ExitUtil (ExitUtil.java:terminate(213)) - Terminate called 1: Block report processor encountered fatal exception: java.lang.AssertionError at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) Exception in thread "Block report processor" 1: Block report processor encountered fatal exception: java.lang.AssertionError at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:304) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5315) Actual issue found at: 2021-07-20 10:20:23,101 [Block report processor] ERROR blockmanagement.BlockManager (BlockManager.java:run(5314)) - java.lang.AssertionError java.lang.AssertionError at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3480) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:4280) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:4202) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4338) at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:4305) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:4853) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$2.run(NameNodeRpcServer.java:1657) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.processQueue(BlockManager.java:5334) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockReportProcessingThread.run(BlockManager.java:5312) This issue was found while working on the FGL branch, but the same issue can happen on trunk in any error scenario. [~hemanthboyina] [~hexiaoqiao] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385019#comment-17385019 ] Renukaprasad C commented on HDFS-16130: --- Thanks [~xinglin] for the quick review & feedback. I corrected the findings; please take a look. Thank you. > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Implement FGL for Create File. > Create API acquire global lock at mulitiple stages. Acquire the respective > partitioned lock and continue the create operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383018#comment-17383018 ] Renukaprasad C commented on HDFS-16128: --- We still need to measure loading performance; we can deal with it later if there is any degradation. Functionality is fine. The rest is all OK, +1 from my side. > [FGL] Add support for saving/loading an FS Image for PartitionedGSet > > > Key: HDFS-16128 > URL: https://issues.apache.org/jira/browse/HDFS-16128 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Major > Labels: pull-request-available > > Add support to save Inodes stored in PartitionedGSet when saving an FS image > and load Inodes into PartitionedGSet from a saved FS image. > h1. Saving FSImage > *Original HDFS design*: iterate every inode in inodeMap and save them into > the FSImage file. > *FGL*: no change is needed here, since PartitionedGSet also provides an > iterator interface, to iterate over inodes stored in partitions. > h1. Loading an HDFS > *Original HDFS design*: it first loads the FSImage files and then loads edit > logs for recent changes. FSImage files contain different sections, including > INodeSections and INodeDirectorySections. An InodeSection contains serialized > Inodes objects and the INodeDirectorySection contains the parent inode for an > Inode. When loading an FSImage, the system first loads INodeSections and then > load the INodeDirectorySections, to set the parent inode for each inode. > After FSImage files are loaded, edit logs are then loaded. Edit log contains > recent changes to the filesystem, including Inodes creation/deletion. For a > newly created INode, the parent inode is set before it is added to the > inodeMap. > *FGL*: when adding an Inode into the partitionedGSet, we need the parent > inode of an inode, in order to determine which partition to store that inode, > when NAMESPACE_KEY_DEPTH = 2. 
Thus, in FGL, when loading FSImage files, we > used a temporary LightweightGSet (inodeMapTemp), to store inodes. When > LoadFSImage is done, the parent inode for all existing inodes in FSImage > files is set. We can now move the inodes into a partitionedGSet. Load edit > logs can work as usual, as the parent inode for an inode is set before it is > added to the inodeMap. > In theory, PartitionedGSet can support to store inodes without setting its > parent inodes. All these inodes will be stored in the 0th partition. However, > we decide to use a temporary LightweightGSet (inodeMapTemp) to store these > inodes, to make this case more transparent. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
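The two-phase load quoted above can be reduced to a hypothetical sketch (the types here are stand-ins, not Hadoop's `INode` or `PartitionedGSet`; the name `inodeMapTemp` follows the description): inodes are first collected in a flat temporary map, and only once every parent pointer is known are they redistributed into per-parent partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedLoadSketch {
    static class Inode {
        final long id;
        final long parentId;
        Inode(long id, long parentId) { this.id = id; this.parentId = parentId; }
    }

    // Phase 2: once every parent pointer is set, redistribute the inodes
    // from the flat temporary map into per-partition lists.
    static List<List<Inode>> partition(Map<Long, Inode> inodeMapTemp, int numPartitions) {
        List<List<Inode>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<Inode>());
        }
        for (Inode inode : inodeMapTemp.values()) {
            // Stand-in for the NAMESPACE_KEY_DEPTH = 2 partition key,
            // which depends on the parent inode.
            int key = (int) (inode.parentId % numPartitions);
            partitions.get(key).add(inode);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Phase 1: a flat temporary store, as when the INodeSection is
        // loaded before the INodeDirectorySection has set parent links.
        Map<Long, Inode> inodeMapTemp = new HashMap<>();
        inodeMapTemp.put(1L, new Inode(1, 0));  // root-like inode
        inodeMapTemp.put(2L, new Inode(2, 1));
        inodeMapTemp.put(3L, new Inode(3, 1));

        List<List<Inode>> partitions = partition(inodeMapTemp, 2);
        System.out.println(partitions.get(0).size()); // prints 1
        System.out.println(partitions.get(1).size()); // prints 2
    }
}
```

Edit-log replay needs no such staging, since by then each new inode's parent is set before insertion, matching the behaviour the description notes.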
[jira] [Commented] (HDFS-16128) [FGL] Add support for saving/loading an FS Image for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382897#comment-17382897 ] Renukaprasad C commented on HDFS-16128: --- [~xinglin] Thanks for reporting the issue and the patch. The patch holds good; I have tested it with MKDIR & Create File, with some corrections mentioned in the PR. We can discuss further if there is any confusion. [~shv] [~hexiaoqiao] Create File along with the MKDIR (POC) would be a great combination for testing the framework. If you agree, can you please take a look at HDFS-16130? > [FGL] Add support for saving/loading an FS Image for PartitionedGSet > > > Key: HDFS-16128 > URL: https://issues.apache.org/jira/browse/HDFS-16128 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Major > Labels: pull-request-available > > Add support to save Inodes stored in PartitionedGSet when saving an FS image > and load Inodes into PartitionedGSet from a saved FS image. > h1. Saving FSImage > *Original HDFS design*: iterate every inode in inodeMap and save them into > the FSImage file. > *FGL*: no change is needed here, since PartitionedGSet also provides an > iterator interface, to iterate over inodes stored in partitions. > h1. Loading an HDFS > *Original HDFS design*: it first loads the FSImage files and then loads edit > logs for recent changes. FSImage files contain different sections, including > INodeSections and INodeDirectorySections. An InodeSection contains serialized > Inodes objects and the INodeDirectorySection contains the parent inode for an > Inode. When loading an FSImage, the system first loads INodeSections and then > load the INodeDirectorySections, to set the parent inode for each inode. > After FSImage files are loaded, edit logs are then loaded. Edit log contains > recent changes to the filesystem, including Inodes creation/deletion. 
For a > newly created INode, the parent inode is set before it is added to the > inodeMap. > *FGL*: when adding an Inode into the partitionedGSet, we need the parent > inode of an inode, in order to determine which partition to store that inode, > when NAMESPACE_KEY_DEPTH = 2. Thus, in FGL, when loading FSImage files, we > used a temporary LightweightGSet (inodeMapTemp), to store inodes. When > LoadFSImage is done, the parent inode for all existing inodes in FSImage > files is set. We can now move the inodes into a partitionedGSet. Load edit > logs can work as usual, as the parent inode for an inode is set before it is > added to the inodeMap. > In theory, PartitionedGSet can support to store inodes without setting its > parent inodes. All these inodes will be stored in the 0th partition. However, > we decide to use a temporary LightweightGSet (inodeMapTemp) to store these > inodes, to make this case more transparent. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17382654#comment-17382654 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~hexiaoqiao] & [~ayushtkn] for review & feedback. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch, HDFS-16067.005.patch, > HDFS-16067.006.patch, HDFS-16067.007.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381261#comment-17381261 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~hexiaoqiao] for the quick review & feedback. Line break: updated in the patch; other checkstyle & compiler warnings were also handled in the latest patch. Please have a look. Sure, let's wait for the build results. Thank you. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch, HDFS-16067.005.patch, > HDFS-16067.006.patch, HDFS-16067.007.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16067: -- Attachment: HDFS-16067.007.patch > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch, HDFS-16067.005.patch, > HDFS-16067.006.patch, HDFS-16067.007.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16067: -- Attachment: HDFS-16067.006.patch > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch, HDFS-16067.005.patch, > HDFS-16067.006.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16125) [FGL] Fix the iterator for PartitionedGSet
[ https://issues.apache.org/jira/browse/HDFS-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380814#comment-17380814 ] Renukaprasad C commented on HDFS-16125: --- [~weichiu] the merge build failed; could you help us locate the issue? > [FGL] Fix the iterator for PartitionedGSet > --- > > Key: HDFS-16125 > URL: https://issues.apache.org/jira/browse/HDFS-16125 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, namenode >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Iterator in PartitionedGSet would visit the first partition twice, since we > did not set the keyIterator to move to the first key during initialization. > > This is related to fgl: https://issues.apache.org/jira/browse/HDFS-14703 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
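The bug quoted above can be reduced to a hypothetical sketch (simplified types, not the actual PartitionedGSet code): an iterator that starts inside the first partition but does not advance the key iterator past that partition's key will yield the first partition twice.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PartitionIteratorSketch {
    // Walk all partitions, starting from the first one. If advanceFirst is
    // false (the bug), the key iterator still points at the first partition
    // and re-yields it on the first call to next().
    static List<String> iterate(List<List<Integer>> partitions, boolean advanceFirst) {
        List<String> visited = new ArrayList<>();
        Iterator<List<Integer>> keyIterator = partitions.iterator();
        List<Integer> current = partitions.isEmpty() ? null : partitions.get(0);
        if (advanceFirst && keyIterator.hasNext()) {
            keyIterator.next(); // the fix: skip the key of the partition we start in
        }
        while (current != null) {
            visited.add(current.toString());
            current = keyIterator.hasNext() ? keyIterator.next() : null;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = List.of(List.of(1), List.of(2));
        System.out.println(iterate(parts, false).size()); // buggy: 3 (first partition twice)
        System.out.println(iterate(parts, true).size());  // fixed: 2
    }
}
```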
[jira] [Commented] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380811#comment-17380811 ] Renukaprasad C commented on HDFS-16130: --- [~shv] [~xinglin] Can you please take a look at the PR for the Create File operation? Thank you. With the above changes I could see around a 25% performance improvement. > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Implement FGL for Create File. > Create API acquire global lock at mulitiple stages. Acquire the respective > partitioned lock and continue the create operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-16130 started by Renukaprasad C. - > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Implement FGL for Create File. > The Create API acquires the global lock at multiple stages. Acquire the respective > partitioned lock and continue the create operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16130) [FGL] Implement Create File with FGL
[ https://issues.apache.org/jira/browse/HDFS-16130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16130: -- Summary: [FGL] Implement Create File with FGL (was: FGL for Create File) > [FGL] Implement Create File with FGL > > > Key: HDFS-16130 > URL: https://issues.apache.org/jira/browse/HDFS-16130 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: Fine-Grained Locking >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > > Implement FGL for Create File. > The Create API acquires the global lock at multiple stages. Acquire the respective > partitioned lock and continue the create operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16130) FGL for Create File
Renukaprasad C created HDFS-16130: - Summary: FGL for Create File Key: HDFS-16130 URL: https://issues.apache.org/jira/browse/HDFS-16130 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: Fine-Grained Locking Reporter: Renukaprasad C Assignee: Renukaprasad C Implement FGL for Create File. The Create API acquires the global lock at multiple stages. Acquire the respective partitioned lock and continue the create operation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
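The idea behind "acquire the respective partitioned lock" can be sketched as hashing an inode key to one of N partition locks, so that create operations on different partitions proceed in parallel instead of serializing on one global lock. A minimal, hypothetical sketch (illustrative names; not the actual FGL implementation in HDFS):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Sketch of partitioned locking: the namespace is split into N partitions,
// each guarded by its own read-write lock. An operation locks only the
// partition its inode key hashes to.
class PartitionedLock {
    private final ReentrantReadWriteLock[] locks;

    PartitionedLock(int partitions) {
        locks = new ReentrantReadWriteLock[partitions];
        for (int i = 0; i < partitions; i++) {
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    private ReentrantReadWriteLock lockFor(long inodeId) {
        // floorMod keeps negative ids mapped to a valid partition index.
        return locks[Math.floorMod(inodeId, (long) locks.length) == 0
                ? 0 : (int) Math.floorMod(inodeId, (long) locks.length)];
    }

    // Run an operation (e.g. a create) under the write lock of the
    // partition owning inodeId; other partitions remain unlocked.
    <T> T withWriteLock(long inodeId, Supplier<T> op) {
        ReentrantReadWriteLock l = lockFor(inodeId);
        l.writeLock().lock();
        try {
            return op.get();
        } finally {
            l.writeLock().unlock();
        }
    }
}
```

Two creates whose keys hash to different partitions can then run concurrently, which is consistent with the ~25% improvement reported for the create path.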
[jira] [Commented] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
[ https://issues.apache.org/jira/browse/HDFS-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380424#comment-17380424 ] Renukaprasad C commented on HDFS-14703: --- Thanks [~shv] for the review & feedback. Shall I raise a separate Jira for Create and track the PR? Or is it OK to go with the current PR? {noformat} Noticed that you implemented getInode(id) by iterating through all inodes. This is probably the key part of this effort. We should eventually replace getInode(id) with getInode(key) to make the inode lookup efficient.{noformat} I totally agree with you; this is an overhead when finding the inodes on a large dataset. It was provided just as a work-around to make progress; we shall work on it and eventually optimize it. > NameNode Fine-Grained Locking via Metadata Partitioning > --- > > Key: HDFS-14703 > URL: https://issues.apache.org/jira/browse/HDFS-14703 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode >Reporter: Konstantin Shvachko >Priority: Major > Attachments: 001-partitioned-inodeMap-POC.tar.gz, > 002-partitioned-inodeMap-POC.tar.gz, 003-partitioned-inodeMap-POC.tar.gz, > NameNode Fine-Grained Locking.pdf, NameNode Fine-Grained Locking.pdf > > > We target to enable fine-grained locking by splitting the in-memory namespace > into multiple partitions each having a separate lock. Intended to improve > performance of NameNode write operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16126) VolumePair should override hashcode() method
[ https://issues.apache.org/jira/browse/HDFS-16126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380106#comment-17380106 ] Renukaprasad C commented on HDFS-16126: --- [~lei w] Thanks for reporting the issue. org.apache.hadoop.hdfs.server.datanode.DiskBalancer.VolumePair#hashCode org.apache.hadoop.hdfs.server.datanode.DiskBalancer.VolumePair#equals These two methods are already implemented in VolumePair. Is anything missing beyond this? > VolumePair should override hashcode() method > --- > > Key: HDFS-16126 > URL: https://issues.apache.org/jira/browse/HDFS-16126 > Project: Hadoop HDFS > Issue Type: Bug > Components: diskbalancer >Reporter: lei w >Priority: Minor > > Now we use a map to check whether one plan has more than one line of the same > VolumePair in createWorkPlan(final VolumePair volumePair, Step step); the code > is as follows: > {code:java} > private void createWorkPlan(final VolumePair volumePair, Step step) > throws DiskBalancerException { > // ... > // In case we have a plan with more than > // one line of same VolumePair > // we compress that into one work order. > if (workMap.containsKey(volumePair)) {// To check use map > bytesToMove += workMap.get(volumePair).getBytesToCopy(); > } >// ... > } > {code} > I found that the volumePair object is always a new object without a hashCode() > method, so using a map to check is invalid. Should we add hashCode() in > VolumePair? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
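For context on why both methods are needed: a minimal, hypothetical sketch (illustrative names; not the actual DiskBalancer code) of a pair type that is safe to use as a HashMap key. If either equals() or hashCode() were missing, two logically identical pairs would be distinct keys and containsKey would miss the existing work entry:

```java
import java.util.Objects;

// A value type usable as a HashMap key: equals() defines logical equality,
// and hashCode() must be consistent with it so equal pairs land in the
// same hash bucket.
final class VolumePairKey {
    private final String source;
    private final String dest;

    VolumePairKey(String source, String dest) {
        this.source = source;
        this.dest = dest;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof VolumePairKey)) {
            return false;
        }
        VolumePairKey other = (VolumePairKey) o;
        return source.equals(other.source) && dest.equals(other.dest);
    }

    @Override
    public int hashCode() {
        // Must hash the same fields that equals() compares.
        return Objects.hash(source, dest);
    }
}
```

With both overrides in place (as they already are in VolumePair), a freshly constructed pair with the same volumes finds the existing map entry, so the compression of duplicate lines into one work order works as intended.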
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17379187#comment-17379187 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~ayushtkn] for reviewing the patch. I have addressed the comments; can you have a look when you get time? Thanks. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch, HDFS-16067.005.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16067: -- Attachment: HDFS-16067.005.patch > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch, HDFS-16067.005.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371175#comment-17371175 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~ayushtkn] for the review & feedback. {code:java} HdfsFileStatus status = blkWithStatus.getFileStatus(); {code} I added this as a read API after the APPEND operation; apart from that, it is not related. I will address the other comments and update the patch soon. Thank you. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369056#comment-17369056 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~hexiaoqiao] for the UT clarification & patch review. Yes, printUsage got missed; it is corrected in HDFS-16067.004.patch. Please review. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16067: -- Attachment: HDFS-16067.004.patch > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch, HDFS-16067.004.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368748#comment-17368748 ] Renukaprasad C commented on HDFS-16067: --- [~hexiaoqiao] There are random UT failures, but the failed tests are not related to these changes, which I verified locally. Can you have a look at the failed tests? Thank you. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366784#comment-17366784 ] Renukaprasad C commented on HDFS-14575: --- Thank you [~hexiaoqiao] & [~weichiu] for review and feedback. [~Tao Yang] Thanks for the proposal. > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Tao Yang >Assignee: Renukaprasad C >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, > HDFS-14575.003.patch, HDFS-14575.004.patch > > > Currently LeaseRenewer (and its daemon thread) without clients should be > terminated after a grace period which defaults to 60 seconds. A race > condition may happen when a new request is coming just after LeaseRenewer > expired. > Reproduce this race condition: > # Client#1 creates File#1: creates LeaseRenewer#1 and starts Daemon#1 > thread, after a few seconds, File#1 is closed , there is no clients in > LeaseRenewer#1 now. > # 60 seconds (grace period) later, LeaseRenewer#1 just expires but daemon#1 > thread is still in sleep, Client#1 creates File#2, lead to the creation of > Daemon#2. > # Daemon#1 is awake then exit, after that, LeaseRenewer#1 is removed from > factory. > # File#2 is closed after a few seconds, LeaseRenewer#2 is created since it > can’t get renewer from factory. > Daemon#2 thread leaks from now on, since Client#1 in it can never be removed > and it won't have a chance to stop. > To solve this problem, IIUIC, a simple way I think is to make sure that all > clients are cleared when LeaseRenewer is removed from factory. Please feel > free to give your suggestions. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
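The race steps quoted above suggest the shape of the fix: remove the expired renewer from the factory and clear its remaining clients in the same critical section, so a client that slipped in after the grace period is detached rather than leaking with the dead daemon. A minimal, hypothetical sketch (illustrative names; not the actual DFSClient/LeaseRenewer code):

```java
import java.util.*;

// Sketch of a renewer factory where expiry-time cleanup clears clients.
class RenewerFactory {
    static class Renewer {
        final Set<String> clients = new HashSet<>();
    }

    private final Map<String, Renewer> renewers = new HashMap<>();

    // Client path: attach to the current renewer, creating one if absent.
    synchronized Renewer register(String key, String client) {
        Renewer r = renewers.computeIfAbsent(key, k -> new Renewer());
        r.clients.add(client);
        return r;
    }

    // Daemon path, run once the grace period expires. Removal and clearing
    // happen atomically: a client that registered between expiry and this
    // call is cleared here and will get a fresh renewer (with a live
    // daemon) on its next register() call.
    synchronized void removeExpired(String key, Renewer r) {
        if (renewers.get(key) == r) {
            renewers.remove(key);
            r.clients.clear();
        }
    }
}
```

This mirrors the proposal at the end of the quoted description ("make sure that all clients are cleared when LeaseRenewer is removed from factory"); the real patch details are in HDFS-14575.004.patch.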
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366780#comment-17366780 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~hexiaoqiao] for the quick update. I have updated the patch. The failed tests are unrelated; let's wait for the results of the build you triggered. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16067: -- Attachment: HDFS-16067.003.patch > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch, > HDFS-16067.003.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17366236#comment-17366236 ] Renukaprasad C commented on HDFS-16067: --- Thanks [~hexiaoqiao] for the quick review & feedback. A. Regarding Prepare Fileset: AppendFileStats extends OpenFileStats, so generateInputs() is called from the base class OperationStatsBase#benchmark(). Please correct me if I missed something from your point. B. I have updated and uploaded HDFS-16067.002.patch. Please review and give feedback if I missed anything else. Thank you. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-16067: -- Attachment: HDFS-16067.002.patch > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch, HDFS-16067.002.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16067) Support Append API in NNThroughputBenchmark
[ https://issues.apache.org/jira/browse/HDFS-16067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365938#comment-17365938 ] Renukaprasad C commented on HDFS-16067: --- [~hexiaoqiao] [~surendralilhore] can you please have a look at the patch when you find time? The failed test is not related to the code changes. > Support Append API in NNThroughputBenchmark > --- > > Key: HDFS-16067 > URL: https://issues.apache.org/jira/browse/HDFS-16067 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Minor > Attachments: HDFS-16067.001.patch > > > Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C reassigned HDFS-14575: - Assignee: Renukaprasad C (was: Tao Yang) > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Tao Yang >Assignee: Renukaprasad C >Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, > HDFS-14575.003.patch, HDFS-14575.004.patch > > > Currently LeaseRenewer (and its daemon thread) without clients should be > terminated after a grace period which defaults to 60 seconds. A race > condition may happen when a new request is coming just after LeaseRenewer > expired. > Reproduce this race condition: > # Client#1 creates File#1: creates LeaseRenewer#1 and starts Daemon#1 > thread, after a few seconds, File#1 is closed , there is no clients in > LeaseRenewer#1 now. > # 60 seconds (grace period) later, LeaseRenewer#1 just expires but daemon#1 > thread is still in sleep, Client#1 creates File#2, lead to the creation of > Daemon#2. > # Daemon#1 is awake then exit, after that, LeaseRenewer#1 is removed from > factory. > # File#2 is closed after a few seconds, LeaseRenewer#2 is created since it > can’t get renewer from factory. > Daemon#2 thread leaks from now on, since Client#1 in it can never be removed > and it won't have a chance to stop. > To solve this problem, IIUIC, a simple way I think is to make sure that all > clients are cleared when LeaseRenewer is removed from factory. Please feel > free to give your suggestions. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365932#comment-17365932 ] Renukaprasad C commented on HDFS-14575: --- Thank you [~hexiaoqiao] for the quick review and feedback. I have incorporated the changes and updated the patch: HDFS-14575.004.patch. Please have a look when you find time. Regarding the wildcard import: sure, I will consider your suggestion for further patches. Thank you. > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, > HDFS-14575.003.patch, HDFS-14575.004.patch > > > Currently LeaseRenewer (and its daemon thread) without clients should be > terminated after a grace period which defaults to 60 seconds. A race > condition may happen when a new request is coming just after LeaseRenewer > expired. > Reproduce this race condition: > # Client#1 creates File#1: creates LeaseRenewer#1 and starts Daemon#1 > thread, after a few seconds, File#1 is closed , there is no clients in > LeaseRenewer#1 now. > # 60 seconds (grace period) later, LeaseRenewer#1 just expires but daemon#1 > thread is still in sleep, Client#1 creates File#2, lead to the creation of > Daemon#2. > # Daemon#1 is awake then exit, after that, LeaseRenewer#1 is removed from > factory. > # File#2 is closed after a few seconds, LeaseRenewer#2 is created since it > can’t get renewer from factory. > Daemon#2 thread leaks from now on, since Client#1 in it can never be removed > and it won't have a chance to stop. > To solve this problem, IIUIC, a simple way I think is to make sure that all > clients are cleared when LeaseRenewer is removed from factory. Please feel > free to give your suggestions. Thanks!
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-14575: -- Attachment: HDFS-14575.004.patch > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, > HDFS-14575.003.patch, HDFS-14575.004.patch > > > Currently LeaseRenewer (and its daemon thread) without clients should be > terminated after a grace period which defaults to 60 seconds. A race > condition may happen when a new request is coming just after LeaseRenewer > expired. > Reproduce this race condition: > # Client#1 creates File#1: creates LeaseRenewer#1 and starts Daemon#1 > thread, after a few seconds, File#1 is closed , there is no clients in > LeaseRenewer#1 now. > # 60 seconds (grace period) later, LeaseRenewer#1 just expires but daemon#1 > thread is still in sleep, Client#1 creates File#2, lead to the creation of > Daemon#2. > # Daemon#1 is awake then exit, after that, LeaseRenewer#1 is removed from > factory. > # File#2 is closed after a few seconds, LeaseRenewer#2 is created since it > can’t get renewer from factory. > Daemon#2 thread leaks from now on, since Client#1 in it can never be removed > and it won't have a chance to stop. > To solve this problem, IIUIC, a simple way I think is to make sure that all > clients are cleared when LeaseRenewer is removed from factory. Please feel > free to give your suggestions. Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org