[jira] [Created] (HDDS-10909) OM stops

2024-05-23 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HDDS-10909:
---

 Summary: OM stops 
 Key: HDDS-10909
 URL: https://issues.apache.org/jira/browse/HDDS-10909
 Project: Apache Ozone
  Issue Type: Bug
Reporter: Attila Doroszlai


Trying to run Ozone in a single Docker container using the command from the 
[wiki|https://cwiki.apache.org/confluence/display/OZONE/Running+via+DockerHub]:

{code}
docker run -d -p 9878:9878 -p 9876:9876 apache/ozone
{code}

OM stops due to a misconfiguration:

{code}
2024-05-24 05:35:05 WARN  ServerUtils:323 - Storage directory for Ratis is not configured. It is a good idea to map this to an SSD disk. Falling back to ozone.metadata.dirs
2024-05-24 05:35:05 WARN  OzoneManagerRatisUtils:473 - ozone.om.ratis.snapshot.dir is not configured. Falling back to ozone.metadata.dirs config
2024-05-24 05:35:05 ERROR OzoneManagerStarter:76 - OM start failed with exception
java.io.IOException: Ratis group Dir on disk tmp does not match with RaftGroupIDbf265839-605b-3f16-9796-c5ba1605619e generated from service id omServiceIdDefault. Looks like there is a change to ozone.om.service.ids value after the cluster is setup. Currently change to this value is not supported.
    at org.apache.hadoop.ozone.om.OzoneManager.initializeRatisDirs(OzoneManager.java:1476)
    at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:697)
    at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:771)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
    at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
    at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
    ...
    at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
{code}
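The mismatch in the error is deterministic: the Raft group id is derived from the OM service id as a name-based UUID, so a changed service id yields a group id that no longer matches the one recorded in the on-disk Ratis directory. A small illustrative sketch of that mechanism (an assumption for illustration, not Ozone's exact derivation code; `groupIdFor` is a made-up name):

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class RaftGroupIdSketch {
  // Derive a stable, name-based (version 3) UUID from a service id string:
  // the same service id always maps to the same group id.
  static UUID groupIdFor(String serviceId) {
    return UUID.nameUUIDFromBytes(serviceId.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    UUID original = groupIdFor("omServiceIdDefault");
    UUID changed = groupIdFor("myNewServiceId");
    // A changed service id produces a group id that no longer matches the
    // directory created under the original id, hence the IOException above.
    System.out.println(original.equals(changed)); // false
    System.out.println(original.version());       // 3
  }
}
```

Note that the id in the log (`bf265839-605b-3f16-...`) has version nibble 3, i.e. a name-based UUID, consistent with this kind of derivation.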



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Updated] (HDDS-10909) OM stops in single-node Docker container with default settings

2024-05-23 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-10909:

Summary: OM stops in single-node Docker container with default settings  
(was: OM stops )

> OM stops in single-node Docker container with default settings
> --
>
> Key: HDDS-10909
> URL: https://issues.apache.org/jira/browse/HDDS-10909
> Project: Apache Ozone
>  Issue Type: Bug
>Reporter: Attila Doroszlai
>Priority: Major






Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


ChenSammi commented on PR #6690:
URL: https://github.com/apache/ozone/pull/6690#issuecomment-2128427394

   > @ChenSammi if I understand correctly, Datanode does not free up the mmap handles as aggressively as we would like. Thus, we need to introduce this change. I would recommend the following choices over adding complexity.
   > 
   > 1. Can we bump up the limit in the OS? The Datanode should stabilize at a certain upper bound based on GC frequency and concurrency settings.
   > 
   > 2. We can revert to non-mmap handling if we hit the limit.
   > 
   > I agree with @duongkame that the correct solution might be to implement a new read API using Netty, using the better abstractions it offers to manage direct buffers and NIO capabilities.
   
   @kerneltime , the proposed solution does exactly that: it puts an upper bound in place and falls back to a non-mmap buffer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Commented] (HDDS-10750) Intermittent fork timeout while stopping Ratis server

2024-05-23 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849128#comment-17849128
 ] 

Tsz-wo Sze commented on HDDS-10750:
---

[~wfps1210], assigned it to you.  Thanks!

> Intermittent fork timeout while stopping Ratis server
> -
>
> Key: HDDS-10750
> URL: https://issues.apache.org/jira/browse/HDDS-10750
> Project: Apache Ozone
>  Issue Type: Sub-task
>Reporter: Attila Doroszlai
>Priority: Critical
> Attachments: 2024-04-21T16-53-06_683-jvmRun1.dump, 
> 2024-05-03T11-31-12_561-jvmRun1.dump, 
> org.apache.hadoop.fs.ozone.TestOzoneFileChecksum-output.txt, 
> org.apache.hadoop.hdds.scm.TestSCMInstallSnapshot-output.txt, 
> org.apache.hadoop.ozone.client.rpc.TestECKeyOutputStreamWithZeroCopy-output.txt,
>  org.apache.hadoop.ozone.container.TestECContainerRecovery-output-1.txt, 
> org.apache.hadoop.ozone.container.TestECContainerRecovery-output.txt, 
> org.apache.hadoop.ozone.om.TestOzoneManagerPrepare-output.txt
>
>
> {code:title=https://github.com/adoroszlai/ozone-build-results/blob/master/2024/04/21/30803/it-client/output.log}
> [INFO] Running org.apache.hadoop.ozone.client.rpc.TestECKeyOutputStreamWithZeroCopy
> [INFO] 
> [INFO] Results:
> ...
> ... There was a timeout or other error in the fork
> {code}
> {code}
> "main" 
>    java.lang.Thread.State: WAITING
>     at java.lang.Object.wait(Native Method)
>     at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:405)
>     ...
>     at org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopDatanodes(MiniOzoneClusterImpl.java:473)
>     at org.apache.hadoop.ozone.MiniOzoneClusterImpl.stop(MiniOzoneClusterImpl.java:414)
>     at org.apache.hadoop.ozone.MiniOzoneClusterImpl.shutdown(MiniOzoneClusterImpl.java:400)
>     at org.apache.hadoop.ozone.client.rpc.AbstractTestECKeyOutputStream.shutdown(AbstractTestECKeyOutputStream.java:160)
> "ForkJoinPool.commonPool-worker-7" 
>    java.lang.Thread.State: TIMED_WAITING
>     ...
>     at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
>     at org.apache.ratis.util.ConcurrentUtils.shutdownAndWait(ConcurrentUtils.java:144)
>     at org.apache.ratis.util.ConcurrentUtils.shutdownAndWait(ConcurrentUtils.java:136)
>     at org.apache.ratis.server.impl.RaftServerProxy.lambda$close$9(RaftServerProxy.java:438)
>     ...
>     at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
>     at org.apache.ratis.server.impl.RaftServerProxy.close(RaftServerProxy.java:415)
>     at org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis.stop(XceiverServerRatis.java:603)
>     at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.stop(OzoneContainer.java:484)
>     at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.close(DatanodeStateMachine.java:447)
>     at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.stopDaemon(DatanodeStateMachine.java:637)
>     at org.apache.hadoop.ozone.HddsDatanodeService.stop(HddsDatanodeService.java:550)
>     at org.apache.hadoop.ozone.MiniOzoneClusterImpl.stopDatanode(MiniOzoneClusterImpl.java:479)
>     at org.apache.hadoop.ozone.MiniOzoneClusterImpl$$Lambda$2077/645273703.accept(Unknown Source)
> "c7edee5d-bf3c-45a7-a783-e11562f208dc-impl-thread2" 
>    java.lang.Thread.State: WAITING
>     ...
>     at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1947)
>     at org.apache.ratis.server.impl.RaftServerImpl.lambda$close$3(RaftServerImpl.java:543)
>     at org.apache.ratis.server.impl.RaftServerImpl$$Lambda$1925/263251010.run(Unknown Source)
>     at org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$7(LifeCycle.java:306)
>     at org.apache.ratis.util.LifeCycle$$Lambda$1204/655954062.get(Unknown Source)
>     at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:326)
>     at org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:304)
>     at org.apache.ratis.server.impl.RaftServerImpl.close(RaftServerImpl.java:525)
> {code}
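The dump shows shutdown threads blocked indefinitely in `awaitTermination` and `CompletableFuture.join`. As a hedged sketch (not Ratis' actual `ConcurrentUtils` code; names here are illustrative), a bounded shutdown makes such a hang surface as an error instead of a fork timeout:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative bounded shutdown: wait for termination with a timeout and
// escalate to shutdownNow() instead of blocking forever.
public final class BoundedShutdown {
  public static boolean shutdownAndWait(ExecutorService executor, long timeoutSec)
      throws InterruptedException {
    executor.shutdown(); // stop accepting new tasks
    if (executor.awaitTermination(timeoutSec, TimeUnit.SECONDS)) {
      return true; // clean shutdown within the deadline
    }
    executor.shutdownNow(); // interrupt running tasks, drop queued ones
    return executor.awaitTermination(timeoutSec, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    pool.submit(() -> { });
    System.out.println(shutdownAndWait(pool, 5)); // true
  }
}
```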






[jira] [Commented] (HDDS-10750) Intermittent fork timeout while stopping Ratis server

2024-05-23 Thread Chung En Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849125#comment-17849125
 ] 

Chung En Lee commented on HDDS-10750:
-

[~szetszwo], I'll work on it. Could you assign RATIS-2100 to me? Thanks!







[jira] [Commented] (HDDS-10750) Intermittent fork timeout while stopping Ratis server

2024-05-23 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849123#comment-17849123
 ] 

Tsz-wo Sze commented on HDDS-10750:
---

[~wfps1210], thanks a lot for digging into the problem! Are you going to submit 
a pull request? If not, I can work on it.







Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


kerneltime commented on PR #6690:
URL: https://github.com/apache/ozone/pull/6690#issuecomment-2128042743

   @ChenSammi if I understand correctly, Datanode does not free up the mmap handles as aggressively as we would like. Thus, we need to introduce this change. I would recommend the following choices over adding complexity.
   1. Can we bump up the limit in the OS? The Datanode should stabilize at a certain upper bound based on GC frequency and concurrency settings.
   2. We can revert to non-mmap handling if we hit the limit.
   
   I agree with @duongkame that the correct solution might be to implement a new read API using Netty, using the better abstractions it offers to manage direct buffers and NIO capabilities.
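A minimal sketch of choice 2 above (an assumed shape, not the PR's actual implementation; the class and method names are made up for illustration): attempt a memory-mapped read first and fall back to an ordinary buffered read when mapping fails, e.g. because the process has exhausted its mmap handles.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public final class MmapWithFallback {
  static ByteBuffer read(Path file, long offset, int length) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
         FileChannel ch = raf.getChannel()) {
      try {
        // Preferred path: zero-copy memory-mapped region.
        return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
      } catch (IOException mapFailed) {
        // Fallback: ordinary positional read into a heap buffer.
        ByteBuffer buf = ByteBuffer.allocate(length);
        while (buf.hasRemaining()) {
          if (ch.read(buf, offset + buf.position()) < 0) {
            break; // EOF before 'length' bytes were available
          }
        }
        buf.flip();
        return buf;
      }
    }
  }
}
```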





[jira] [Commented] (HDDS-10750) Intermittent fork timeout while stopping Ratis server

2024-05-23 Thread Chung En Lee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-10750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849111#comment-17849111
 ] 

Chung En Lee commented on HDDS-10750:
-

[~adoroszlai] , I think it happens when closing a LogAppender from the NEW state. 
I created a Jira issue, RATIS-2100, for this.







Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


duongkame commented on code in PR #6690:
URL: https://github.com/apache/ozone/pull/6690#discussion_r1612104239


##
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChunkBufferImplWithByteBuffer.java:
##
@@ -163,4 +175,12 @@ public String toString() {
 return getClass().getSimpleName() + ":limit=" + buffer.limit()
 + "@" + Integer.toHexString(hashCode());
   }
+
+  @Override
+  protected void finalize() throws Throwable {

Review Comment:
   Please see LeakDetector (HDDS-9528) @ChenSammi .
   Yet, relying on GC activity to perform resource closure may not be a good idea, because we have no control over when GC would clean up unreachable objects. Usually people rely on finalizers/WeakReferences only to detect leaks. 






[jira] [Resolved] (HDDS-10860) Intermittent failure in TestLeaseRecovery.testFinalizeBlockFailure

2024-05-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDDS-10860.

Fix Version/s: HDDS-7593
   Resolution: Fixed

> Intermittent failure in TestLeaseRecovery.testFinalizeBlockFailure
> --
>
> Key: HDDS-10860
> URL: https://issues.apache.org/jira/browse/HDDS-10860
> Project: Apache Ozone
>  Issue Type: Sub-task
>Reporter: Ashish Kumar
>Assignee: Ashish Kumar
>Priority: Major
>  Labels: pull-request-available
> Fix For: HDDS-7593
>
>
>  
> {code:java}
> Error:  Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 283.789 s <<< FAILURE! - in org.apache.hadoop.fs.ozone.TestLeaseRecovery
> Error:  org.apache.hadoop.fs.ozone.TestLeaseRecovery.testFinalizeBlockFailure  Time elapsed: 25.541 s  <<< FAILURE!
> org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
>     at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>     at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>     at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>     at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:31)
>     at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:183)
>     at org.apache.hadoop.fs.ozone.TestLeaseRecovery.testFinalizeBlockFailure(TestLeaseRecovery.java:277)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1511) {code}
> Failure link: 
> https://github.com/apache/ozone/actions/runs/9079532056/job/24949734316






Re: [PR] HDDS-10860. Fix Intermittent failure in TestLeaseRecovery.testFinalizeBlockFailure [ozone]

2024-05-23 Thread via GitHub


jojochuang merged PR #6707:
URL: https://github.com/apache/ozone/pull/6707





Re: [PR] HDDS-10905. Implement getHomeDirectory() method in OzoneFileSystem implementations [ozone]

2024-05-23 Thread via GitHub


adoroszlai commented on code in PR #6718:
URL: https://github.com/apache/ozone/pull/6718#discussion_r1612112237


##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/AbstractRootedOzoneFileSystemTest.java:
##
@@ -299,6 +299,11 @@ void testOzoneFsServiceLoader() throws IOException {
 OzoneConsts.OZONE_OFS_URI_SCHEME, confTestLoader), 
RootedOzoneFileSystem.class);
   }
 
+  @Test
+  void testUserHomeDirectory() {
+assertEquals(new Path("/user/" + USER1), userOfs.getHomeDirectory());

Review Comment:
   And similarly:
   
   ```
   expected:  but was: 

   ```
   
   
https://github.com/SaketaChalamchala/ozone/actions/runs/9212326470/job/25344354470#step:5:2184



##
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/fs/ozone/AbstractOzoneFileSystemTest.java:
##
@@ -258,6 +267,11 @@ public BucketLayout getBucketLayout() {
 return bucketLayout;
   }
 
+  @Test
+  void testUserHomeDirectory() {
+assertEquals(new Path("/user/" + USER1), userO3fs.getHomeDirectory());

Review Comment:
   Looks like these are failing with:
   
   ```
   expected:  but was: 

   ```
   
   
https://github.com/SaketaChalamchala/ozone/actions/runs/9212326470/job/25344354470#step:5:2216






Re: [PR] HDDS-10634. Recon - listKeys API for listing keys with optional filters [ozone]

2024-05-23 Thread via GitHub


devmadhuu commented on code in PR #6658:
URL: https://github.com/apache/ozone/pull/6658#discussion_r1612146507


##
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/OMDBInsightEndpoint.java:
##
@@ -674,6 +685,720 @@ public Response getDeletedDirectorySummary() {
 return Response.ok(dirSummary).build();
   }
 
+  /**
+   * This API will list out a limited 'count' of keys after applying the below filters given as API parameters:
+   * Default Values of API param filters:
+   *-- replicationType - empty string and filter will not be applied, so 
list out all keys irrespective of
+   *   replication type.
+   *-- creationTime - empty string and filter will not be applied, so list 
out keys irrespective of age.
+   *-- keySize - 0 bytes, which means all keys greater than zero bytes 
will be listed, effectively all.
+   *-- startPrefix - /, API assumes that startPrefix path always starts 
with /. E.g. /volume/bucket
+   *-- prevKey - ""
+   *-- limit - 1000
+   *
+   * @param replicationType Filter for RATIS or EC replication keys
+   * @param creationDate Filter for keys created after creationDate in 
"MM-dd- HH:mm:ss" string format.
+   * @param keySize Filter for Keys greater than keySize in bytes.
+   * @param startPrefix Filter for startPrefix path.
+   * @param prevKey rocksDB last key of page requested.
+   * @param limit Filter for limited count of keys.
+   *
+   * @return the list of keys in JSON structured format as per respective 
bucket layout.
+   *
+   * Now lets consider, we have following OBS, LEGACY and FSO bucket key/files 
namespace tree structure
+   *
+   * For OBS Bucket
+   *
+   * /volume1/obs-bucket/key1
+   * /volume1/obs-bucket/key1/key2
+   * /volume1/obs-bucket/key1/key2/key3
+   * /volume1/obs-bucket/key4
+   * /volume1/obs-bucket/key5
+   * /volume1/obs-bucket/key6
+   * For LEGACY Bucket
+   *
+   * /volume1/legacy-bucket/key
+   * /volume1/legacy-bucket/key1/key2
+   * /volume1/legacy-bucket/key1/key2/key3
+   * /volume1/legacy-bucket/key4
+   * /volume1/legacy-bucket/key5
+   * /volume1/legacy-bucket/key6
+   * For FSO Bucket
+   *
+   * /volume1/fso-bucket/dir1/dir2/dir3
+   * /volume1/fso-bucket/dir1/testfile
+   * /volume1/fso-bucket/dir1/file1
+   * /volume1/fso-bucket/dir1/dir2/testfile
+   * /volume1/fso-bucket/dir1/dir2/file1
+   * /volume1/fso-bucket/dir1/dir2/dir3/testfile
+   * /volume1/fso-bucket/dir1/dir2/dir3/file1
+   * Input Request for OBS bucket:
+   *
+   *
`api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`
+   * Output Response:
+   *
+   * {
+   * "status": "OK",
+   * "path": "/volume1/obs-bucket",
+   * "replicatedDataSize": 62914560,
+   * "unReplicatedDataSize": 62914560,
+   * "lastKey": "/volume1/obs-bucket/key6",
+   * "keys": [
+   * {
+   * "key": "/volume1/obs-bucket/key1",
+   * "path": "volume1/obs-bucket/key1",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781418742,
+   * "modificationTime": 1715781419762,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2",
+   * "path": "volume1/obs-bucket/key1/key2",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781421716,
+   * "modificationTime": 1715781422723,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2/key3",
+   * "path": "volume1/obs-bucket/key1/key2/key3",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781424718,
+   * "modificationTime": 1715781425598,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key4",
+   * "path": "volume1/obs-bucket/key4",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime":

Re: [PR] HDDS-10634. Recon - listKeys API for listing keys with optional filters [ozone]

2024-05-23 Thread via GitHub


devmadhuu commented on code in PR #6658:
URL: https://github.com/apache/ozone/pull/6658#discussion_r1612144375


##
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/OMDBInsightEndpoint.java:
##
@@ -674,6 +686,625 @@ public Response getDeletedDirectorySummary() {
 return Response.ok(dirSummary).build();
   }
 
+  /**
+   * This API lists at most 'limit' keys after applying the filters given as API parameters.
+   * Default values of the API parameter filters:
+   *-- replicationType - empty string; the filter is not applied, so keys are listed irrespective of
+   *   replication type.
+   *-- creationTime - empty string; the filter is not applied, so keys are listed irrespective of age.
+   *-- keySize - 0 bytes, which means all keys larger than zero bytes are listed, i.e. effectively all.
+   *-- startPrefix - /; the API assumes the startPrefix path always starts with /. E.g. /volume/bucket
+   *-- prevKey - ""
+   *-- limit - 1000
+   *
+   * @param replicationType Filter for RATIS or EC replication keys.
+   * @param creationDate Filter for keys created after creationDate, in "MM-dd- HH:mm:ss" string format.
+   * @param keySize Filter for keys larger than keySize, in bytes.
+   * @param startPrefix Filter for the startPrefix path.
+   * @param prevKey The last RocksDB key of the previously returned page.
+   * @param limit Maximum number of keys to return.
+   *
+   * @return the list of keys in structured JSON format, according to the respective bucket layout.
+   *
+   * Now let's consider the following OBS, LEGACY and FSO bucket key/file namespace trees:
+   *
+   * For OBS Bucket
+   *
+   * /volume1/obs-bucket/key1
+   * /volume1/obs-bucket/key1/key2
+   * /volume1/obs-bucket/key1/key2/key3
+   * /volume1/obs-bucket/key4
+   * /volume1/obs-bucket/key5
+   * /volume1/obs-bucket/key6
+   * For LEGACY Bucket
+   *
+   * /volume1/legacy-bucket/key
+   * /volume1/legacy-bucket/key1/key2
+   * /volume1/legacy-bucket/key1/key2/key3
+   * /volume1/legacy-bucket/key4
+   * /volume1/legacy-bucket/key5
+   * /volume1/legacy-bucket/key6
+   * For FSO Bucket
+   *
+   * /volume1/fso-bucket/dir1/dir2/dir3
+   * /volume1/fso-bucket/dir1/testfile
+   * /volume1/fso-bucket/dir1/file1
+   * /volume1/fso-bucket/dir1/dir2/testfile
+   * /volume1/fso-bucket/dir1/dir2/file1
+   * /volume1/fso-bucket/dir1/dir2/dir3/testfile
+   * /volume1/fso-bucket/dir1/dir2/dir3/file1
+   * Input Request for OBS bucket:
+   *
+   *
`api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`
+   * Output Response:
+   *
+   * {
+   * "status": "OK",
+   * "path": "/volume1/obs-bucket",
+   * "replicatedDataSize": 62914560,
+   * "unReplicatedDataSize": 62914560,
+   * "lastKey": "/volume1/obs-bucket/key6",
+   * "keys": [
+   * {
+   * "key": "/volume1/obs-bucket/key1",
+   * "path": "volume1/obs-bucket/key1",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781418742,
+   * "modificationTime": 1715781419762,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2",
+   * "path": "volume1/obs-bucket/key1/key2",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781421716,
+   * "modificationTime": 1715781422723,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2/key3",
+   * "path": "volume1/obs-bucket/key1/key2/key3",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781424718,
+   * "modificationTime": 1715781425598,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key4",
+   * "path": "volume1/obs-bucket/key4",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime":

[jira] [Resolved] (HDDS-8371) A keyName field in the keyTable might contain a full path for the key instead of the file name

2024-05-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDDS-8371.
---
Resolution: Duplicate

I'll go ahead and resolve this jira, because it was fixed by HDDS-8292.

> A keyName field in the keyTable might contain a full path for the key instead 
> of the file name
> --
>
> Key: HDDS-8371
> URL: https://issues.apache.org/jira/browse/HDDS-8371
> Project: Apache Ozone
>  Issue Type: Bug
>  Components: OM
>Affects Versions: 1.3.0
>Reporter: Kohei Sugihara
>Priority: Major
>  Labels: pull-request-available
>
> The listStatus API returns a repeated path in the list when the path for the 
> key is deep. We noticed that the listStatus API returns a corrupt result for 
> some specific keys in a bucket: the requested key prefix is repeated in the 
> final list of the listStatus result, like the following:
> {code:java}
> # expected case
> % aws s3 ls s3://bucket/a/b/c/d/e/f/g/file.zip
>   a/b/c/d/e/f/g/file.zip
> ...
> # actual: "a/b/c/d/e/f/g" is duplicated
> % aws s3 ls s3://bucket/a/b/c/d/e/f/g/file.zip
>   a/b/c/d/e/f/g/a/b/c/d/e/f/g/file.zip
> ... {code}
> Environment:
>  * Ozone 1.3 
> [compatible|https://github.com/apache/ozone/commit/9c61a8aa497ab96c014ad3bb7b1ee4f731ebfaf8]
>  version (same environment as HDDS-7701, HDDS-7925)
>  * Several large files, all of them are uploaded by multipart using AWS-CLI, 
> divided into 8 MB chunks
>  * An FSO-enabled bucket
>  * OM HA
> Problem Details:
> I've dug into the OM DB and found that the metadata in the keyTable has the 
> full path for the key, so the redundant prefix ends up appearing twice in the 
> result of the listStatus API.
> {code:java}
> # dump keyTable entries
> while (keyIter.hasNext()) {
>       Table.KeyValue<String, OmKeyInfo> kv = keyIter.next();
>       OmKeyInfo v = kv.getValue();
>       LOG.info("v/b={}/{} parent={} Key={} size={} time={} checksum={} id={} 
> keyName={}",
>               kv.getValue().getVolumeName(), kv.getValue().getBucketName(),
>               nodeId, kv.getValue().getFileName(),
>               kv.getValue().getDataSize(), kv.getValue().getCreationTime(), 
> kv.getValue().getFileChecksum(), kv.getValue().getObjectID(), 
> kv.getValue().getKeyName());
> }
> # keyName has a full path for the key
> - v/b=volume/bucket parent=-9223371931475161596 Key=0g0pustv.tar.gz 
> size=2276021708 time=1672052153340 checksum=null id=-9223371931457566207 
> keyName=a/b/c/d/e/0g0pustv.tar.gz
> - v/b=volume/bucket parent=-9223371931475161596 Key=0g0pustv.zip 
> size=2333733892 time=167205395 checksum=null id=-9223371931408929023 
> keyName=a/b/c/d/e/0g0pustv.zip
> - v/b=volume/bucket parent=-9223371931475161596 Key=0nh5ww00.tar.gz 
> size=249321741 time=1672052233487 checksum=null id=-9223371931393057791 
> keyName=a/b/c/d/e/0nh5ww00.tar.gz
> - v/b=volume/bucket parent=-9223371931475161596 Key=0nh5ww00.zip 
> size=255764877 time=1672052242830 checksum=null id=-9223371931388326655 
> keyName=a/b/c/d/e/0nh5ww00.zip
> - v/b=volume/bucket parent=-9223371931475161596 Key=5b2uha1h.tar.gz 
> size=2276346612 time=1672052331175 checksum=null id=-9223371931348859135 
> keyName=a/b/c/d/e/5b2uha1h.tar.gz
> ...
> # other keys which have the same parent do not have their prefix in the key
> - v/b=volume/bucket parent=-9223371931475161596 Key=kh7vbwlh.zip 
> size=573797127 time=1672052273970 checksum=null id=-9223371931375503871 
> keyName=kh7vbwlh.zip
> - v/b=volume/bucket parent=-9223371931475161596 Key=ngaxsd8c.tar.gz 
> size=380094900 time=1672052284433 checksum=null id=-9223371931368669695 
> keyName=ngaxsd8c.tar.gz
> - v/b=volume/bucket parent=-9223371931475161596 Key=ngaxsd8c.zip 
> size=393085953 time=1672052099618 checksum=null id=-9223371931473057023 
> keyName=ngaxsd8c.zip
> - v/b=volume/bucket parent=-9223371931475161596 Key=nrou31c3.tar.gz 
> size=568466718 time=1672052124043 checksum=null id=-9223371931461502975 
> keyName=nrou31c3.tar.gz
> - v/b=volume/bucket parent=-9223371931475161596 Key=nrou31c3.zip 
> size=574807485 time=1672052149947 checksum=null id=-9223371931446918911 
> keyName=nrou31c3.zip
> - v/b=volume/bucket parent=-9223371931475161596 Key=ol8dhbqo.tar.gz 
> size=555722830 time=1672052168904 checksum=null id=-9223371931435349759 
> keyName=ol8dhbqo.tar.gz {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


duongkame commented on code in PR #6690:
URL: https://github.com/apache/ozone/pull/6690#discussion_r1612123943


##
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChunkBufferImplWithByteBuffer.java:
##
@@ -163,4 +175,12 @@ public String toString() {
 return getClass().getSimpleName() + ":limit=" + buffer.limit()
 + "@" + Integer.toHexString(hashCode());
   }
+
+  @Override
+  protected void finalize() throws Throwable {

Review Comment:
   gRPC is known to have some limitations on the outbound path:
1. It doesn't provide any callback or event to notify that it has successfully 
sent a response (unlike when sending a request, where you can listen to the 
response observer).
2. Even if the buffer being sent is a direct one (or a mapped buffer), gRPC 
will not do zero-copy when sending it as a response. It will do another copy to 
re-frame the original buffer, and that copy is done through a reused heap buffer.
   
   This may be well out of scope of this PR discussion, but gRPC is not 
the right tool for the job, and we will keep making unnatural efforts to cope 
with it (like we're doing). 
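Point 2 above can be illustrated with a rough sketch. This is purely illustrative and not gRPC's actual internals: the class name, chunk size, and method are made up to show why staging a direct buffer through a reused heap array means two copies rather than zero-copy.

```java
import java.nio.ByteBuffer;

// Illustration (not gRPC code): even when the source is a direct buffer,
// bytes are staged through a reused heap chunk before being written to the
// framed output, i.e. the data is copied twice instead of zero-copied.
public class ReframeCopy {
    private final byte[] reusedHeapChunk = new byte[8192];

    int copyOut(ByteBuffer direct, ByteBuffer sink) {
        int total = 0;
        while (direct.hasRemaining()) {
            int n = Math.min(direct.remaining(), reusedHeapChunk.length);
            direct.get(reusedHeapChunk, 0, n);   // copy #1: off-heap -> heap
            sink.put(reusedHeapChunk, 0, n);     // copy #2: heap -> framed output
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocateDirect(10000);
        for (int i = 0; i < 10000; i++) {
            src.put((byte) (i & 0xff));
        }
        src.flip();
        ByteBuffer sink = ByteBuffer.allocate(10000);
        int copied = new ReframeCopy().copyOut(src, sink);
        System.out.println(copied); // 10000
    }
}
```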
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


szetszwo commented on PR #6690:
URL: https://github.com/apache/ozone/pull/6690#issuecomment-2127749473

   @ChenSammi , please don't implement our own `MappedBufferManager`.  Netty 
already has very good buffer management.  Let's simply use it; see 
[HDDS-7188](https://issues.apache.org/jira/browse/HDDS-7188) ( 
https://github.com/apache/ozone/pull/3730 ).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Updated] (HDDS-10908) Increase DataNode XceiverServerGrpc event loop group size

2024-05-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-10908:
---
Description: 
The current configuration has the XceiverServerGrpc boss and worker event loop 
groups share the same thread pool, whose size is number of volumes * 
hdds.datanode.read.chunk.threads.per.volume / 10, while the executor thread 
pool size is number of volumes * hdds.datanode.read.chunk.threads.per.volume.

The event loop group thread pool size is too small: assuming a single volume, 
that implies just one thread shared between boss and worker.

Using the freon DN Echo tool, I found that slightly increasing the pool size 
significantly increases throughput:

{noformat}
sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 1000 
--sleep-time-ms=0 --read-only

hdds.datanode.read.chunk.threads.per.volume = 10 (default):
 mean rate = 44125.45 calls/second

hdds.datanode.read.chunk.threads.per.volume = 20:
 mean rate = 61322.60 calls/second

hdds.datanode.read.chunk.threads.per.volume = 40:
 mean rate = 77951.91 calls/second

hdds.datanode.read.chunk.threads.per.volume = 100:
 mean rate = 65573.07 calls/second

hdds.datanode.read.chunk.threads.per.volume = 1000:
 mean rate = 25079.32 calls/second

{noformat}

So it appears that increasing the default value to 40 has a positive impact. 
Alternatively, we could avoid associating the thread pool size with the number 
of volumes; otherwise the number becomes too big for, say, 48 disks.

Note: 
DN echo in Ratis read only mode is about 83k requests per second on the same 
host.
OM echo in read only mode is about 38k requests per second.

  was:
The current configuration has the XceiverServerGrpc boss and worker event loop 
group share the same thread pool whose size is number of volumes * 
hdds.datanode.read.chunk.threads.per.volume / 10, and executor thread pool size 
number of volumes * hdds.datanode.read.chunk.threads.per.volume.

The event loop group thread pool size is too small. Assuming single volume that 
implies just one thread shared between boss/worker.

Using freon DN Echo tool I found increasing the pool size slightly 
significantly increases throughput:

{noformat}
sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 1000 
--sleep-time-ms=0 --read-only

hdds.datanode.read.chunk.threads.per.volume = 10 (default):
 mean rate = 44125.45 calls/second

hdds.datanode.read.chunk.threads.per.volume = 20:
 mean rate = 61322.60 calls/second

hdds.datanode.read.chunk.threads.per.volume = 40:
 mean rate = 77951.91 calls/second

hdds.datanode.read.chunk.threads.per.volume = 100:
 mean rate = 65573.07 calls/second

hdds.datanode.read.chunk.threads.per.volume = 1000:
 mean rate = 25079.32 calls/second

{noformat}

So it appears that increasing the default value to 40 has positive impact. Or 
we should consider don't associate the thread pool size with number of volumes.

Note: 
DN echo in Ratis read only mode is about 83k requests per second on the same 
host.
OM echo in read only mode is about 38k requests per second.


> Increase DataNode XceiverServerGrpc event loop group size
> -
>
> Key: HDDS-10908
> URL: https://issues.apache.org/jira/browse/HDDS-10908
> Project: Apache Ozone
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> The current configuration has the XceiverServerGrpc boss and worker event 
> loop group share the same thread pool whose size is number of volumes * 
> hdds.datanode.read.chunk.threads.per.volume / 10, and executor thread pool 
> size number of volumes * hdds.datanode.read.chunk.threads.per.volume.
> The event loop group thread pool size is too small. Assuming single volume 
> that implies just one thread shared between boss/worker.
> Using freon DN Echo tool I found increasing the pool size slightly 
> significantly increases throughput:
> {noformat}
> sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 
> 1000 --sleep-time-ms=0 --read-only
> hdds.datanode.read.chunk.threads.per.volume = 10 (default):
>  mean rate = 44125.45 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 20:
>  mean rate = 61322.60 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 40:
>  mean rate = 77951.91 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 100:
>  mean rate = 65573.07 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 1000:
>  mean rate = 25079.32 calls/second
> {noformat}
> So it appears that increasing the default value to 40 has positive impact. Or 
> we should consider don't associate the thread pool size with number of 
> volumes. Ot

[jira] [Assigned] (HDDS-10908) Increase DataNode XceiverServerGrpc event loop group size

2024-05-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HDDS-10908:
--

Assignee: Wei-Chiu Chuang

> Increase DataNode XceiverServerGrpc event loop group size
> -
>
> Key: HDDS-10908
> URL: https://issues.apache.org/jira/browse/HDDS-10908
> Project: Apache Ozone
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> The current configuration has the XceiverServerGrpc boss and worker event 
> loop group share the same thread pool whose size is number of volumes * 
> hdds.datanode.read.chunk.threads.per.volume / 10, and executor thread pool 
> size number of volumes * hdds.datanode.read.chunk.threads.per.volume.
> The event loop group thread pool size is too small. Assuming single volume 
> that implies just one thread shared between boss/worker.
> Using freon DN Echo tool I found increasing the pool size slightly 
> significantly increases throughput:
> {noformat}
> sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 
> 1000 --sleep-time-ms=0 --read-only
> hdds.datanode.read.chunk.threads.per.volume = 10 (default):
>  mean rate = 44125.45 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 20:
>  mean rate = 61322.60 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 40:
>  mean rate = 77951.91 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 100:
>  mean rate = 65573.07 calls/second
> hdds.datanode.read.chunk.threads.per.volume = 1000:
>  mean rate = 25079.32 calls/second
> {noformat}
> So it appears that increasing the default value to 40 has positive impact. Or 
> we should consider don't associate the thread pool size with number of 
> volumes.
> Note: 
> DN echo in Ratis read only mode is about 83k requests per second on the same 
> host.
> OM echo in read only mode is about 38k requests per second.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Created] (HDDS-10908) Increase DataNode XceiverServerGrpc event loop group size

2024-05-23 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HDDS-10908:
--

 Summary: Increase DataNode XceiverServerGrpc event loop group size
 Key: HDDS-10908
 URL: https://issues.apache.org/jira/browse/HDDS-10908
 Project: Apache Ozone
  Issue Type: Improvement
  Components: Ozone Datanode
Reporter: Wei-Chiu Chuang


The current configuration has the XceiverServerGrpc boss and worker event loop 
groups share the same thread pool, whose size is number of volumes * 
hdds.datanode.read.chunk.threads.per.volume / 10, while the executor thread 
pool size is number of volumes * hdds.datanode.read.chunk.threads.per.volume.

The event loop group thread pool size is too small: assuming a single volume, 
that implies just one thread shared between boss and worker.

Using the freon DN Echo tool, I found that slightly increasing the pool size 
significantly increases throughput:

{noformat}
sudo -u hdfs ozone freon dne --clients=32 --container-id=1001 -t 32 -n 1000 
--sleep-time-ms=0 --read-only

hdds.datanode.read.chunk.threads.per.volume = 10 (default):
 mean rate = 44125.45 calls/second

hdds.datanode.read.chunk.threads.per.volume = 20:
 mean rate = 61322.60 calls/second

hdds.datanode.read.chunk.threads.per.volume = 40:
 mean rate = 77951.91 calls/second

hdds.datanode.read.chunk.threads.per.volume = 100:
 mean rate = 65573.07 calls/second

hdds.datanode.read.chunk.threads.per.volume = 1000:
 mean rate = 25079.32 calls/second

{noformat}

So it appears that increasing the default value to 40 has a positive impact. 
Alternatively, we could avoid associating the thread pool size with the number 
of volumes.

Note: 
DN echo in Ratis read only mode is about 83k requests per second on the same 
host.
OM echo in read only mode is about 38k requests per second.
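The sizing arithmetic described above can be sketched as follows. This is a minimal illustration: the default of 10 threads per volume and the /10 divisor are taken from the description, while the class and method names are hypothetical.

```java
// Sketch of the thread pool sizing described in this issue.
// The config key hdds.datanode.read.chunk.threads.per.volume defaults to 10;
// class/method names here are illustrative, not Ozone's actual code.
public class PoolSizing {
    static final int DEFAULT_THREADS_PER_VOLUME = 10;

    // Boss/worker event loop group size: volumes * threadsPerVolume / 10
    static int eventLoopGroupSize(int volumes, int threadsPerVolume) {
        return volumes * threadsPerVolume / 10;
    }

    // Executor thread pool size: volumes * threadsPerVolume
    static int executorPoolSize(int volumes, int threadsPerVolume) {
        return volumes * threadsPerVolume;
    }

    public static void main(String[] args) {
        // With a single volume and the default config, the boss and worker
        // event loops end up sharing a single thread.
        System.out.println(eventLoopGroupSize(1, DEFAULT_THREADS_PER_VOLUME)); // 1
        // With 48 disks and threadsPerVolume raised to 40, the executor pool
        // grows to 1920 threads, illustrating why tying the size to the
        // number of volumes can become too big.
        System.out.println(executorPoolSize(48, 40)); // 1920
    }
}
```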



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


duongkame commented on code in PR #6690:
URL: https://github.com/apache/ozone/pull/6690#discussion_r1612104239


##
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChunkBufferImplWithByteBuffer.java:
##
@@ -163,4 +175,12 @@ public String toString() {
 return getClass().getSimpleName() + ":limit=" + buffer.limit()
 + "@" + Integer.toHexString(hashCode());
   }
+
+  @Override
+  protected void finalize() throws Throwable {

Review Comment:
   Please see LeakDetector (HDDS-9528) @ChenSammi 
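For context: `finalize()` has been deprecated since Java 9, and leak detection is usually built on `java.lang.ref.Cleaner` instead. Below is a minimal sketch of that pattern; the names are illustrative and this is not the actual HDDS-9528 `LeakDetector` API.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: report buffers that become unreachable without being released,
// without overriding finalize(). Illustrative names, not Ozone's API.
public class TrackedBuffer implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // The cleaning action must NOT reference the tracked object itself,
    // or the object would never become phantom-reachable.
    static final class LeakCheck implements Runnable {
        final AtomicBoolean released = new AtomicBoolean(false);

        @Override
        public void run() {
            if (!released.get()) {
                // A real detector would log this together with a creation
                // stack trace captured when the buffer was allocated.
                System.err.println("Leak detected: buffer was not released");
            }
        }
    }

    private final LeakCheck check = new LeakCheck();
    private final Cleaner.Cleanable cleanable = CLEANER.register(this, check);

    boolean isReleased() {
        return check.released.get();
    }

    @Override
    public void close() {
        check.released.set(true);
        cleanable.clean(); // deregisters and runs the check at most once
    }
}
```

Unlike `finalize()`, the `Cleaner` action runs on a dedicated thread, can be deregistered eagerly via `close()`, and does not resurrect the object or delay its collection.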



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Updated] (HDDS-10896) Refactor PerformanceMetrics creation

2024-05-23 Thread Attila Doroszlai (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-10896:

Fix Version/s: 1.5.0
   Resolution: Implemented
   Status: Resolved  (was: Patch Available)

> Refactor PerformanceMetrics creation
> 
>
> Key: HDDS-10896
> URL: https://issues.apache.org/jira/browse/HDDS-10896
> Project: Apache Ozone
>  Issue Type: Sub-task
>  Components: common
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Move creation of mutable metrics objects to {{PerformanceMetrics}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10896. Refactor PerformanceMetrics creation [ozone]

2024-05-23 Thread via GitHub


adoroszlai commented on PR #6712:
URL: https://github.com/apache/ozone/pull/6712#issuecomment-2127686976

   Thanks @xichen01 for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10896. Refactor PerformanceMetrics creation [ozone]

2024-05-23 Thread via GitHub


adoroszlai merged PR #6712:
URL: https://github.com/apache/ozone/pull/6712


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Resolved] (HDDS-10831) Customize grpc control flow window size

2024-05-23 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDDS-10831.

Resolution: Won't Fix

The flow control window is set to 5MB on the Ratis server side.
As far as I can tell there's no difference between the 1MB and 5MB window sizes, so 
I'll resolve this one as won't fix for now.

> Customize grpc control flow window size
> ---
>
> Key: HDDS-10831
> URL: https://issues.apache.org/jira/browse/HDDS-10831
> Project: Apache Ozone
>  Issue Type: Improvement
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> HDDS-2990 discovered that having a larger flow control window (grpc default 
> 1MB) improves throughput performance, and suggested a window size of 5MB.
> However, our grpc/ratis code never used the configuration value implemented 
> by HDDS-2990, so we are still relying on the default value, which is not 
> optimal.
> Opening this Jira to use the customized flow control window size.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10239. Storage Container Reconciliation. [ozone]

2024-05-23 Thread via GitHub


kerneltime merged PR #6121:
URL: https://github.com/apache/ozone/pull/6121


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10239. [Reconciliation] Initial proto file for Merkle Tree [ozone]

2024-05-23 Thread via GitHub


kerneltime closed pull request #6302: HDDS-10239. [Reconciliation] Initial 
proto file for Merkle Tree
URL: https://github.com/apache/ozone/pull/6302


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10239. [Reconciliation] Initial proto file for Merkle Tree [ozone]

2024-05-23 Thread via GitHub


kerneltime commented on PR #6302:
URL: https://github.com/apache/ozone/pull/6302#issuecomment-2127609298

   The proto changes are being merged in 
https://github.com/apache/ozone/pull/6708


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10906. [hsync] 6th merge from master [ozone]

2024-05-23 Thread via GitHub


jojochuang commented on PR #6720:
URL: https://github.com/apache/ozone/pull/6720#issuecomment-2127377315

   wrong base


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-10906. [hsync] 6th merge from master [ozone]

2024-05-23 Thread via GitHub


jojochuang commented on PR #6720:
URL: https://github.com/apache/ozone/pull/6720#issuecomment-2127375682

   cc @chungen0126 because both conflicts are associated with your PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



[jira] [Updated] (HDDS-10906) [hsync] 6th merge from master

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-10906:
--
Labels: pull-request-available  (was: )

> [hsync] 6th merge from master
> -
>
> Key: HDDS-10906
> URL: https://issues.apache.org/jira/browse/HDDS-10906
> Project: Apache Ozone
>  Issue Type: Sub-task
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org



Re: [PR] HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume [ozone]

2024-05-23 Thread via GitHub


smitajoshi12 commented on code in PR #6535:
URL: https://github.com/apache/ozone/pull/6535#discussion_r1608515878


##
hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/src/views/diskUsage/diskUsage.tsx:
##
@@ -144,20 +145,19 @@ export class DiskUsage extends 
React.Component, IDUState>
   const dataSize = duResponse.size;
   let subpaths: IDUSubpath[] = duResponse.subPaths;
 
-  subpaths.sort((a, b) => (a.size < b.size) ? 1 : -1);
-
   // Only show top n blocks with the most DU,
   // other blocks are merged as a single block
-  if (subpaths.length > limit) {
+  if (subpaths.length > limit || (subpaths.length > 0 && limit === 
MAX_DISPLAY_LIMIT)) {
 subpaths = subpaths.slice(0, limit);
 let topSize = 0;
-for (let i = 0; i < limit; ++i) {
+for (let i = 0; limit === MAX_DISPLAY_LIMIT ? i < subpaths.length : i 
< limit; ++i) {

Review Comment:
   @dombizita 
   
   Done testing with `for (let i = 0; i < subpaths.length; ++i)`; it works for all test cases. Changed the comment section for the other Size object in the latest commit.
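
   For context, the top-N logic under discussion can be sketched independently of the React component. The following is an illustrative Java sketch (not the actual `diskUsage.tsx` code): sort entries by size descending, keep the top `limit`, and merge the remainder into a single "Other" entry.

   ```java
   import java.util.*;
   import java.util.stream.*;

   // Illustrative sketch of the Recon Disk Usage top-N behavior discussed above.
   // Class and method names are hypothetical, not Ozone code.
   public class TopNDiskUsage {
     public static LinkedHashMap<String, Long> topNWithOther(Map<String, Long> sizes, int limit) {
       // Sort subpaths by size, largest first.
       List<Map.Entry<String, Long>> sorted = sizes.entrySet().stream()
           .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
           .collect(Collectors.toList());
       LinkedHashMap<String, Long> out = new LinkedHashMap<>();
       long otherTotal = 0;
       for (int i = 0; i < sorted.size(); i++) {
         if (i < limit) {
           out.put(sorted.get(i).getKey(), sorted.get(i).getValue());
         } else {
           otherTotal += sorted.get(i).getValue();  // merge the tail into one block
         }
       }
       if (otherTotal > 0) {
         out.put("Other", otherTotal);
       }
       return out;
     }
   }
   ```

   With this shape, "show all" is just `limit >= sizes.size()`, which avoids the special-case loop bound that the review comment was about.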






Re: [PR] HDDS-9626. [Recon] Disk Usage page with high number of key/bucket/volume [ozone]

2024-05-23 Thread via GitHub


smitajoshi12 commented on code in PR #6535:
URL: https://github.com/apache/ozone/pull/6535#discussion_r1608489889


##
hadoop-ozone/recon/src/main/resources/webapps/recon/ozone-recon-web/src/views/diskUsage/diskUsage.tsx:
##
@@ -265,8 +265,8 @@ export class DiskUsage extends 
React.Component, IDUState>
 
   updateDisplayLimit(e): void {
 let res = -1;
-if (e.key === 'all') {
-  res = Number.MAX_VALUE;
+if (e.key === '30') {
+  res = Number.parseInt(e.key, 10);
 } else {
   res = Number.parseInt(e.key, 10);
 }

Review Comment:
   @dombizita 
Removed the duplicate `Number.parseInt(e.key, 10)` call in the latest commit.
   
   






Re: [PR] HDDS-10862. Reduce time spent on initializing metrics during OM start [ozone]

2024-05-23 Thread via GitHub


mango-li commented on PR #6682:
URL: https://github.com/apache/ozone/pull/6682#issuecomment-2127188647

   > The volume table and bucket table use FullTableCache, so `countEstimatedRowsInTable` actually returns the exact number of rows in the table. Let's keep it synchronized.
   > 
   > ```
   > 
Metrics.setNumVolumes(metadataManager.countEstimatedRowsInTable(metadataManager.getVolumeTable()));
   > 
Metrics.setNumBuckets(metadataManager.countEstimatedRowsInTable(metadataManager.getBucketTable()));
   > ```
   
   Thank you for the review. I have updated the code.





Re: [PR] HDDS-10862. Reduce time spent on initializing metrics during OM start [ozone]

2024-05-23 Thread via GitHub


mango-li commented on PR #6682:
URL: https://github.com/apache/ozone/pull/6682#issuecomment-2127181738

   > > The time saved is mainly from 
`metadataManager.countRowsInTable(metadataManager.getVolumeTable())` and 
`metadataManager.countRowsInTable(metadataManager.getBucketTable())`
   > 
   > Volume and bucket tables are fully cached, so we could get row count from 
cache size by calling:
   > 
   > 
https://github.com/apache/ozone/blob/6f30f2fc2214744fa481f3bf3f96bc301557ff15/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/TypedTable.java#L453-L456
   > 
   > This could save time by avoiding iteration and key/value conversion.
   
   Good idea! I have changed `metrics.numVolumes` and `metrics.numBuckets` to use `metadataManager.countEstimatedRowsInTable`, and kept the OM metrics updates synchronized.





Re: [PR] HDDS-10862. Reduce time spent on initializing metrics during OM start [ozone]

2024-05-23 Thread via GitHub


guohao-rosicky commented on PR #6682:
URL: https://github.com/apache/ozone/pull/6682#issuecomment-2127042051

   The volume table and bucket table use FullTableCache, so `countEstimatedRowsInTable` actually returns the exact number of rows in the table. Let's keep it synchronized.
   
   ```
   
Metrics.setNumVolumes(metadataManager.countEstimatedRowsInTable(metadataManager.getVolumeTable()));
   
Metrics.setNumBuckets(metadataManager.countEstimatedRowsInTable(metadataManager.getBucketTable()));
   ```
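
   The reasoning behind this suggestion can be sketched as follows: since a fully cached table mirrors every row in memory, the cache size is the exact row count, so no RocksDB iteration or key/value conversion is needed. `CachedTable` below is an illustrative stand-in, not the actual Ozone `TypedTable`/`FullTableCache` classes.

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   // Hedged sketch: a table whose contents are fully mirrored in an in-memory
   // cache can answer "how many rows?" in O(1) from the cache size, instead of
   // the O(n) iteration over the on-disk table that countRowsInTable performs.
   public class CachedTable<K, V> {
     private final Map<K, V> fullCache = new ConcurrentHashMap<>();

     public void put(K key, V value) { fullCache.put(key, value); }
     public void delete(K key) { fullCache.remove(key); }

     // Exact for a full cache; the "estimated" name mirrors the general-table
     // API, where only an approximation is available.
     public long countEstimatedRows() { return fullCache.size(); }
   }
   ```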
   





Re: [PR] HDDS-10634. Recon - listKeys API for listing keys with optional filters [ozone]

2024-05-23 Thread via GitHub


devmadhuu commented on code in PR #6658:
URL: https://github.com/apache/ozone/pull/6658#discussion_r1611521963


##
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/OMDBInsightEndpoint.java:
##
@@ -674,6 +685,624 @@ public Response getDeletedDirectorySummary() {
 return Response.ok(dirSummary).build();
   }
 
+  /**
+   * This API will list out limited 'count' number of keys after applying 
below filters in API parameters:
+   * Default Values of API param filters:
+   *-- replicationType - empty string and filter will not be applied, so 
list out all keys irrespective of
+   *   replication type.
+   *-- creationTime - empty string and filter will not be applied, so list 
out keys irrespective of age.
+   *-- keySize - 0 bytes, which means all keys greater than zero bytes 
will be listed, effectively all.
+   *-- startPrefix - /, API assumes that startPrefix path always starts 
with /. E.g. /volume/bucket
+   *-- prevKey - ""
+   *-- limit - 1000
+   *
+   * @param replicationType Filter for RATIS or EC replication keys
+   * @param creationDate Filter for keys created after creationDate in 
"MM-dd- HH:mm:ss" string format.
+   * @param keySize Filter for Keys greater than keySize in bytes.
+   * @param startPrefix Filter for startPrefix path.
+   * @param prevKey rocksDB last key of page requested.
+   * @param limit Filter for limited count of keys.
+   *
+   * @return the list of keys in JSON structured format as per respective 
bucket layout.
+   *
+   * Now lets consider, we have following OBS, LEGACY and FSO bucket key/files 
namespace tree structure
+   *
+   * For OBS Bucket
+   *
+   * /volume1/obs-bucket/key1
+   * /volume1/obs-bucket/key1/key2
+   * /volume1/obs-bucket/key1/key2/key3
+   * /volume1/obs-bucket/key4
+   * /volume1/obs-bucket/key5
+   * /volume1/obs-bucket/key6
+   * For LEGACY Bucket
+   *
+   * /volume1/legacy-bucket/key
+   * /volume1/legacy-bucket/key1/key2
+   * /volume1/legacy-bucket/key1/key2/key3
+   * /volume1/legacy-bucket/key4
+   * /volume1/legacy-bucket/key5
+   * /volume1/legacy-bucket/key6
+   * For FSO Bucket
+   *
+   * /volume1/fso-bucket/dir1/dir2/dir3
+   * /volume1/fso-bucket/dir1/testfile
+   * /volume1/fso-bucket/dir1/file1
+   * /volume1/fso-bucket/dir1/dir2/testfile
+   * /volume1/fso-bucket/dir1/dir2/file1
+   * /volume1/fso-bucket/dir1/dir2/dir3/testfile
+   * /volume1/fso-bucket/dir1/dir2/dir3/file1
+   * Input Request for OBS bucket:
+   *
+   *
`api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`
+   * Output Response:
+   *
+   * {
+   * "status": "OK",
+   * "path": "/volume1/obs-bucket",
+   * "replicatedDataSize": 62914560,
+   * "unReplicatedDataSize": 62914560,
+   * "lastKey": "/volume1/obs-bucket/key6",
+   * "keys": [
+   * {
+   * "key": "/volume1/obs-bucket/key1",
+   * "path": "volume1/obs-bucket/key1",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781418742,
+   * "modificationTime": 1715781419762,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2",
+   * "path": "volume1/obs-bucket/key1/key2",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781421716,
+   * "modificationTime": 1715781422723,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2/key3",
+   * "path": "volume1/obs-bucket/key1/key2/key3",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781424718,
+   * "modificationTime": 1715781425598,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key4",
+   * "path": "volume1/obs-bucket/key4",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime":
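
For illustration, a client could page through the proposed `listKeys` endpoint by feeding each response's `lastKey` back as the next request's `prevKey` (default `""` means "from the start", per the Javadoc above). The sketch below is hypothetical — `ListKeysPager` and `PageFetcher` are not Recon classes — and stands in for the HTTP call with an interface.

```java
import java.util.*;

// Hypothetical pagination loop for the proposed
// api/v1/keys/listKeys?startPrefix=...&prevKey=...&limit=... endpoint.
public class ListKeysPager {
  public static final class ListKeysPage {
    public final List<String> keys;
    public final String lastKey;  // maps to "lastKey" in the JSON response
    public ListKeysPage(List<String> keys, String lastKey) {
      this.keys = keys;
      this.lastKey = lastKey;
    }
  }

  // Stand-in for an HTTP GET against the listKeys endpoint.
  public interface PageFetcher {
    ListKeysPage fetchPage(String startPrefix, String prevKey, int limit);
  }

  // Collect all keys under startPrefix by following prevKey pagination.
  public static List<String> listAll(PageFetcher fetcher, String startPrefix, int limit) {
    List<String> all = new ArrayList<>();
    String prevKey = "";  // documented default: start from the beginning
    while (true) {
      ListKeysPage page = fetcher.fetchPage(startPrefix, prevKey, limit);
      if (page.keys.isEmpty()) {
        break;
      }
      all.addAll(page.keys);
      if (page.keys.size() < limit) {
        break;  // short page: no more results
      }
      prevKey = page.lastKey;  // resume after the last key of this page
    }
    return all;
  }
}
```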

Re: [PR] HDDS-10634. Recon - listKeys API for listing keys with optional filters [ozone]

2024-05-23 Thread via GitHub


devmadhuu commented on code in PR #6658:
URL: https://github.com/apache/ozone/pull/6658#discussion_r1611497309


##
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/OMDBInsightEndpoint.java:
##
@@ -674,6 +685,624 @@ public Response getDeletedDirectorySummary() {
 return Response.ok(dirSummary).build();
   }
 
+  /**
+   * This API will list out limited 'count' number of keys after applying 
below filters in API parameters:
+   * Default Values of API param filters:
+   *-- replicationType - empty string and filter will not be applied, so 
list out all keys irrespective of
+   *   replication type.
+   *-- creationTime - empty string and filter will not be applied, so list 
out keys irrespective of age.
+   *-- keySize - 0 bytes, which means all keys greater than zero bytes 
will be listed, effectively all.
+   *-- startPrefix - /, API assumes that startPrefix path always starts 
with /. E.g. /volume/bucket
+   *-- prevKey - ""
+   *-- limit - 1000
+   *
+   * @param replicationType Filter for RATIS or EC replication keys
+   * @param creationDate Filter for keys created after creationDate in 
"MM-dd- HH:mm:ss" string format.
+   * @param keySize Filter for Keys greater than keySize in bytes.
+   * @param startPrefix Filter for startPrefix path.
+   * @param prevKey rocksDB last key of page requested.
+   * @param limit Filter for limited count of keys.
+   *
+   * @return the list of keys in JSON structured format as per respective 
bucket layout.
+   *
+   * Now lets consider, we have following OBS, LEGACY and FSO bucket key/files 
namespace tree structure
+   *
+   * For OBS Bucket
+   *
+   * /volume1/obs-bucket/key1
+   * /volume1/obs-bucket/key1/key2
+   * /volume1/obs-bucket/key1/key2/key3
+   * /volume1/obs-bucket/key4
+   * /volume1/obs-bucket/key5
+   * /volume1/obs-bucket/key6
+   * For LEGACY Bucket
+   *
+   * /volume1/legacy-bucket/key
+   * /volume1/legacy-bucket/key1/key2
+   * /volume1/legacy-bucket/key1/key2/key3
+   * /volume1/legacy-bucket/key4
+   * /volume1/legacy-bucket/key5
+   * /volume1/legacy-bucket/key6
+   * For FSO Bucket
+   *
+   * /volume1/fso-bucket/dir1/dir2/dir3
+   * /volume1/fso-bucket/dir1/testfile
+   * /volume1/fso-bucket/dir1/file1
+   * /volume1/fso-bucket/dir1/dir2/testfile
+   * /volume1/fso-bucket/dir1/dir2/file1
+   * /volume1/fso-bucket/dir1/dir2/dir3/testfile
+   * /volume1/fso-bucket/dir1/dir2/dir3/file1
+   * Input Request for OBS bucket:
+   *
+   *
`api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`
+   * Output Response:
+   *
+   * {
+   * "status": "OK",
+   * "path": "/volume1/obs-bucket",
+   * "replicatedDataSize": 62914560,
+   * "unReplicatedDataSize": 62914560,
+   * "lastKey": "/volume1/obs-bucket/key6",
+   * "keys": [
+   * {
+   * "key": "/volume1/obs-bucket/key1",
+   * "path": "volume1/obs-bucket/key1",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781418742,
+   * "modificationTime": 1715781419762,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2",
+   * "path": "volume1/obs-bucket/key1/key2",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781421716,
+   * "modificationTime": 1715781422723,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2/key3",
+   * "path": "volume1/obs-bucket/key1/key2/key3",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781424718,
+   * "modificationTime": 1715781425598,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key4",
+   * "path": "volume1/obs-bucket/key4",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime":

Re: [PR] HDDS-10634. Recon - listKeys API for listing keys with optional filters [ozone]

2024-05-23 Thread via GitHub


sumitagrawl commented on code in PR #6658:
URL: https://github.com/apache/ozone/pull/6658#discussion_r1611434486


##
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/OMDBInsightEndpoint.java:
##
@@ -674,6 +686,625 @@ public Response getDeletedDirectorySummary() {
 return Response.ok(dirSummary).build();
   }
 
+  /**
+   * This API will list out limited 'count' number of keys after applying 
below filters in API parameters:
+   * Default Values of API param filters:
+   *-- replicationType - empty string and filter will not be applied, so 
list out all keys irrespective of
+   *   replication type.
+   *-- creationTime - empty string and filter will not be applied, so list 
out keys irrespective of age.
+   *-- keySize - 0 bytes, which means all keys greater than zero bytes 
will be listed, effectively all.
+   *-- startPrefix - /, API assumes that startPrefix path always starts 
with /. E.g. /volume/bucket
+   *-- prevKey - ""
+   *-- limit - 1000
+   *
+   * @param replicationType Filter for RATIS or EC replication keys
+   * @param creationDate Filter for keys created after creationDate in 
"MM-dd- HH:mm:ss" string format.
+   * @param keySize Filter for Keys greater than keySize in bytes.
+   * @param startPrefix Filter for startPrefix path.
+   * @param prevKey rocksDB last key of page requested.
+   * @param limit Filter for limited count of keys.
+   *
+   * @return the list of keys in JSON structured format as per respective 
bucket layout.
+   *
+   * Now lets consider, we have following OBS, LEGACY and FSO bucket key/files 
namespace tree structure
+   *
+   * For OBS Bucket
+   *
+   * /volume1/obs-bucket/key1
+   * /volume1/obs-bucket/key1/key2
+   * /volume1/obs-bucket/key1/key2/key3
+   * /volume1/obs-bucket/key4
+   * /volume1/obs-bucket/key5
+   * /volume1/obs-bucket/key6
+   * For LEGACY Bucket
+   *
+   * /volume1/legacy-bucket/key
+   * /volume1/legacy-bucket/key1/key2
+   * /volume1/legacy-bucket/key1/key2/key3
+   * /volume1/legacy-bucket/key4
+   * /volume1/legacy-bucket/key5
+   * /volume1/legacy-bucket/key6
+   * For FSO Bucket
+   *
+   * /volume1/fso-bucket/dir1/dir2/dir3
+   * /volume1/fso-bucket/dir1/testfile
+   * /volume1/fso-bucket/dir1/file1
+   * /volume1/fso-bucket/dir1/dir2/testfile
+   * /volume1/fso-bucket/dir1/dir2/file1
+   * /volume1/fso-bucket/dir1/dir2/dir3/testfile
+   * /volume1/fso-bucket/dir1/dir2/dir3/file1
+   * Input Request for OBS bucket:
+   *
+   *
`api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`
+   * Output Response:
+   *
+   * {
+   * "status": "OK",
+   * "path": "/volume1/obs-bucket",
+   * "replicatedDataSize": 62914560,
+   * "unReplicatedDataSize": 62914560,
+   * "lastKey": "/volume1/obs-bucket/key6",
+   * "keys": [
+   * {
+   * "key": "/volume1/obs-bucket/key1",
+   * "path": "volume1/obs-bucket/key1",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781418742,
+   * "modificationTime": 1715781419762,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2",
+   * "path": "volume1/obs-bucket/key1/key2",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781421716,
+   * "modificationTime": 1715781422723,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2/key3",
+   * "path": "volume1/obs-bucket/key1/key2/key3",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781424718,
+   * "modificationTime": 1715781425598,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key4",
+   * "path": "volume1/obs-bucket/key4",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime

Re: [PR] HDDS-10634. Recon - listKeys API for listing keys with optional filters [ozone]

2024-05-23 Thread via GitHub


sumitagrawl commented on code in PR #6658:
URL: https://github.com/apache/ozone/pull/6658#discussion_r1611285969


##
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/OMDBInsightEndpoint.java:
##
@@ -674,6 +685,720 @@ public Response getDeletedDirectorySummary() {
 return Response.ok(dirSummary).build();
   }
 
+  /**
+   * This API will list out limited 'count' number of keys after applying 
below filters in API parameters:
+   * Default Values of API param filters:
+   *-- replicationType - empty string and filter will not be applied, so 
list out all keys irrespective of
+   *   replication type.
+   *-- creationTime - empty string and filter will not be applied, so list 
out keys irrespective of age.
+   *-- keySize - 0 bytes, which means all keys greater than zero bytes 
will be listed, effectively all.
+   *-- startPrefix - /, API assumes that startPrefix path always starts 
with /. E.g. /volume/bucket
+   *-- prevKey - ""
+   *-- limit - 1000
+   *
+   * @param replicationType Filter for RATIS or EC replication keys
+   * @param creationDate Filter for keys created after creationDate in 
"MM-dd- HH:mm:ss" string format.
+   * @param keySize Filter for Keys greater than keySize in bytes.
+   * @param startPrefix Filter for startPrefix path.
+   * @param prevKey rocksDB last key of page requested.
+   * @param limit Filter for limited count of keys.
+   *
+   * @return the list of keys in JSON structured format as per respective 
bucket layout.
+   *
+   * Now lets consider, we have following OBS, LEGACY and FSO bucket key/files 
namespace tree structure
+   *
+   * For OBS Bucket
+   *
+   * /volume1/obs-bucket/key1
+   * /volume1/obs-bucket/key1/key2
+   * /volume1/obs-bucket/key1/key2/key3
+   * /volume1/obs-bucket/key4
+   * /volume1/obs-bucket/key5
+   * /volume1/obs-bucket/key6
+   * For LEGACY Bucket
+   *
+   * /volume1/legacy-bucket/key
+   * /volume1/legacy-bucket/key1/key2
+   * /volume1/legacy-bucket/key1/key2/key3
+   * /volume1/legacy-bucket/key4
+   * /volume1/legacy-bucket/key5
+   * /volume1/legacy-bucket/key6
+   * For FSO Bucket
+   *
+   * /volume1/fso-bucket/dir1/dir2/dir3
+   * /volume1/fso-bucket/dir1/testfile
+   * /volume1/fso-bucket/dir1/file1
+   * /volume1/fso-bucket/dir1/dir2/testfile
+   * /volume1/fso-bucket/dir1/dir2/file1
+   * /volume1/fso-bucket/dir1/dir2/dir3/testfile
+   * /volume1/fso-bucket/dir1/dir2/dir3/file1
+   * Input Request for OBS bucket:
+   *
+   *
`api/v1/keys/listKeys?startPrefix=/volume1/obs-bucket&limit=2&replicationType=RATIS`
+   * Output Response:
+   *
+   * {
+   * "status": "OK",
+   * "path": "/volume1/obs-bucket",
+   * "replicatedDataSize": 62914560,
+   * "unReplicatedDataSize": 62914560,
+   * "lastKey": "/volume1/obs-bucket/key6",
+   * "keys": [
+   * {
+   * "key": "/volume1/obs-bucket/key1",
+   * "path": "volume1/obs-bucket/key1",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781418742,
+   * "modificationTime": 1715781419762,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2",
+   * "path": "volume1/obs-bucket/key1/key2",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781421716,
+   * "modificationTime": 1715781422723,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key1/key2/key3",
+   * "path": "volume1/obs-bucket/key1/key2/key3",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime": 1715781424718,
+   * "modificationTime": 1715781425598,
+   * "isKey": true
+   * },
+   * {
+   * "key": "/volume1/obs-bucket/key4",
+   * "path": "volume1/obs-bucket/key4",
+   * "size": 10485760,
+   * "replicatedSize": 10485760,
+   * "replicationInfo": {
+   * "replicationFactor": "ONE",
+   * "requiredNodes": 1,
+   * "replicationType": "RATIS"
+   * },
+   * "creationTime

[jira] [Updated] (HDDS-10900) Add some health state metrics for EC container

2024-05-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDDS-10900:
--
Labels: pull-request-available  (was: )

> Add some health state metrics for EC container
> --
>
> Key: HDDS-10900
> URL: https://issues.apache.org/jira/browse/HDDS-10900
> Project: Apache Ozone
>  Issue Type: Improvement
>  Components: EC, SCM
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Major
>  Labels: pull-request-available
>
> Add some health state metrics for EC container.






[PR] HDDS-10900. Add some health state metrics for EC container [ozone]

2024-05-23 Thread via GitHub


YuanbenWang opened a new pull request, #6719:
URL: https://github.com/apache/ozone/pull/6719

   ## What changes were proposed in this pull request?
   
   This PR aims to add some health state metrics for EC container.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10900
   
   ## How was this patch tested?
   
   Unit Test.





[jira] [Assigned] (HDDS-10900) Add some health state metrics for EC container

2024-05-23 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-10900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben reassigned HDDS-10900:
--

Assignee: WangYuanben

> Add some health state metrics for EC container
> --
>
> Key: HDDS-10900
> URL: https://issues.apache.org/jira/browse/HDDS-10900
> Project: Apache Ozone
>  Issue Type: Improvement
>  Components: EC, SCM
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Major
>
> Add some health state metrics for EC container.






Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


ChenSammi commented on code in PR #6690:
URL: https://github.com/apache/ozone/pull/6690#discussion_r1611135403


##
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChunkBufferImplWithByteBuffer.java:
##
@@ -163,4 +175,12 @@ public String toString() {
 return getClass().getSimpleName() + ":limit=" + buffer.limit()
 + "@" + Integer.toHexString(hashCode());
   }
+
+  @Override
+  protected void finalize() throws Throwable {

Review Comment:
   A ChunkBuffer is converted to one ByteString (the data may be copied or zero-copied, depending on whether unsafe conversion is enabled). Multiple ByteString chunks of a readChunk request can be concatenated into one whole ByteString, or kept as a list of ByteStrings, depending on the request options. The whole response with the data is passed to the gRPC call, which is async. I could be wrong, but when I last investigated whether it was feasible to adopt the buffer pool in the datanode for data reads, I didn't find any gRPC notification or callback indicating that a response has been sent out successfully, so that the buffer associated with the response could be safely released.
   
   When the response is finally sent out, the buffer will be released automatically by GC some time later. GC will not call ChunkBuffer#close(), but it will call finalize().






Re: [PR] HDDS-10488. Datanode OOO due to run out of mmap handler [ozone]

2024-05-23 Thread via GitHub


ChenSammi commented on code in PR #6690:
URL: https://github.com/apache/ozone/pull/6690#discussion_r1611137340


##
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChunkBufferImplWithByteBuffer.java:
##
@@ -163,4 +175,12 @@ public String toString() {
 return getClass().getSimpleName() + ":limit=" + buffer.limit()
 + "@" + Integer.toHexString(hashCode());
   }
+
+  @Override
+  protected void finalize() throws Throwable {

Review Comment:
   I'm trying out a WeakReference-based solution. Will update the patch later.
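
   Since Java 9, the usual replacement for `finalize()` is `java.lang.ref.Cleaner`, which is also the common way to wire up a weak/phantom-reference cleanup like the one mentioned above. This is a minimal hedged sketch — `TrackedChunkBuffer` and `ReleaseAction` are illustrative names, not the actual `ChunkBufferImplWithByteBuffer` patch.

   ```java
   import java.lang.ref.Cleaner;
   import java.nio.ByteBuffer;
   import java.util.concurrent.atomic.AtomicBoolean;

   // Sketch: release a buffer's resources either on explicit close() or when GC
   // finds the wrapper unreachable, without overriding finalize().
   public class TrackedChunkBuffer implements AutoCloseable {
     private static final Cleaner CLEANER = Cleaner.create();

     // The cleanup action must NOT hold a reference back to the wrapper object,
     // otherwise the wrapper would never become phantom-reachable.
     static final class ReleaseAction implements Runnable {
       final AtomicBoolean released = new AtomicBoolean(false);
       @Override
       public void run() {
         // The flag stands in for the real release (e.g. returning the buffer
         // to a pool). Cleaner guarantees this runs at most once.
         released.set(true);
       }
     }

     private final ByteBuffer buffer;
     private final ReleaseAction action = new ReleaseAction();
     private final Cleaner.Cleanable cleanable;

     public TrackedChunkBuffer(int capacity) {
       this.buffer = ByteBuffer.allocateDirect(capacity);
       this.cleanable = CLEANER.register(this, action);
     }

     public int capacity() { return buffer.capacity(); }

     public boolean isReleased() { return action.released.get(); }

     @Override
     public void close() {
       cleanable.clean();  // runs ReleaseAction now; GC path is then a no-op
     }
   }
   ```

   The design point is that the cleanup state lives in a separate object, so the GC path and the explicit `close()` path share one idempotent release action.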





