[jira] [Created] (HDDS-2647) Ozone DataNode does not set raft.server.log.corruption.policy to the RaftServer implementation it uses
Istvan Fajth created HDDS-2647:
--
Summary: Ozone DataNode does not set raft.server.log.corruption.policy to the RaftServer implementation it uses
Key: HDDS-2647
URL: https://issues.apache.org/jira/browse/HDDS-2647
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone Datanode
Reporter: Istvan Fajth

In the XceiverServerRatis class, which the DataNode also uses to create its RaftServer implementation, there is a method called newRaftProperties() that sets up the RaftProperties object for the RaftServer it starts. This method is hard to keep in sync with all the Ratis properties: while investigating an issue I was pointed to RATIS-677, which introduced a new configuration, and I was not able to set this new property via the DataNode's ozone-site.xml, as it was not forwarded to the Ratis server. In the long run we need a better implementation that does not require tuning and follow-up for every new Ratis property; for the moment, as a quick fix, we can just forward the property. If the implementor goes with the easy way, please create a new JIRA for a better solution after finishing this one. Also, if I am wrong and Ratis properties can be defined properly for the DN elsewhere, please let me know. As OM also uses Ratis in HA configuration, this should be checked there as well; however, that part is not really important until RATIS-762 is fixed.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
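The missing forwarding could be handled generically by copying every Ratis-prefixed key from the Ozone configuration into the Raft properties. Below is a minimal, self-contained sketch of that idea; the class name and the plain string maps are illustrative stand-ins, not the actual XceiverServerRatis code (which works with OzoneConfiguration and org.apache.ratis.conf.RaftProperties):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: forward every "raft."-prefixed key from the Ozone
 * configuration into the Ratis property set, so that a newly introduced
 * Ratis key (such as raft.server.log.corruption.policy from RATIS-677)
 * does not need a dedicated line in newRaftProperties().
 */
public final class RatisPropertyForwarder {
  private static final String RATIS_PREFIX = "raft.";

  /** Return only the entries of ozoneConf whose key starts with "raft.". */
  public static Map<String, String> forward(Map<String, String> ozoneConf) {
    Map<String, String> raftProps = new HashMap<>();
    for (Map.Entry<String, String> e : ozoneConf.entrySet()) {
      if (e.getKey().startsWith(RATIS_PREFIX)) {
        raftProps.put(e.getKey(), e.getValue());
      }
    }
    return raftProps;
  }
}
```

With a loop like this in newRaftProperties(), a property such as raft.server.log.corruption.policy set in ozone-site.xml would reach the RaftServer without a per-property setter being added for every new Ratis release.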
[GitHub] [hadoop-ozone] lokeshj1703 commented on a change in pull request #276: HDDS-2637. Handle LeaderNotReady exception in OzoneManager StateMachine and upgrade ratis to latest version.
lokeshj1703 commented on a change in pull request #276: HDDS-2637. Handle LeaderNotReady exception in OzoneManager StateMachine and upgrade ratis to latest version. URL: https://github.com/apache/hadoop-ozone/pull/276#discussion_r351872153

## File path: hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/protocolPB/OzoneManagerProtocolClientSideTranslatorPB.java

## @@ -370,13 +396,16 @@ private OMResponse submitRequest(OMRequest omRequest)
       return omResponse;
     } catch (ServiceException e) {
-      // throw ProtobufHelper.getRemoteException(e);
       NotLeaderException notLeaderException = getNotLeaderException(e);
       if (notLeaderException == null) {
         throw ProtobufHelper.getRemoteException(e);

Review comment: If notLeaderException is null, we will always throw an exception without checking the other exception types.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
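The review point is that the catch block rethrows as soon as the first check fails, without probing for other retriable exception types. A small, self-contained sketch of walking the full cause chain so that several types can be checked before rethrowing; this is a generic helper for illustration only, not the real client code (which uses getNotLeaderException and the Ratis exception classes):

```java
/**
 * Illustrative helper: search the cause chain of a thrown exception for a
 * given type, so a caller can test several exception types (e.g. "not
 * leader", "leader not ready") before falling back to a generic rethrow.
 */
public final class ExceptionUnwrapper {
  /** Return the first cause assignable to {@code type}, or null if absent. */
  public static <T extends Throwable> T findCause(Throwable t, Class<T> type) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (type.isInstance(cur)) {
        return type.cast(cur);
      }
    }
    return null;
  }
}
```

A caller would invoke findCause once per interesting exception type and only rethrow when every probe returns null, which addresses the "we will always throw" concern in the review.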
[GitHub] [hadoop-ozone] elek commented on issue #282: HDDS-2646. Start acceptance tests only if at least one THREE pipeline is available
elek commented on issue #282: HDDS-2646. Start acceptance tests only if at least one THREE pipeline is available URL: https://github.com/apache/hadoop-ozone/pull/282#issuecomment-559548313

@ChenSammi You are more experienced with this area. Can you please review this approach / patch?
[GitHub] [hadoop-ozone] elek opened a new pull request #282: Hdds 2646
elek opened a new pull request #282: Hdds 2646 URL: https://github.com/apache/hadoop-ozone/pull/282

## What changes were proposed in this pull request?
After [HDDS-2034](https://issues.apache.org/jira/browse/HDDS-2034) (or even before?) pipeline creation (or the status transition from ALLOCATED to OPEN) requires at least one pipeline report from all of the datanodes. This means that the cluster might not be usable even if it is out of safe mode AND there are at least three datanodes.

This makes all the acceptance tests unstable, for example in [this](https://github.com/apache/hadoop-ozone/pull/263/checks?check_run_id=324489319) run.

```
scm_1 | 2019-11-28 11:22:54,401 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 548f146f-2166-440a-b9f1-83086591ae26
scm_1 | 2019-11-28 11:22:54,402 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c
scm_1 | 2019-11-28 11:22:54,404 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 47dbb8e4-bbde-4164-a798-e47e8c696fb5
scm_1 | 2019-11-28 11:22:54,405 INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 8dc4aeb6-5ae2-46a0-948d-287c97dd81fb, Nodes: 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}47dbb8e4-bbde-4164-a798-e47e8c696fb5{ip: 172.24.0.2, host: ozoneperf_datanode_2.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED]
scm_1 | 2019-11-28 11:22:56,975 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
scm_1 | 2019-11-28 11:22:58,018 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
scm_1 | 2019-11-28 11:23:01,871 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
scm_1 | 2019-11-28 11:23:02,817 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
scm_1 | 2019-11-28 11:23:02,847 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
```

As you can see, the pipeline is created but the cluster is not usable, because the pipeline has not yet been reported back by datanode_2:

```
scm_1 | 2019-11-28 11:23:13,879 WARN block.BlockManagerImpl: Pipeline creation failed for type:RATIS factor:THREE. Retrying get pipelines call once.
scm_1 | org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline of factor 3 using 0 nodes.
```

The quick fix is to configure all the compose clusters to wait until (at least) one pipeline is available. This can be done by adjusting the number of the required datanodes:

```
// We only care about the THREE replica pipeline
int minHealthyPipelines = minDatanodes / HddsProtos.ReplicationFactor.THREE_VALUE;
```

## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-2646

## How was this patch tested?
If something is wrong, the acceptance tests fail. We need a green run from the CI.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
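The wait condition in the quick fix is plain integer division: with N datanodes and replication factor THREE, at most N / 3 healthy THREE-factor pipelines can exist. A self-contained sketch of that arithmetic; THREE_VALUE is assumed to be 3 (as in HddsProtos.ReplicationFactor), and the class below is illustrative rather than the actual safe-mode rule implementation:

```java
/**
 * Illustrative sketch of the HDDS-2646 quick fix: compute how many healthy
 * THREE-replica pipelines the compose cluster should wait for, given the
 * number of datanodes it is configured with.
 */
public final class PipelineWaitCondition {
  // Assumed constant; HddsProtos.ReplicationFactor.THREE_VALUE is 3.
  private static final int THREE_VALUE = 3;

  /** Minimum healthy THREE pipelines expected from minDatanodes nodes. */
  public static int minHealthyPipelines(int minDatanodes) {
    // Integer division: 3, 4 or 5 datanodes can host exactly one
    // THREE-factor pipeline; 6 datanodes can host two, and so on.
    return minDatanodes / THREE_VALUE;
  }
}
```

So a standard three-datanode compose cluster waits for one healthy pipeline before the acceptance tests start, which is exactly the condition that was missing in the flaky run quoted above.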
[jira] [Updated] (HDDS-2646) Start acceptance tests only if at least one THREE pipeline is available
[ https://issues.apache.org/jira/browse/HDDS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-2646:
--
Priority: Blocker (was: Major)

> Start acceptance tests only if at least one THREE pipeline is available
> ---
>
> Key: HDDS-2646
> URL: https://issues.apache.org/jira/browse/HDDS-2646
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Priority: Blocker
> Attachments: docker-ozoneperf-ozoneperf-basic-scm.log
[jira] [Updated] (HDDS-2646) Start acceptance tests only if at least one THREE pipeline is available
[ https://issues.apache.org/jira/browse/HDDS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-2646:
--
Attachment: docker-ozoneperf-ozoneperf-basic-scm.log

> Start acceptance tests only if at least one THREE pipeline is available
> ---
>
> Key: HDDS-2646
> URL: https://issues.apache.org/jira/browse/HDDS-2646
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Blocker
> Attachments: docker-ozoneperf-ozoneperf-basic-scm.log
[jira] [Updated] (HDDS-2646) Start acceptance tests only if at least one THREE pipeline is available
[ https://issues.apache.org/jira/browse/HDDS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Elek updated HDDS-2646:
--
Description: After HDDS-2034 (or even before?) pipeline creation (or the status transition from ALLOCATED to OPEN) requires at least one pipeline report from all of the datanodes. This means that the cluster might not be usable even if it is out of safe mode AND there are at least three datanodes. This makes all the acceptance tests unstable. The quick fix is to configure all the compose clusters to wait until one pipeline is available, by adjusting the number of the required datanodes.
[jira] [Created] (HDDS-2646) Start acceptance tests only if at least one THREE pipeline is available
Marton Elek created HDDS-2646:
-
Summary: Start acceptance tests only if at least one THREE pipeline is available
Key: HDDS-2646
URL: https://issues.apache.org/jira/browse/HDDS-2646
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Marton Elek

After HDDS-2034 (or even before?) pipeline creation (or the status transition from ALLOCATED to OPEN) requires at least one pipeline report from all of the datanodes. This means that the cluster might not be usable even if it is out of safe mode AND there are at least three datanodes. This makes all the acceptance tests unstable.
[jira] [Created] (HDDS-2645) Refactor MiniOzoneChaosCluster to a different package to add filesystem tests
Mukul Kumar Singh created HDDS-2645:
---
Summary: Refactor MiniOzoneChaosCluster to a different package to add filesystem tests
Key: HDDS-2645
URL: https://issues.apache.org/jira/browse/HDDS-2645
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: test
Reporter: Mukul Kumar Singh
Assignee: Mukul Kumar Singh

Refactor MiniOzoneChaosCluster to fault-injection-tests. Also add a dependency on hadoop-ozone-filesystem so that filesystem tests can be added later.
[jira] [Created] (HDDS-2644) TestTableCacheImpl#testPartialTableCacheWithOverrideAndDelete fails intermittently
Lokesh Jain created HDDS-2644: - Summary: TestTableCacheImpl#testPartialTableCacheWithOverrideAndDelete fails intermittently Key: HDDS-2644 URL: https://issues.apache.org/jira/browse/HDDS-2644 Project: Hadoop Distributed Data Store Issue Type: Test Reporter: Lokesh Jain {code:java} [ERROR] Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.87 s <<< FAILURE! - in org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl [ERROR] testPartialTableCacheWithOverrideAndDelete[0](org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl) Time elapsed: 0.044 s <<< FAILURE! java.lang.AssertionError: expected:<2> but was:<6> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl.testPartialTableCacheWithOverrideAndDelete(TestTableCacheImpl.java:308) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.junit.runners.Suite.runChild(Suite.java:127) at org.junit.runners.Suite.runChild(Suite.java:26) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code}
[jira] [Reopened] (HDDS-2640) Add leaderID information in pipeline list subcommand
[ https://issues.apache.org/jira/browse/HDDS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nilotpal Nandi reopened HDDS-2640:
--

> Add leaderID information in pipeline list subcommand
>
> Key: HDDS-2640
> URL: https://issues.apache.org/jira/browse/HDDS-2640
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM Client
> Reporter: Nilotpal Nandi
> Assignee: Nilotpal Nandi
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Need to add leaderID information in listPipeline subcommand.
> i.e,
> ozone scmcli pipeline list
[GitHub] [hadoop-ozone] nilotpalnandi opened a new pull request #281: HDDS-2640 Add leaderID information in pipeline list subcommand
nilotpalnandi opened a new pull request #281: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/281 ## What changes were proposed in this pull request? The scmcli pipeline list command does not display the leaderID information for each pipeline. This change includes the leaderID information along with the other details. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2640 ## How was this patch tested? Applied the patch, rebuilt ozone, and then tested it by creating a docker cluster using docker-compose.
[GitHub] [hadoop-ozone] adoroszlai commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
adoroszlai commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559508963 > We need to remove this dependency OR always fix (or rerun) the unit test. +1 for removing the dependency: it would also let acceptance tests start ~20 minutes earlier, reducing overall feedback time by 25% (80 -> 60 minutes).
[GitHub] [hadoop-ozone] lokeshj1703 commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
lokeshj1703 commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559508597 @elek Yeah, I agree. I think removing the dependency between the acceptance tests and unit tests would be great, because there are a lot of tests which fail intermittently. The post-commit build failed with a failure in TestTableCacheImpl.
[GitHub] [hadoop-ozone] elek edited a comment on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
elek edited a comment on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559504005 No problem (and no need to revert). I only realized how small this patch is after I wrote the comment, so I understand that there is very little chance it introduces any problem (unless an acceptance test checks the output of the pipeline command). It was more of an FYI: master has become unstable again, and having reports for all the checks would help us understand the root of the problems. The problem is that we have a strong dependency between the acceptance and unit test stages: if the unit tests fail (even due to an intermittent error), the acceptance tests are not executed. We need to remove this dependency OR always fix (or rerun) the unit tests.
[GitHub] [hadoop-ozone] elek commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
elek commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559504005 No problem (and no need to revert). I only realized how small this patch is after I wrote the comment, so I understand that it is very likely not problematic (unless an acceptance test checks the output). Master has become unstable again, and having reports for all the checks would help us understand the root of the problems. The problem is that we have a strong dependency between the acceptance and unit test stages: if the unit tests fail (even due to an intermittent error), the acceptance tests are not executed. We need to remove this dependency OR always fix (or rerun) the unit tests.
[GitHub] [hadoop-ozone] lokeshj1703 merged pull request #280: Revert "HDDS-2640 Add leaderID information in pipeline list subcommand"
lokeshj1703 merged pull request #280: Revert "HDDS-2640 Add leaderID information in pipeline list subcommand" URL: https://github.com/apache/hadoop-ozone/pull/280
[GitHub] [hadoop-ozone] lokeshj1703 opened a new pull request #280: Revert "HDDS-2640 Add leaderID information in pipeline list subcommand"
lokeshj1703 opened a new pull request #280: Revert "HDDS-2640 Add leaderID information in pipeline list subcommand" URL: https://github.com/apache/hadoop-ozone/pull/280 Reverts apache/hadoop-ozone#279
[jira] [Created] (HDDS-2642) Expose decommission / maintenance metrics via JMX
Stephen O'Donnell created HDDS-2642: --- Summary: Expose decommission / maintenance metrics via JMX Key: HDDS-2642 URL: https://issues.apache.org/jira/browse/HDDS-2642 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: SCM Affects Versions: 0.5.0 Reporter: Stephen O'Donnell As nodes transition through the decommission and maintenance workflow, we should expose the hosts going through admin via JMX, along with possibly: 1. The stage of the process (close pipelines, replicate containers etc) 2. The number of sufficiently replicated, under replicated and unhealthy containers
[jira] [Created] (HDDS-2641) Allow SCM webUI to show decommision and maintenance nodes
Stephen O'Donnell created HDDS-2641: --- Summary: Allow SCM webUI to show decommision and maintenance nodes Key: HDDS-2641 URL: https://issues.apache.org/jira/browse/HDDS-2641 Project: Hadoop Distributed Data Store Issue Type: Sub-task Components: SCM Affects Versions: 0.5.0 Reporter: Stephen O'Donnell The SCM WebUI should show the current set of decommission and maintenance nodes, possibly including the number of containers each node is waiting to have replicated.
[GitHub] [hadoop-ozone] elek commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
elek commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559489646 > The test failure does not seem related. Please don't commit patches with failing unit tests, even if the failures are unrelated. The acceptance tests were not executed for this patch because the unit tests failed. If you think a failure is not related, please create a jira, copy the failure and the logs, and `@Ignore` the test, but get a green build.
[GitHub] [hadoop-ozone] lokeshj1703 commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
lokeshj1703 commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559487731 The test failure does not seem related.
[GitHub] [hadoop-ozone] lokeshj1703 commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand
lokeshj1703 commented on issue #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279#issuecomment-559488008 @nilotpalnandi Thanks for the contribution! I have committed the PR to master branch.
[GitHub] [hadoop-ozone] lokeshj1703 merged pull request #279: HDDS-2640 Add leaderID information in pipeline list subcommand
lokeshj1703 merged pull request #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279
[GitHub] [hadoop-ozone] adoroszlai commented on issue #238: HDDS-2588. Consolidate compose environments
adoroszlai commented on issue #238: HDDS-2588. Consolidate compose environments URL: https://github.com/apache/hadoop-ozone/pull/238#issuecomment-559477880 Thanks for the feedback @elek. > Can you please update the README.txt Sure, will do, but didn't want to write doc until the code is OK-ed. ;)
[GitHub] [hadoop-ozone] elek commented on issue #238: HDDS-2588. Consolidate compose environments
elek commented on issue #238: HDDS-2588. Consolidate compose environments URL: https://github.com/apache/hadoop-ozone/pull/238#issuecomment-559472233 > I think (1) and (2) are addressed by the followup commit, which extracts monitoring and profiling into separate configs Thanks for the update @adoroszlai. This approach is very smart, but I have some concern about how easy it is to understand. (One additional function of the compose folders is to provide *simple* examples of using ozone.) But let's try out this approach; I am fine with it. Can you please update the README.txt inside `compose/ozone`? Currently it's the original ozoneperf readme; it can be simplified, but we need to add information about the `COMPOSE_FILE=...` trick.
[jira] [Updated] (HDDS-2640) Add leaderID information in pipeline list subcommand
[ https://issues.apache.org/jira/browse/HDDS-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2640: - Labels: pull-request-available (was: ) > Add leaderID information in pipeline list subcommand > > > Key: HDDS-2640 > URL: https://issues.apache.org/jira/browse/HDDS-2640 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: SCM Client >Reporter: Nilotpal Nandi >Assignee: Nilotpal Nandi >Priority: Major > Labels: pull-request-available > > Need to add leaderID information in listPipeline subcommand. > i.e, > ozone scmcli pipeline list > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] nilotpalnandi opened a new pull request #279: HDDS-2640 Add leaderID information in pipeline list subcommand
nilotpalnandi opened a new pull request #279: HDDS-2640 Add leaderID information in pipeline list subcommand URL: https://github.com/apache/hadoop-ozone/pull/279 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## What is the link to the Apache JIRA (Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HDDS-. Fix a typo in YYY.) Please replace this section with the link to the Apache JIRA) ## How was this patch tested? (Please explain how this patch was tested. Ex: unit tests, manual tests) (If this patch involves UI changes, please attach a screen-shot; otherwise, remove this) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2640) Add leaderID information in pipeline list subcommand
Nilotpal Nandi created HDDS-2640: Summary: Add leaderID information in pipeline list subcommand Key: HDDS-2640 URL: https://issues.apache.org/jira/browse/HDDS-2640 Project: Hadoop Distributed Data Store Issue Type: Bug Components: SCM Client Reporter: Nilotpal Nandi Assignee: Nilotpal Nandi Need to add leaderID information in listPipeline subcommand. i.e, ozone scmcli pipeline list
[GitHub] [hadoop-ozone] elek opened a new pull request #278: HDDS-2639. TestTableCacheImpl is flaky
elek opened a new pull request #278: HDDS-2639. TestTableCacheImpl is flaky URL: https://github.com/apache/hadoop-ozone/pull/278 ## What changes were proposed in this pull request? Run(master): https://github.com/apache/hadoop-ozone/runs/324342299 ``` --- Test set: org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl --- Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.955 s <<< FAILURE! - in org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl testPartialTableCacheWithOverrideAndDelete[0](org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl) Time elapsed: 0.039 s <<< FAILURE! java.lang.AssertionError: expected:<2> but was:<6> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl.testPartialTableCacheWithOverrideAndDelete(TestTableCacheImpl.java:308) ``` ### How to reproduce it locally? Replace the last `tableCache.cleanup` call of `testPartialTableCacheWithOverrideAndDelete` with `System.out.println(tableCache.size())`. You will see that the cache size is `2` even before the cleanup, therefore the next `GenericTestUtils.waitFor` is useless (it doesn't guarantee that the cleanup has finished). ### Fix I propose to call the cleanup synchronously (instead of asynchronously), using a `TableCacheImpl` reference instead of the interface. It simplifies the test but still validates the behavior. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-2639 ## How was this patch tested? The problem can be reproduced locally as described above. The fix can be tested by executing the `TestTableCacheImpl` unit test.
[jira] [Created] (HDDS-2639) TestTableCacheImpl is flaky
Marton Elek created HDDS-2639: - Summary: TestTableCacheImpl is flaky Key: HDDS-2639 URL: https://issues.apache.org/jira/browse/HDDS-2639 Project: Hadoop Distributed Data Store Issue Type: Bug Reporter: Marton Elek Run(master): [https://github.com/apache/hadoop-ozone/runs/324342299] {code:java} --- Test set: org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl --- Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.955 s <<< FAILURE! - in org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl testPartialTableCacheWithOverrideAndDelete[0](org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl) Time elapsed: 0.039 s <<< FAILURE! java.lang.AssertionError: expected:<2> but was:<6> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdds.utils.db.cache.TestTableCacheImpl.testPartialTableCacheWithOverrideAndDelete(TestTableCacheImpl.java:308) {code} *How to reproduce it locally?* Replace the last tableCache.evict call of testPartialTableCacheWithOverrideAndDelete with System.out.println(tableCache.size()). You will see that the cache size is 2 even before the cleanup, therefore the next GenericTestUtils.waitFor is useless (it doesn't guarantee that the cleanup has finished). *Fix:* I propose to call the cleanup synchronously, using the Impl class instead of the interface. It simplifies the test but still validates the behavior.
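The synchronous-cleanup fix proposed above can be sketched as follows. This is a hypothetical stand-in, not the actual Ozone `TableCacheImpl`: the class name `PartialCache` and its methods are invented for illustration. It shows why asserting on `size()` after an async cleanup is racy, while a direct (synchronous) `evict` call makes the assertion deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a partial table cache with epoch-based eviction.
class PartialCache {
  private final Map<Long, String> cache = new ConcurrentHashMap<>();
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  void put(long epoch, String value) { cache.put(epoch, value); }

  int size() { return cache.size(); }

  // Async cleanup: returns before eviction may have happened,
  // so a test asserting on size() right after this call is flaky.
  void cleanupAsync(long maxEpoch) {
    executor.submit(() -> evict(maxEpoch));
  }

  // Sync cleanup: when this returns, the entries are gone,
  // so the test can assert on size() immediately.
  void evict(long maxEpoch) {
    cache.keySet().removeIf(epoch -> epoch <= maxEpoch);
  }

  void shutdown() throws InterruptedException {
    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.SECONDS);
  }
}

public class SyncCleanupSketch {
  public static void main(String[] args) throws Exception {
    PartialCache cache = new PartialCache();
    for (long i = 1; i <= 6; i++) {
      cache.put(i, "v" + i);
    }
    // Deterministic: entries with epoch 1..4 are removed when evict returns.
    cache.evict(4);
    System.out.println(cache.size()); // prints 2
    cache.shutdown();
  }
}
```

Calling `evict` directly through the concrete class (rather than scheduling it through the interface's async path) is what removes the need for the unreliable `waitFor` polling in the test.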
[GitHub] [hadoop-ozone] elek closed pull request #249: Reopen HDDS-2034 Async RATIS pipeline creation and destroy through heartbeat commands
elek closed pull request #249: Reopen HDDS-2034 Async RATIS pipeline creation and destroy through heartbeat commands URL: https://github.com/apache/hadoop-ozone/pull/249
[jira] [Updated] (HDDS-2628) Make AuditMessage parameters strongly typed
[ https://issues.apache.org/jira/browse/HDDS-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Doroszlai updated HDDS-2628: --- Status: Patch Available (was: In Progress) > Make AuditMessage parameters strongly typed > --- > > Key: HDDS-2628 > URL: https://issues.apache.org/jira/browse/HDDS-2628 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Improve type safety in {{AuditMessage$Builder}} for methods {{forOperation}} > and {{withResult}} by using existing {{interface AuditAction}} and {{enum > AuditEventStatus}} respectively instead of Strings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[jira] [Updated] (HDDS-2628) Make AuditMessage parameters strongly typed
[ https://issues.apache.org/jira/browse/HDDS-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDDS-2628: - Labels: pull-request-available (was: ) > Make AuditMessage parameters strongly typed > --- > > Key: HDDS-2628 > URL: https://issues.apache.org/jira/browse/HDDS-2628 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > > Improve type safety in {{AuditMessage$Builder}} for methods {{forOperation}} > and {{withResult}} by using existing {{interface AuditAction}} and {{enum > AuditEventStatus}} respectively instead of Strings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] adoroszlai opened a new pull request #277: HDDS-2628. Make AuditMessage parameters strongly typed
adoroszlai opened a new pull request #277: HDDS-2628. Make AuditMessage parameters strongly typed URL: https://github.com/apache/hadoop-ozone/pull/277 ## What changes were proposed in this pull request? 1. Improve type safety in `AuditMessage$Builder` for methods `forOperation` and `withResult` by using existing `interface AuditAction` and `enum AuditEventStatus` respectively instead of Strings. 2. Use existing `Server.getRemoteAddress()` instead of `Server.getRemoteIp().getHostAddress()` with null check 3. Define and use `getRemoteUserName()` along the same lines https://issues.apache.org/jira/browse/HDDS-2628 ## How was this patch tested? Created keys using Freon, verified audit log entries. ``` 2019-11-27 20:48:16,206 | INFO | SCMAudit | user=hadoop | ip=172.23.0.3 | op=ALLOCATE_BLOCK {owner=88982149-2c09-4cd7-8e38-fba8f23cff5e, size=268435456, type=RATIS, factor=ONE} | ret=SUCCESS | 2019-11-27 20:48:45,879 | INFO | SCMAudit | user=hadoop | ip=172.23.0.4 | op=SEND_HEARTBEAT {datanodeUUID=4e2ee488-6dc6-45f8-9f02-02f1a5cff554, command=[]} | ret=SUCCESS | ``` ``` 2019-11-27 20:48:16,208 | INFO | OMAudit | user=hadoop | ip=172.23.0.2 | op=ALLOCATE_KEY {volume=vol1, bucket=bucket1, key=OkoBsDwxuj/9, dataSize=10240, replicationType=RATIS, replicationFactor=ONE, keyLocationInfo=[blockID { containerBlockID { containerID: 1 localID: 103211840058556425 ... ``` https://github.com/adoroszlai/hadoop-ozone/runs/323724573 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
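The type-safety change described in this PR can be illustrated with a minimal sketch. The class and method names below mirror the ones mentioned in the PR (`AuditMessage$Builder`, `forOperation`, `withResult`, `AuditAction`, `AuditEventStatus`), but the bodies are simplified stand-ins, not the real Ozone implementation: the builder accepts an interface and an enum instead of free-form Strings, so an invalid operation or result fails at compile time rather than producing a malformed audit line.

```java
// Hypothetical, simplified versions of the Ozone audit types.
interface AuditAction {
  String getAction();
}

enum AuditEventStatus { SUCCESS, FAILURE }

final class AuditMessage {
  private final String text;
  private AuditMessage(String text) { this.text = text; }
  @Override public String toString() { return text; }

  static class Builder {
    private AuditAction op;
    private AuditEventStatus result;

    // Strongly typed: callers can no longer pass an arbitrary String.
    Builder forOperation(AuditAction op) { this.op = op; return this; }
    Builder withResult(AuditEventStatus result) { this.result = result; return this; }

    AuditMessage build() {
      return new AuditMessage("op=" + op.getAction() + " | ret=" + result);
    }
  }
}

public class AuditSketch {
  // An enum of actions implementing the shared interface.
  enum ScmAction implements AuditAction {
    ALLOCATE_BLOCK;
    @Override public String getAction() { return name(); }
  }

  public static void main(String[] args) {
    AuditMessage msg = new AuditMessage.Builder()
        .forOperation(ScmAction.ALLOCATE_BLOCK)
        .withResult(AuditEventStatus.SUCCESS)
        .build();
    System.out.println(msg); // prints: op=ALLOCATE_BLOCK | ret=SUCCESS
  }
}
```

With String parameters, a typo like `withResult("SUCESS")` would silently corrupt the audit log; with the enum it simply does not compile.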
[GitHub] [hadoop-ozone] elek commented on a change in pull request #262: HDDS-2459. Change the ReplicationManager to consider decommission and maintenance states
elek commented on a change in pull request #262: HDDS-2459. Change the ReplicationManager to consider decommission and maintenance states URL: https://github.com/apache/hadoop-ozone/pull/262#discussion_r351642030 ## File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java ## @@ -97,6 +98,11 @@ */ private final LockManager lockManager; + /** + * Used to lookup the health of a nodes or the nodes operational state. + */ + private final NodeManager nodeManager; Review comment: As far as I understood, the proposal is to update the state of the containers in another component based on the node state, and to use only the container state here (instead of checking the state via the node manager). I discussed it with @anuengineer. Let's go forward with this approach and later we can improve this part.
[GitHub] [hadoop-ozone] elek commented on issue #262: HDDS-2459. Change the ReplicationManager to consider decommission and maintenance states
elek commented on issue #262: HDDS-2459. Change the ReplicationManager to consider decommission and maintenance states URL: https://github.com/apache/hadoop-ozone/pull/262#issuecomment-559389693 > I will change this to @ignore however I have not been able to find the cause of the problem Sure, just link the failing github actions unit test, download the logs, and upload them to the jira (if meaningful; in case of a timeout they can be empty). I am not interested in the real root cause, but we need a definition of the problem, including assertion errors, exceptions, and log output, so we can check it later.
[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #137: HDDS-2455. Implement MiniOzoneHAClusterImpl#getOMLeader
smengcl commented on a change in pull request #137: HDDS-2455. Implement MiniOzoneHAClusterImpl#getOMLeader URL: https://github.com/apache/hadoop-ozone/pull/137#discussion_r351641182 ## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestMiniOzoneHACluster.java ## @@ -0,0 +1,108 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.ozone; + +import org.apache.hadoop.hdds.conf.OzoneConfiguration; +import org.apache.hadoop.ozone.client.ObjectStore; +import org.apache.hadoop.ozone.client.OzoneClientFactory; +import org.apache.hadoop.ozone.om.OzoneManager; +import org.apache.hadoop.test.GenericTestUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.ExpectedException; +import org.junit.rules.Timeout; + +import java.io.IOException; +import java.util.UUID; +import java.util.concurrent.TimeoutException; + +import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_ACL_ENABLED; +import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_ADMINISTRATORS_WILDCARD; +import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_OPEN_KEY_EXPIRE_THRESHOLD_SECONDS; + +/** + * This class tests MiniOzoneHAClusterImpl. + */ +public class TestMiniOzoneHACluster { + + private MiniOzoneHAClusterImpl cluster = null; + private ObjectStore objectStore; + private OzoneConfiguration conf; + private String clusterId; + private String scmId; + private String omServiceId; + private int numOfOMs = 3; + + @Rule + public ExpectedException exception = ExpectedException.none(); + + @Rule + public Timeout timeout = new Timeout(300_000); + + /** + * Create a MiniOzoneHAClusterImpl for testing. 
+ * + * @throws IOException + */ + @Before + public void init() throws Exception { +conf = new OzoneConfiguration(); +clusterId = UUID.randomUUID().toString(); +scmId = UUID.randomUUID().toString(); +omServiceId = "omServiceId1"; +conf.setBoolean(OZONE_ACL_ENABLED, true); +conf.set(OzoneConfigKeys.OZONE_ADMINISTRATORS, +OZONE_ADMINISTRATORS_WILDCARD); +conf.setInt(OZONE_OPEN_KEY_EXPIRE_THRESHOLD_SECONDS, 2); +cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf) +.setClusterId(clusterId) +.setScmId(scmId) +.setOMServiceId(omServiceId) +.setNumOfOzoneManagers(numOfOMs) +.build(); +cluster.waitForClusterToBeReady(); +objectStore = OzoneClientFactory.getRpcClient(omServiceId, conf) +.getObjectStore(); + } + + /** + * Shutdown MiniOzoneHAClusterImpl. + */ + @After + public void shutdown() { +if (cluster != null) { + cluster.shutdown(); +} + } + + @Test + public void testGetOMLeader() throws InterruptedException, TimeoutException { +// Wait for OM leader election to finish +GenericTestUtils.waitFor(() -> cluster.getOMLeader() != null, +100, 3); Review comment: Thanks @hanishakoneru for the comment. Note that assigning to `ozoneManager` which is outside the lambda expression requires it to be atomic. Just pushed a commit. Please take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org
[GitHub] [hadoop-ozone] smengcl commented on a change in pull request #137: HDDS-2455. Implement MiniOzoneHAClusterImpl#getOMLeader
smengcl commented on a change in pull request #137: HDDS-2455. Implement MiniOzoneHAClusterImpl#getOMLeader
URL: https://github.com/apache/hadoop-ozone/pull/137#discussion_r351641182

## File path: hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestMiniOzoneHACluster.java
##
@@ -0,0 +1,108 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.ozone;
+
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.om.OzoneManager;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.Timeout;
+
+import java.io.IOException;
+import java.util.UUID;
+import java.util.concurrent.TimeoutException;
+
+import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_ACL_ENABLED;
+import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_ADMINISTRATORS_WILDCARD;
+import static org.apache.hadoop.ozone.OzoneConfigKeys.OZONE_OPEN_KEY_EXPIRE_THRESHOLD_SECONDS;
+
+/**
+ * This class tests MiniOzoneHAClusterImpl.
+ */
+public class TestMiniOzoneHACluster {
+
+  private MiniOzoneHAClusterImpl cluster = null;
+  private ObjectStore objectStore;
+  private OzoneConfiguration conf;
+  private String clusterId;
+  private String scmId;
+  private String omServiceId;
+  private int numOfOMs = 3;
+
+  @Rule
+  public ExpectedException exception = ExpectedException.none();
+
+  @Rule
+  public Timeout timeout = new Timeout(300_000);
+
+  /**
+   * Create a MiniOzoneHAClusterImpl for testing.
+   *
+   * @throws IOException
+   */
+  @Before
+  public void init() throws Exception {
+    conf = new OzoneConfiguration();
+    clusterId = UUID.randomUUID().toString();
+    scmId = UUID.randomUUID().toString();
+    omServiceId = "omServiceId1";
+    conf.setBoolean(OZONE_ACL_ENABLED, true);
+    conf.set(OzoneConfigKeys.OZONE_ADMINISTRATORS,
+        OZONE_ADMINISTRATORS_WILDCARD);
+    conf.setInt(OZONE_OPEN_KEY_EXPIRE_THRESHOLD_SECONDS, 2);
+    cluster = (MiniOzoneHAClusterImpl) MiniOzoneCluster.newHABuilder(conf)
+        .setClusterId(clusterId)
+        .setScmId(scmId)
+        .setOMServiceId(omServiceId)
+        .setNumOfOzoneManagers(numOfOMs)
+        .build();
+    cluster.waitForClusterToBeReady();
+    objectStore = OzoneClientFactory.getRpcClient(omServiceId, conf)
+        .getObjectStore();
+  }
+
+  /**
+   * Shutdown MiniOzoneHAClusterImpl.
+   */
+  @After
+  public void shutdown() {
+    if (cluster != null) {
+      cluster.shutdown();
+    }
+  }
+
+  @Test
+  public void testGetOMLeader() throws InterruptedException, TimeoutException {
+    // Wait for OM leader election to finish
+    GenericTestUtils.waitFor(() -> cluster.getOMLeader() != null,
+        100, 3);

Review comment: Thanks @hanishakoneru for the comment. Note that assigning to `ozoneManager` outside the lambda expression requires it to be atomic. Just pushed a commit. Please take a look.
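The atomicity point in the review comes from Java's capture rules: a lambda cannot write to a local variable of the enclosing method (it would have to be effectively final), so a value produced inside the wait-condition lambda is typically captured in an `AtomicReference`. Below is a minimal sketch of that pattern; `getLeader`, `waitFor`, and `waitForLeader` are hypothetical stand-ins for `cluster.getOMLeader()` and `GenericTestUtils.waitFor` so the example runs without the Hadoop test harness:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class LeaderWaitSketch {

  private static int polls = 0;

  // Hypothetical stand-in for cluster.getOMLeader(): returns null for the
  // first two polls, then a leader id, mimicking an in-progress election.
  static String getLeader() {
    polls++;
    return polls < 3 ? null : "om2";
  }

  // Simplified stand-in for GenericTestUtils.waitFor(check, intervalMs,
  // timeoutMs): poll the condition until it holds or the deadline passes.
  static void waitFor(Supplier<Boolean> check, int intervalMs, int timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.get()) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("timed out waiting for condition");
      }
      Thread.sleep(intervalMs);
    }
  }

  // A local String written inside the lambda would have to be effectively
  // final, so the leader is captured in an AtomicReference instead.
  static String waitForLeader() throws InterruptedException {
    AtomicReference<String> leader = new AtomicReference<>();
    waitFor(() -> {
      leader.set(getLeader());
      return leader.get() != null;
    }, 10, 1000);
    return leader.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("leader: " + waitForLeader());
  }
}
```

The same effect can be had with a one-element array, but `AtomicReference` makes the intent explicit and is also safe if the condition is ever evaluated from another thread.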