[ https://issues.apache.org/jira/browse/HDDS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anu Engineer resolved HDDS-2646.
--------------------------------
    Fix Version/s: 0.5.0
       Resolution: Fixed

Committed to master. Thanks for the contribution.

> Start acceptance tests only if at least one THREE pipeline is available
> -----------------------------------------------------------------------
>
>                 Key: HDDS-2646
>                 URL: https://issues.apache.org/jira/browse/HDDS-2646
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Assignee: Marton Elek
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.5.0
>
>         Attachments: docker-ozoneperf-ozoneperf-basic-scm.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> After HDDS-2034 (or even before?), pipeline creation (or the status transition
> from ALLOCATED to OPEN) requires at least one pipeline report from each of the
> datanodes. This means that the cluster might not be usable even if it is out
> of safe mode AND there are at least three datanodes.
> This makes all the acceptance tests unstable.
> For example, see
> [this|https://github.com/apache/hadoop-ozone/pull/263/checks?check_run_id=324489319]
> run.
> {code:java}
> scm_1 | 2019-11-28 11:22:54,401 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 548f146f-2166-440a-b9f1-83086591ae26
> scm_1 | 2019-11-28 11:22:54,402 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c
> scm_1 | 2019-11-28 11:22:54,404 INFO pipeline.RatisPipelineProvider: Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command to datanode 47dbb8e4-bbde-4164-a798-e47e8c696fb5
> scm_1 | 2019-11-28 11:22:54,405 INFO pipeline.PipelineStateManager: Created pipeline Pipeline[ Id: 8dc4aeb6-5ae2-46a0-948d-287c97dd81fb, Nodes: 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}47dbb8e4-bbde-4164-a798-e47e8c696fb5{ip: 172.24.0.2, host: ozoneperf_datanode_2.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED]
> scm_1 | 2019-11-28 11:22:56,975 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
> scm_1 | 2019-11-28 11:22:58,018 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
> scm_1 | 2019-11-28 11:23:01,871 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
> scm_1 | 2019-11-28 11:23:02,817 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
> scm_1 | 2019-11-28 11:23:02,847 INFO pipeline.PipelineReportHandler: Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, certSerialId: null}
> {code}
> As you can see, the pipeline is created but the cluster is not usable, because
> the pipeline has not yet been reported back by datanode_2:
> {code:java}
> scm_1 | 2019-11-28 11:23:13,879 WARN block.BlockManagerImpl: Pipeline creation failed for type:RATIS factor:THREE. Retrying get pipelines call once.
> scm_1 | org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot create pipeline of factor 3 using 0 nodes.
> {code}
> The quick fix is to configure all the compose clusters to wait until one
> pipeline is available. This can be done by adjusting the number of required
> datanodes:
> {code:java}
> // We only care about THREE replica pipeline
> int minHealthyPipelines = minDatanodes /
>     HddsProtos.ReplicationFactor.THREE_VALUE;
> {code}
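The arithmetic behind the quick fix can be sketched as below. This is only an illustration, not the actual SCM code: the class `PipelineReadinessSketch`, the method `readyForTests`, and the local constant `THREE_VALUE` (standing in for `HddsProtos.ReplicationFactor.THREE_VALUE`, assumed here to equal 3) are hypothetical names introduced for the example.

```java
// Hypothetical sketch of the formula quoted in the issue; not the real SCM rule.
final class PipelineReadinessSketch {

  // Stand-in for HddsProtos.ReplicationFactor.THREE_VALUE (assumed to be 3).
  static final int THREE_VALUE = 3;

  /** Number of complete THREE pipelines that minDatanodes nodes can form. */
  static int minHealthyPipelines(int minDatanodes) {
    if (minDatanodes < 0) {
      throw new IllegalArgumentException("minDatanodes must be non-negative");
    }
    // Integer division: leftover datanodes cannot form a full pipeline.
    return minDatanodes / THREE_VALUE;
  }

  /** True once the observed open THREE pipelines reach the computed threshold. */
  static boolean readyForTests(int minDatanodes, int openThreePipelines) {
    return openThreePipelines >= minHealthyPipelines(minDatanodes);
  }

  public static void main(String[] args) {
    for (int n : new int[] {1, 3, 5, 6}) {
      System.out.println(n + " required datanodes -> threshold of "
          + minHealthyPipelines(n) + " pipeline(s)");
    }
  }
}
```

Under this formula, a required-datanode count below three yields a threshold of zero, so nothing would be waited for; a count of three yields a threshold of one THREE pipeline, which appears to be why the fix adjusts the number of required datanodes.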