[
https://issues.apache.org/jira/browse/HDDS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anu Engineer resolved HDDS-2646.
--------------------------------
Fix Version/s: 0.5.0
Resolution: Fixed
Committed to master. Thanks for the contribution.
> Start acceptance tests only if at least one THREE pipeline is available
> -----------------------------------------------------------------------
>
> Key: HDDS-2646
> URL: https://issues.apache.org/jira/browse/HDDS-2646
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Attachments: docker-ozoneperf-ozoneperf-basic-scm.log
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> After HDDS-2034 (or even before?) pipeline creation (or the status transition
> from ALLOCATED to OPEN) requires at least one pipeline report from all of the
> datanodes. This means that the cluster might not be usable even if it's out
> of safe mode AND there are at least three datanodes.
> This makes all the acceptance tests unstable.
> For example in
> [this|https://github.com/apache/hadoop-ozone/pull/263/checks?check_run_id=324489319]
> run.
> {code:java}
> scm_1 | 2019-11-28 11:22:54,401 INFO pipeline.RatisPipelineProvider:
> Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command
> to datanode 548f146f-2166-440a-b9f1-83086591ae26
> scm_1 | 2019-11-28 11:22:54,402 INFO pipeline.RatisPipelineProvider:
> Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command
> to datanode dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c
> scm_1 | 2019-11-28 11:22:54,404 INFO pipeline.RatisPipelineProvider:
> Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command
> to datanode 47dbb8e4-bbde-4164-a798-e47e8c696fb5
> scm_1 | 2019-11-28 11:22:54,405 INFO pipeline.PipelineStateManager:
> Created pipeline Pipeline[ Id: 8dc4aeb6-5ae2-46a0-948d-287c97dd81fb, Nodes:
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host:
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host:
> ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}47dbb8e4-bbde-4164-a798-e47e8c696fb5{ip: 172.24.0.2, host:
> ozoneperf_datanode_2.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED]
> scm_1 | 2019-11-28 11:22:56,975 INFO pipeline.PipelineReportHandler:
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host:
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}
> scm_1 | 2019-11-28 11:22:58,018 INFO pipeline.PipelineReportHandler:
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by
> dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host:
> ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}
> scm_1 | 2019-11-28 11:23:01,871 INFO pipeline.PipelineReportHandler:
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host:
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}
> scm_1 | 2019-11-28 11:23:02,817 INFO pipeline.PipelineReportHandler:
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host:
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null}
> scm_1 | 2019-11-28 11:23:02,847 INFO pipeline.PipelineReportHandler:
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by
> dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host:
> ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack,
> certSerialId: null} {code}
> As you can see, the pipeline is created but the cluster is not usable, as
> the pipeline has not yet been reported back by datanode_2:
> {code:java}
> scm_1 | 2019-11-28 11:23:13,879 WARN block.BlockManagerImpl: Pipeline
> creation failed for type:RATIS factor:THREE. Retrying get pipelines
> call once.
> scm_1 |
> org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot
> create pipeline of factor 3 using 0 nodes.{code}
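The behaviour in the logs above can be modeled with a minimal sketch. It assumes, as the description states, that a pipeline moves from ALLOCATED to OPEN only once every member datanode has reported it. `PipelineSketch` and its methods are hypothetical names used for illustration; this is not the actual SCM code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the reporting rule; not the real SCM implementation. */
public class PipelineSketch {
    enum State { ALLOCATED, OPEN }

    private final Set<String> members;
    private final Set<String> reported = new HashSet<>();
    private State state = State.ALLOCATED;

    PipelineSketch(Set<String> members) {
        this.members = new HashSet<>(members);
    }

    /** Records a pipeline report; repeated reports from one datanode count once. */
    State onReport(String datanode) {
        if (members.contains(datanode)) {
            reported.add(datanode);
        }
        // Assumption: the pipeline opens only after ALL members have reported.
        if (reported.containsAll(members)) {
            state = State.OPEN;
        }
        return state;
    }

    State getState() {
        return state;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch(
            new HashSet<>(Arrays.asList("datanode_1", "datanode_2", "datanode_3")));

        // Same report order as in the SCM log: datanode_2 never reports.
        for (String dn : new String[] {
                "datanode_3", "datanode_1", "datanode_3", "datanode_3", "datanode_1"}) {
            pipeline.onReport(dn);
        }
        System.out.println(pipeline.getState()); // prints ALLOCATED

        pipeline.onReport("datanode_2");
        System.out.println(pipeline.getState()); // prints OPEN
    }
}
```

Under this model, five reports from two datanodes are not enough: the pipeline stays ALLOCATED until the third member reports, which matches the window in which the block allocation above failed.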
> The quick fix is to configure all the compose clusters to wait until at
> least one THREE pipeline is available. This can be done by adjusting the
> number of required datanodes:
> {code:java}
> // We only care about THREE replica pipeline
> int minHealthyPipelines = minDatanodes /
> HddsProtos.ReplicationFactor.THREE_VALUE; {code}
>
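To make the arithmetic in the snippet above concrete, here is a minimal standalone sketch. It assumes `THREE_VALUE` is 3 (the numeric value of the THREE replication factor) and uses hypothetical names (`SafeModeThresholdSketch`, `minHealthyPipelines`, `canStartTests`); it is not the actual safe mode rule.

```java
/** Hypothetical illustration of the threshold arithmetic; not the real SCM code. */
public class SafeModeThresholdSketch {
    // Assumed numeric value of HddsProtos.ReplicationFactor.THREE_VALUE.
    static final int THREE_VALUE = 3;

    /** Integer division: 3 datanodes require 1 pipeline, 5 require 1, 6 require 2. */
    static int minHealthyPipelines(int minDatanodes) {
        return minDatanodes / THREE_VALUE;
    }

    /** Hypothetical readiness check: enough open THREE pipelines for the configured minimum. */
    static boolean canStartTests(int openThreePipelines, int minDatanodes) {
        return openThreePipelines >= minHealthyPipelines(minDatanodes);
    }

    public static void main(String[] args) {
        System.out.println(minHealthyPipelines(3)); // prints 1
        System.out.println(canStartTests(0, 3));    // prints false
        System.out.println(canStartTests(1, 3));    // prints true
    }
}
```

Note that with fewer than three required datanodes the integer division yields 0, so the check passes without any open pipeline. This is why the required number of datanodes has to be set to at least 3 for the wait to take effect.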
--
This message was sent by Atlassian Jira
(v8.3.4#803005)