[ 
https://issues.apache.org/jira/browse/HDDS-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-2646:
-----------------------------------
    Labels: pull-request-available  (was: )

> Start acceptance tests only if at least one THREE pipeline is available
> -----------------------------------------------------------------------
>
>                 Key: HDDS-2646
>                 URL: https://issues.apache.org/jira/browse/HDDS-2646
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Marton Elek
>            Assignee: Marton Elek
>            Priority: Blocker
>              Labels: pull-request-available
>         Attachments: docker-ozoneperf-ozoneperf-basic-scm.log
>
>
> After HDDS-2034 (or even before?) pipeline creation (or the status transition 
> from ALLOCATED to OPEN) requires at least one pipeline report from all of the 
> datanodes. This means that the cluster might not be usable even after it has 
> left safe mode AND there are at least three datanodes.
> This makes all the acceptance tests unstable.
> For example in 
> [this|https://github.com/apache/hadoop-ozone/pull/263/checks?check_run_id=324489319]
>  run.
> {code:java}
> scm_1         | 2019-11-28 11:22:54,401 INFO pipeline.RatisPipelineProvider: 
> Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command 
> to datanode 548f146f-2166-440a-b9f1-83086591ae26
> scm_1         | 2019-11-28 11:22:54,402 INFO pipeline.RatisPipelineProvider: 
> Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command 
> to datanode dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c
> scm_1         | 2019-11-28 11:22:54,404 INFO pipeline.RatisPipelineProvider: 
> Send pipeline:PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb create command 
> to datanode 47dbb8e4-bbde-4164-a798-e47e8c696fb5
> scm_1         | 2019-11-28 11:22:54,405 INFO pipeline.PipelineStateManager: 
> Created pipeline Pipeline[ Id: 8dc4aeb6-5ae2-46a0-948d-287c97dd81fb, Nodes: 
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: 
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: 
> ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}47dbb8e4-bbde-4164-a798-e47e8c696fb5{ip: 172.24.0.2, host: 
> ozoneperf_datanode_2.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}, Type:RATIS, Factor:THREE, State:ALLOCATED]
> scm_1         | 2019-11-28 11:22:56,975 INFO pipeline.PipelineReportHandler: 
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: 
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}
> scm_1         | 2019-11-28 11:22:58,018 INFO pipeline.PipelineReportHandler: 
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 
> dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: 
> ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}
> scm_1         | 2019-11-28 11:23:01,871 INFO pipeline.PipelineReportHandler: 
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: 
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}
> scm_1         | 2019-11-28 11:23:02,817 INFO pipeline.PipelineReportHandler: 
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 
> 548f146f-2166-440a-b9f1-83086591ae26{ip: 172.24.0.10, host: 
> ozoneperf_datanode_3.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null}
> scm_1         | 2019-11-28 11:23:02,847 INFO pipeline.PipelineReportHandler: 
> Pipeline THREE PipelineID=8dc4aeb6-5ae2-46a0-948d-287c97dd81fb reported by 
> dccee7c4-19b3-41b8-a3f7-b47b0ed45f6c{ip: 172.24.0.5, host: 
> ozoneperf_datanode_1.ozoneperf_default, networkLocation: /default-rack, 
> certSerialId: null} {code}
> As you can see, the pipeline is created but the cluster is not usable, as 
> the pipeline has not yet been reported back by datanode_2:
> {code:java}
> scm_1         | 2019-11-28 11:23:13,879 WARN block.BlockManagerImpl: Pipeline 
> creation failed for type:RATIS factor:THREE. Retrying get pipelines call 
> once.
> scm_1         | 
> org.apache.hadoop.hdds.scm.pipeline.InsufficientDatanodesException: Cannot 
> create pipeline of factor 3 using 0 nodes.{code}
> The quick fix is to configure all the compose clusters to wait until at 
> least one THREE pipeline is available. This can be done by adjusting the 
> number of required datanodes:
> {code:java}
> // We only care about THREE replica pipeline
> int minHealthyPipelines = minDatanodes /
>     HddsProtos.ReplicationFactor.THREE_VALUE; {code}
>  
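
To make the arithmetic in the quoted snippet concrete, here is a minimal standalone sketch of the same integer division. The class name, the method name and the local THREE_VALUE constant are hypothetical stand-ins; the constant is assumed to equal HddsProtos.ReplicationFactor.THREE_VALUE (that is, 3), and nothing here is taken from the actual SCM code.

```java
// Hypothetical sketch, not the actual SCM implementation.
final class MinPipelineSketch {

    // Assumption: same numeric value as HddsProtos.ReplicationFactor.THREE_VALUE.
    static final int THREE_VALUE = 3;

    // Integer division, as in the quoted snippet: any remainder is dropped.
    static int minHealthyPipelines(int minDatanodes) {
        return minDatanodes / THREE_VALUE;
    }

    public static void main(String[] args) {
        // Print the derived pipeline count for a few required-datanode values.
        for (int minDatanodes : new int[] {1, 3, 5, 6}) {
            System.out.println(minDatanodes + " required datanodes -> "
                + minHealthyPipelines(minDatanodes) + " THREE pipeline(s)");
        }
    }
}
```

Because the division truncates, fewer than three required datanodes yields zero required THREE pipelines, so under this reading a compose cluster would need the value set to at least 3 before it waits for one.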



