[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303471#comment-14303471 ]
Jason Lowe commented on YARN-1778:
----------------------------------
Thanks for the analysis and patch, [~zxu]! I'm wondering if the test is trying
to tell us there really is a problem with FSRMStateStore retries, in which case
changing the test would mask a real problem that needs to be fixed in the main
code. If I understand the intent of the test correctly, it's trying to verify
that FSRMStateStore will not throw an exception while namenodes are down or
coming back up. However, if we make the test wait until the namenodes are back
up before trying to connect, then that defeats most of the point of the test.

I think the critical question is: should the "Namenode still not started"
exception be retried by either the DFSClient layer or by FSRMStateStore? I
think it should. Otherwise a client of FSRMStateStore is going to see this
exception in a similar, real-world scenario where the Namenode was restarted
and wonder why the framework didn't auto-retry.
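
To make the retry question concrete, here is a minimal sketch of what retrying
at the state-store layer could look like. It is not the actual FSRMStateStore
or DFSClient code: the class StartupRetrySketch, both of its methods, and the
idea of recognizing the condition by its message text are assumptions made
purely for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper; not part of FSRMStateStore or DFSClient. */
public class StartupRetrySketch {

    // Assumption: the startup condition can be recognized by its message text.
    // A real fix would more likely key off a dedicated exception type.
    static boolean isNamenodeStarting(Exception e) {
        String msg = e.getMessage();
        return e instanceof IOException
            && msg != null
            && msg.contains("still not started");
    }

    // Runs op, retrying up to maxRetries times with a fixed sleep when the
    // failure looks like a namenode that has not finished starting.
    // Any other failure is rethrown immediately.
    static <T> T callWithRetries(Callable<T> op, int maxRetries, long sleepMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isNamenodeStarting(e) || attempt >= maxRetries) {
                    throw e;
                }
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

With something along these lines wrapped around each filesystem call, a client
of the state store would only see the exception once the retries are
exhausted, which is the behavior the test appears to expect.
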
> TestFSRMStateStore fails on trunk
> ---------------------------------
>
> Key: YARN-1778
> URL: https://issues.apache.org/jira/browse/YARN-1778
> Project: Hadoop YARN
> Issue Type: Test
> Reporter: Xuan Gong
> Assignee: zhihai xu
> Attachments: YARN-1778.000.patch
>
>