[ https://issues.apache.org/jira/browse/YARN-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524746#comment-15524746 ]
Sangjin Lee commented on YARN-5667:
-----------------------------------
Those are great questions.
The diamond dependency (where more than one version of a given
library appears in the dependency graph) arises because the hadoop code uses
hadoop-common 3.0.0-alpha1 directly, for example, and also pulls in 2.5.1
indirectly via hbase 1.1.3. Due to hadoop's version management, 3.0.0-alpha1 is
picked. The implication is that we build and test hbase code in the
context of timeline service against a hadoop version *different from* the one
hbase declares as its dependency.
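To make the version pick concrete, here is a minimal sketch in Python. It is a toy model, not Maven's actual resolution algorithm: the DECLARED graph, the MANAGED table, and the artifact names "timelineservice" and "hbase-client" are illustrative assumptions, and only the version numbers come from the discussion above.

```python
from collections import defaultdict

# Hypothetical, simplified declarations: artifact -> [(dependency, version)].
DECLARED = {
    "timelineservice": [("hadoop-common", "3.0.0-alpha1"), ("hbase-client", "1.1.3")],
    "hbase-client": [("hadoop-common", "2.5.1")],
    "hadoop-common": [],
}

# Versions pinned by the build (a stand-in for hadoop's version management).
MANAGED = {"hadoop-common": "3.0.0-alpha1"}


def requested_versions(root):
    """Walk the graph from root and record every version requested per artifact."""
    requested = defaultdict(set)
    visited = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for dep, version in DECLARED.get(node, []):
            requested[dep].add(version)
            stack.append(dep)
    return requested


def resolve(root):
    """Pick one version per artifact; a managed version overrides the requests."""
    resolved = {}
    for artifact, versions in requested_versions(root).items():
        if artifact in MANAGED:
            resolved[artifact] = MANAGED[artifact]
        elif len(versions) == 1:
            resolved[artifact] = next(iter(versions))
        else:
            raise ValueError(f"unmanaged conflict for {artifact}: {sorted(versions)}")
    return resolved
```

In this model, requested_versions("timelineservice") reports two versions of hadoop-common, while resolve("timelineservice") keeps only the managed one, so the hbase code ends up running against a hadoop-common it did not declare.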
Now if we think about hbase client code and hbase coprocessor code separately,
we see that the two pieces of code run in different environments. The code that
uses hbase client runs on YARN (and therefore hadoop 3.0.0). In that
environment, we need to ensure the hbase client itself (not our code that uses
hbase client) works correctly against the trunk version of hadoop.
On the other hand, the hbase coprocessor code runs on hbase. Therefore, it is
now timeline service coprocessor code that needs to run under hadoop 2.5.1
(until/unless we upgrade hbase). Both of these aspects need to be verified if we
decide to split the code into separate modules. Verifying them would be easier
with the client code and the coprocessor code in separate modules.
If we had an hbase version that depended on the trunk, these problems would go
away. And I understand that the hbase folks are making an effort to ensure the
latest hbase version works against the hadoop trunk version. That said, hbase
officially can depend only on released versions, and there will always be a lag.
As for why the coprocessor depends on the hbase-client-related
code, there is no strong reason for it. It's just the way the
code evolved. Actually it would be good to refactor the code so that the
coprocessor code has minimal dependencies. It's worth looking into.
> Move HBase backend code in ATS v2 into its separate module
> -----------------------------------------------------------
>
> Key: YARN-5667
> URL: https://issues.apache.org/jira/browse/YARN-5667
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> The HBase backend code currently lives along with the core ATS v2 code in
> hadoop-yarn-server-timelineservice module. Because Resource Manager depends
> on hadoop-yarn-server-timelineservice, an unnecessary dependency of the RM
> module on HBase modules is introduced (HBase backend is pluggable, so we do
> not need to directly pull in HBase jars).
> In our internal effort to try ATS v2 with HBase 2.0 which depends on Hadoop
> 3, we encountered a circular dependency during our builds between HBase2.0
> and Hadoop3 artifacts.
> {code}
> hadoop-mapreduce-client-common, hadoop-yarn-client,
> hadoop-yarn-server-resourcemanager, hadoop-yarn-server-timelineservice,
> hbase-server, hbase-prefix-tree, hbase-hadoop2-compat,
> hadoop-mapreduce-client-jobclient, hadoop-mapreduce-client-common]
> {code}
> This jira proposes we move all HBase-backend-related code from
> hadoop-yarn-server-timelineservice into its own module (possible name is
> yarn-server-timelineservice-storage) so that core RM modules do not depend on
> HBase modules any more.
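The cycle in the quoted description can be checked with a small depth-first search. This is a sketch under stated assumptions: each module is taken to depend on the next one in the order the quoted list gives them (the list is truncated at the start, so that ordering is a guess), and removing the edge from hadoop-yarn-server-timelineservice to hbase-server is only a rough model of moving the HBase backend into its own module.

```python
# Assumed edges: each module depends on the next one in the quoted list.
CHAIN = [
    "hadoop-mapreduce-client-common",
    "hadoop-yarn-client",
    "hadoop-yarn-server-resourcemanager",
    "hadoop-yarn-server-timelineservice",
    "hbase-server",
    "hbase-prefix-tree",
    "hbase-hadoop2-compat",
    "hadoop-mapreduce-client-jobclient",
    "hadoop-mapreduce-client-common",
]


def build_graph(chain):
    """Turn an ordered chain of modules into an adjacency map."""
    graph = {}
    for src, dst in zip(chain, chain[1:]):
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge to a node on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def without_edge(graph, src, dst):
    """Copy of graph with the single edge src -> dst removed."""
    return {
        node: [d for d in deps if not (node == src and d == dst)]
        for node, deps in graph.items()
    }
```

With the assumed edges, find_cycle(build_graph(CHAIN)) returns the full loop, and the same call on without_edge(graph, "hadoop-yarn-server-timelineservice", "hbase-server") returns None.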