[
https://issues.apache.org/jira/browse/YARN-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
aimahou updated YARN-10298:
---------------------------
Description:
h2. Issue
Timeline entity information is stored in only one region when Apache HBase is
used as the backend storage.
h2. Probable cause
We found in the source code that, when the HBase timeline writer stores
timeline entity info, the rowKey is composed of clusterId, userId, flowName,
flowRunId, and appId. HBase keeps rows sorted by rowKey in dictionary
(lexicographic) order, so rowKeys that share these leading fields are probably
placed next to each other. Timeline entities may therefore be stored in only
one region or a few adjacent regions.
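The effect described above can be sketched without an HBase cluster. The example below is only an illustration under stated assumptions: the `RowKeyLocality` class, the `!` separator, the sample IDs, and the split points `4`, `8`, `c` are all hypothetical, and the real row key encoding in the timeline service is more involved than a plain string join.

```java
import java.util.Arrays;
import java.util.List;

class RowKeyLocality {
    // Hypothetical split points: a table pre-split into four regions
    // whose boundaries are the keys "4", "8", and "c".
    static final String[] SPLITS = {"4", "8", "c"};

    // Simplified stand-in for the real row key encoding: joins the
    // fields in the same order, using '!' as an assumed separator.
    static String rowKey(String clusterId, String userId, String flowName,
                         long flowRunId, String appId) {
        return String.join("!", clusterId, userId, flowName,
                Long.toString(flowRunId), appId);
    }

    // Returns the index of the region whose key range contains the key.
    // String.compareTo matches byte-wise ordering for the ASCII keys used here.
    static int regionOf(String key) {
        int region = 0;
        for (String split : SPLITS) {
            if (key.compareTo(split) >= 0) {
                region++;
            }
        }
        return region;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                rowKey("yarn-cluster", "alice", "wordcount", 1590000000000L,
                        "application_1590000000000_0001"),
                rowKey("yarn-cluster", "bob", "etl", 1590000000001L,
                        "application_1590000000000_0002"),
                rowKey("yarn-cluster", "carol", "report", 1590000000002L,
                        "application_1590000000000_0003"));
        for (String key : keys) {
            System.out.println("region " + regionOf(key) + " <- " + key);
        }
    }
}
```

Because every key starts with the same clusterId, all three sample keys fall into the last region of this hypothetical layout, regardless of user, flow, or application.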
h2. Related code snippet
HBaseTimelineWriterImpl.java
{quote}
{code:java}
public TimelineWriteResponse write(TimelineCollectorContext context,
    TimelineEntities data, UserGroupInformation callerUgi)
    throws IOException {
  ...
  boolean isApplication = ApplicationEntity.isApplicationEntity(te);
  byte[] rowKey;
  if (isApplication) {
    ApplicationRowKey applicationRowKey = new ApplicationRowKey(clusterId,
        userId, flowName, flowRunId, appId);
    rowKey = applicationRowKey.getRowKey();
    store(rowKey, te, flowVersion, Tables.APPLICATION_TABLE);
  } else {
    EntityRowKey entityRowKey = new EntityRowKey(clusterId, userId, flowName,
        flowRunId, appId, te.getType(), te.getIdPrefix(), te.getId());
    rowKey = entityRowKey.getRowKey();
    store(rowKey, te, flowVersion, Tables.ENTITY_TABLE);
  }
  if (!isApplication && SubApplicationEntity.isSubApplicationEntity(te)) {
    SubApplicationRowKey subApplicationRowKey = new SubApplicationRowKey(
        subApplicationUser, clusterId, te.getType(), te.getIdPrefix(),
        te.getId(), userId);
    rowKey = subApplicationRowKey.getRowKey();
    store(rowKey, te, flowVersion, Tables.SUBAPPLICATION_TABLE);
  }
  ...
}
{code}
{quote}
h2. Suggestion
We can use the hash code of the original rowKey as the rowKey when storing and
reading timeline entity data.
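A minimal sketch of this suggestion, assuming MD5 as the hash function and a hex string as the stored key. The `HashedRowKey` class and `hashedKey` helper are hypothetical names for illustration and are not part of the existing timeline service code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashedRowKey {
    // Hypothetical helper: returns the MD5 digest of the original row key
    // as a 32-character lowercase hex string.
    static String hashedKey(byte[] originalRowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(originalRowKey);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes print as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative keys that differ only in the trailing application ID.
        String prefix = "yarn-cluster!alice!wordcount!1590000000000!";
        for (int i = 1; i <= 3; i++) {
            String original = prefix + "application_1590000000000_000" + i;
            byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
            System.out.println(hashedKey(bytes) + " <- " + original);
        }
    }
}
```

The writer and the reader would both need to apply the same function to the same original rowKey. One trade-off to keep in mind: hashing the whole key discards the original ordering, so lookups by exact key still work, but range scans over a shared prefix such as a flow or flow run would no longer return contiguous rows.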
> TimeLine entity information only stored in one region when use apache HBase
> as backend storage
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-10298
> URL: https://issues.apache.org/jira/browse/YARN-10298
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: ATSv2, timelineservice
> Affects Versions: 3.1.1
> Reporter: aimahou
> Priority: Major
>