[ 
https://issues.apache.org/jira/browse/YARN-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788657#comment-16788657
 ] 

Prabhu Joseph commented on YARN-9303:
-------------------------------------

1. Removed the {{userName}} pre-splits from the {{timelineservice.app_flow}} 
HBase table, leaving it to auto-splits.

*Reason:*
The current rowkey starts with the inverted timestamp (e.g. 9999, 9998, 9997, 
...) taken from the {{application_id}}, so the pre-splits cannot help. A hash 
prefix on the rowkey would prevent hotspotting, but it would require more 
complex logic and a sort during fetch to display the apps in order. This table 
won't see much load, as we insert only one row per app submission, so 
hotspotting should not be much of a problem. 
Auto-splitting should be good enough. {{flowRun}} and {{flowActivity}} also use 
auto-splits.
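
To illustrate the reasoning above, here is a minimal, self-contained sketch (no HBase dependency). The split points (one per lower-case letter), the class and method names, and the one-byte salt scheme are assumptions made for illustration; they are not the actual YARN or HBase code. It shows where an inverted-timestamp key lands relative to letter-based split points, and what a hash prefix would look like.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class AppFlowRegionSketch {
    // Hypothetical pre-split points: one per lower-case letter ("a".."z").
    static byte[][] letterSplits() {
        byte[][] splits = new byte[26][];
        for (int i = 0; i < 26; i++) {
            splits[i] = new byte[] {(byte) ('a' + i)};
        }
        return splits;
    }

    // Key prefix as described in the issue: Long.MAX_VALUE - ts, big-endian.
    static byte[] invertedKey(long ts) {
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - ts).array();
    }

    // Unsigned lexicographic byte comparison, the ordering used for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Index of the region holding the key: the number of split points <= key.
    static int regionIndex(byte[] key, byte[][] splits) {
        int idx = 0;
        while (idx < splits.length && compare(splits[idx], key) <= 0) {
            idx++;
        }
        return idx;
    }

    // Hypothetical salting: prepend one bucket byte derived from the key's hash.
    static byte[] saltedKey(byte[] key, int buckets) {
        byte salt = (byte) Math.floorMod(Arrays.hashCode(key), buckets);
        return ByteBuffer.allocate(key.length + 1).put(salt).put(key).array();
    }

    public static void main(String[] args) {
        byte[][] splits = letterSplits();
        long[] submitTimes = {1552000000000L, 1552000060000L, 1552000120000L};
        for (long ts : submitTimes) {
            byte[] key = invertedKey(ts);
            System.out.println("ts=" + ts
                + " region=" + regionIndex(key, splits) + " of " + (splits.length + 1)
                + " saltBucket=" + saltedKey(key, 4)[0]);
        }
    }
}
```

For any realistic submission time the first key byte is 0x7F, which sorts after 'z' (0x7A), so every key falls into region index 26, the last of the 27 regions in this sketch. With a salt prefix the keys would spread across buckets, but a scan would return rows grouped by bucket rather than by time, which is the extra sort mentioned above.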

2. Removed {{KeyPrefixRegionSplitPolicy}}, as the table's rowkeys do not form 
any prefix-based groups. The default {{IncreasingToUpperBoundRegionSplitPolicy}} 
will work fine.

I have reviewed the other 6 tables, which look fine except for one problem: the 
pre-split logic is based on lower-case letters, so it does not suit the 
{{application}} and {{domain}} tables when the {{cluster_id}} does not start 
with a lower-case letter. Allowing users to configure the pre-splits based on 
their {{cluster_id}} and {{userName}} will fix this issue. Reported YARN-9373 
for the same. 
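
As a rough sketch of the lower-case problem and of what configurable pre-splits could look like: the comma-separated split list, the sample cluster ids and row keys, and all class and method names below are hypothetical, and are not taken from YARN-9373 or the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ConfigurableSplitsSketch {
    // Unsigned lexicographic byte comparison, the ordering used for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Parse a user-supplied, comma-separated list of split points and sort it.
    static byte[][] parseSplits(String csv) {
        return Arrays.stream(csv.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .map(s -> s.getBytes(StandardCharsets.UTF_8))
            .sorted(ConfigurableSplitsSketch::compare)
            .toArray(byte[][]::new);
    }

    // Index of the region holding the row key: the number of split points <= key.
    static int regionIndex(String rowKey, byte[][] splits) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int idx = 0;
        while (idx < splits.length && compare(splits[idx], key) <= 0) {
            idx++;
        }
        return idx;
    }

    public static void main(String[] args) {
        // Hypothetical row keys that begin with an upper-case cluster_id.
        String[] rowKeys = {"DevCluster!alice", "ProdCluster!bob", "QACluster!carol"};
        byte[][] lowerCase = parseSplits("g,n,t"); // lower-case based splits
        byte[][] custom = parseSplits("Q,E");      // user-configured splits
        for (String k : rowKeys) {
            System.out.println(k + " lowerCaseRegion=" + regionIndex(k, lowerCase)
                + " customRegion=" + regionIndex(k, custom));
        }
    }
}
```

Upper-case letters and digits sort before 'a', so with lower-case split points every one of these keys stays in the first region (index 0). With split points chosen to match the actual cluster ids, the same keys spread over regions 0, 1 and 2.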

[~rohithsharma], [~vrushalic] Can you review this Jira when you get time?




> Username splits won't help timelineservice.app_flow table
> ---------------------------------------------------------
>
>                 Key: YARN-9303
>                 URL: https://issues.apache.org/jira/browse/YARN-9303
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: Only_Last_Region_Used.png, YARN-9303-001.patch
>
>
> The timelineservice.app_flow HBase table uses pre-split logic based on username, 
> whereas the rowkeys start with an inverted timestamp (Long.MAX_VALUE - ts). All 
> data will go to the last region and the remaining regions will never receive 
> inserts. Need to choose the right split or use auto-split.



