[ https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085210#comment-16085210 ]
Vrushali C commented on YARN-6733:
----------------------------------
bq. Unlike entity table schema, if user is the preference across clusters then
I think row key should start with subAppUser name
For the entity table, we put the user name first to ensure that frequent
writes by one user go to the same regionserver. That way, a user who writes
heavily does not affect another user who writes less. That reasoning holds for
the entity table because we write a lot of entities there.
For sub application entities, a cluster ! user prefix seems more appropriate
for nesting, so I believe cluster!user would be a good prefix.
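To make the trade-off concrete, here is a minimal sketch of how the two prefix
orders sort the same rows. It assumes only that rows are kept in lexicographic
row key order, as HBase does; the class name, the cluster names and the user
names are made up for illustration and are not part of the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: compares how two prefix orders sort the same rows. */
final class PrefixOrderSketch {

    /** Returns the keys in lexicographic order, the order a sorted store keeps them in. */
    static List<String> sorted(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    public static void main(String[] args) {
        // Made-up user and cluster names, for illustration only.
        List<String> userFirst = List.of(
            "alice!cluster1", "bob!cluster1", "alice!cluster2", "bob!cluster2");
        List<String> clusterFirst = List.of(
            "cluster1!alice", "cluster1!bob", "cluster2!alice", "cluster2!bob");

        // user ! cluster: all rows of one user are adjacent, across clusters.
        System.out.println(sorted(userFirst));
        // cluster ! user: rows nest by cluster first, then by user within it.
        System.out.println(sorted(clusterFirst));
    }
}
```

With the user first, one user's rows stay adjacent even across clusters; with
the cluster first, rows nest by cluster and then by user within that cluster.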
bq. IIRC, we have entity type and entity id to distinguish between the entity
so sub app name not required right? Am I missing anything?
Hmm, so here is how I see this. Let's say a user is running a particular query
again and again on this Tez setup. Each time the query is run, it will write
things to ATSv2. Let's call that query "queryA". Each run of this query would
(should) generate a different entity id.
For example, to store the status of this query, I think the key would then be
cluster ! sub app user ! queryA ! DAG ! 1234 ! queryA_1499924426
with column name "status" and perhaps value of "SUCCESS".
Say next time it runs, they can write
cluster ! sub app user ! queryA ! DAG ! 1234 ! queryA_1450024426
That is why the key has both the sub app name and the entity id. What do you
think? I could remove the sub-app name and keep only the entity id, but the
framework has to generate a new entity id each time they run this query
anyway, so the row key I am proposing gives them a way to look at the
different entities within the same sub app.
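As a rough sketch of that idea: the code below joins the key components with
"!" and then lists every row stored under one sub app name. It uses plain
strings and an in-memory sorted set instead of the real table, so the class,
the method names and the "queryB" row are hypothetical; only the two queryA
keys are taken from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of the proposed row key; not the code in the patch. */
final class SubAppRowKeySketch {
    private static final String SEP = "!";

    /** Joins the components in order, e.g. cluster, sub app user, sub app name, ... */
    static String rowKey(String... components) {
        return String.join(SEP, components);
    }

    /** Returns all rows whose key starts with the given leading components. */
    static List<String> rowsUnder(TreeSet<String> rows, String... prefixComponents) {
        String prefix = rowKey(prefixComponents) + SEP;
        // In a sorted set, every key starting with prefix lies in [prefix, prefix + '\uffff').
        return new ArrayList<>(rows.subSet(prefix, prefix + '\uffff'));
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>();
        // The two runs of queryA from the example above.
        rows.add(rowKey("cluster", "subAppUser", "queryA", "DAG", "1234", "queryA_1499924426"));
        rows.add(rowKey("cluster", "subAppUser", "queryA", "DAG", "1234", "queryA_1450024426"));
        // A made-up row for a different sub app, to show that it is not returned.
        rows.add(rowKey("cluster", "subAppUser", "queryB", "DAG", "1234", "queryB_1499924500"));

        // Lists both queryA runs and skips queryB.
        System.out.println(rowsUnder(rows, "cluster", "subAppUser", "queryA"));
    }
}
```

If the sub app name were dropped from the key, the same lookup would have to
filter on the entity id instead of using a key prefix.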
But now I am wondering if we should make it simple and keep only the entity id
and not have any sub app name.
> Add table for storing sub-application entities
> ----------------------------------------------
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Vrushali C
> Assignee: Vrushali C
> Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch
>
>
> After a discussion with Tez folks, we have been thinking about introducing a
> table to store sub-application information.
> For example, suppose a Tez session runs for a certain period as User X and
> runs a few AMs. These AMs accept DAGs from other users. Tez will execute
> these DAGs with a doAs user. ATSv2 should store this information in a new
> table, perhaps called the "sub_application" table.
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name
> fields to include sub-application user etc.