[ 
https://issues.apache.org/jira/browse/YARN-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820063#comment-13820063
 ] 

Karthik Kambatla commented on YARN-1390:
----------------------------------------

bq. Just to clarify, are you proposing a new field for Application that would 
be a key-value map and would be used to store tags, applicationLineage, etc?
In the long term, yes. In the short term, having a field with a single value 
should suffice.

bq. Are you assuming the source info is just a simple well defined string such 
as "Oozie" or would Oozie do something like "Oozie:workflowId=1234" ?
We plan on using the Oozie-action-id; so, it is *not* a well-defined string. 
Let me explain the use case in detail.

In YARN, a node failure can cause a subset of the currently running AMs to 
fail. In the case of Oozie, if the launcher-AM fails and the action-AM doesn't, 
re-spawning the launcher-AM can result in two copies of the action-AM, 
potentially leading to correctness issues. So, the plan is for the launcher-AM 
to kill previously running action-AMs (if any) before starting new action-AMs. 
We need the lineage information to figure out which action-AMs the launcher 
started. 
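
To make the recovery step concrete, here is a minimal sketch of what the re-spawned launcher-AM would do. It is only an illustration under stated assumptions: `App`, its `source` field, and `killStaleActionAms` are hypothetical stand-ins, not existing YARN or Oozie APIs, and setting `running = false` stands in for a real kill request to the RM.

```java
import java.util.ArrayList;
import java.util.List;

public class LauncherRecoverySketch {
    /** Hypothetical, simplified view of an application known to the RM. */
    static final class App {
        final String appId;
        final String source;   // hypothetical lineage field, e.g. the Oozie action id
        boolean running;

        App(String appId, String source, boolean running) {
            this.appId = appId;
            this.source = source;
            this.running = running;
        }
    }

    /**
     * Stops every running app whose source matches the given action id
     * and returns the ids of the apps that were stopped.
     */
    static List<String> killStaleActionAms(List<App> apps, String actionId) {
        List<String> killed = new ArrayList<>();
        for (App app : apps) {
            if (app.running && actionId.equals(app.source)) {
                app.running = false; // stand-in for a real kill request
                killed.add(app.appId);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("app_1", "action-A", true));   // stale action-AM from the failed launcher
        apps.add(new App("app_2", "action-B", true));   // belongs to a different action
        apps.add(new App("app_3", "action-A", false));  // already finished

        // The re-spawned launcher cleans up first, then would start a new action-AM.
        System.out.println("killed: " + killStaleActionAms(apps, "action-A"));
    }
}
```

Without the lineage field, the launcher has no way to tell `app_1` apart from `app_2` in this sketch, which is the gap this JIRA is meant to close.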

bq. Also, from an implementation point of view, I would assume this map would 
not be searchable.
Searchability can be of two types. Which one do you think we should avoid? 
# The internal RM data-structures using this "map" to "index" app-data. This 
would help in serving RM java/REST API queries faster. This comes with the 
overhead of maintaining these indices etc. I am not actively thinking about 
this; just a thought that crossed my mind.
# Allow querying for apps matching a particular tag (or Oozie-Action-Id) via 
filtering in the RM. While it might be okay to not support this in the 
first cut, I am afraid this is something we should probably support. Otherwise, 
the client (Oozie) will end up asking for all the applications (in the time 
frame) and sifting through them, only to discard most of them. 
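
The cost of leaving out RM-side filtering (the second type above) can be sketched as follows. This is an illustration only: `AppReport`, `fetchByTimeFrame`, and `fetchByTag` are hypothetical names, not existing RM APIs, and the in-memory list stands in for the RM's application store.

```java
import java.util.ArrayList;
import java.util.List;

public class TagFilterSketch {
    /** Hypothetical, simplified application report; not the real YARN type. */
    static final class AppReport {
        final String appId;
        final String tag;       // hypothetical tag, e.g. an Oozie action id
        final long startTime;

        AppReport(String appId, String tag, long startTime) {
            this.appId = appId;
            this.tag = tag;
            this.startTime = startTime;
        }
    }

    /** Without RM-side tag filtering: the RM returns every app in the time frame. */
    static List<AppReport> fetchByTimeFrame(List<AppReport> rmApps, long from, long to) {
        List<AppReport> out = new ArrayList<>();
        for (AppReport app : rmApps) {
            if (app.startTime >= from && app.startTime <= to) {
                out.add(app);
            }
        }
        return out;
    }

    /** With RM-side tag filtering: only matching apps leave the RM. */
    static List<AppReport> fetchByTag(List<AppReport> rmApps, String tag, long from, long to) {
        List<AppReport> out = new ArrayList<>();
        for (AppReport app : fetchByTimeFrame(rmApps, from, to)) {
            if (tag.equals(app.tag)) {
                out.add(app);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 1000 apps spread evenly over 100 hypothetical action ids.
        List<AppReport> rmApps = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rmApps.add(new AppReport("app_" + i, "action-" + (i % 100), i));
        }
        int unfiltered = fetchByTimeFrame(rmApps, 0, 999).size();
        int filtered = fetchByTag(rmApps, "action-7", 0, 999).size();
        System.out.println("returned without tag filter: " + unfiltered);
        System.out.println("returned with tag filter: " + filtered);
    }
}
```

In the unfiltered case the client still has to apply the tag check itself to everything it received, so the difference between the two counts is the data that is transferred only to be thrown away.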

> Provide a way to capture source of an application to be queried through REST 
> or Java Client APIs
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1390
>                 URL: https://issues.apache.org/jira/browse/YARN-1390
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: api
>    Affects Versions: 2.2.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>
> In addition to other fields like application-type (added in YARN-563), it is 
> useful to have an applicationSource field to track the source of an 
> application. The application source can be useful in (1) fetching only those 
> applications a user is interested in, (2) potentially adding source-specific 
> optimizations in the future. 
> Examples of sources are: User-defined project names, Pig, Hive, Oozie, Sqoop 
> etc.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
