[jira] [Commented] (YARN-6509) Add a size threshold beyond which yarn logs will require a force option
[ https://issues.apache.org/jira/browse/YARN-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981703#comment-15981703 ] Siddharth Seth commented on YARN-6509: -- Is the current proposal to change the default to fetch the last 4K? Can we please not make this change? It is definitely incompatible, and I'd argue that it's not very useful. The intent of the jira is to protect users against log downloads which could otherwise take hours and fill up the local fs - i.e., apps which generate large logs. > Add a size threshold beyond which yarn logs will require a force option > --- > > Key: YARN-6509 > URL: https://issues.apache.org/jira/browse/YARN-6509 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Siddharth Seth >Assignee: Xuan Gong > Fix For: 2.9.0 > > Attachments: YARN-6509.1.patch > > > An accidental fetch for a long running application can lead to a scenario in which > the large log size can fill up a disk. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947786#comment-15947786 ] Siddharth Seth commented on YARN-3427: -- From a Tez perspective, I would prefer it if the methods were left in place. If this was something that had been fixed in 2.6, that would have been easier to work with. Since the new methods were introduced in 2.7 - Tez will need to introduce a shim for this. > Remove deprecated methods from ResourceCalculatorProcessTree > > > Key: YARN-3427 > URL: https://issues.apache.org/jira/browse/YARN-3427 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Blocker > Attachments: YARN-3427.000.patch, YARN-3427.001.patch > > > In 2.7, we made ResourceCalculatorProcessTree Public and exposed some > existing ill-formed methods as deprecated ones for use by Tez. > We should remove it in 3.0.0, considering that the methods have been > deprecated for all the 2.x.y releases that it is marked Public in. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
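A rough sketch of the kind of reflection-based shim Tez would need to work across both 2.6 and 2.7 clusters; the accessor names below (getRssMemorySize for 2.7+, getCumulativeRssmem for earlier releases) are from memory and should be treated as illustrative rather than verified against every 2.x release:
{code}
import java.lang.reflect.Method;

import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;

public final class ProcessTreeShim {
  private ProcessTreeShim() {}

  // Returns the RSS of the process tree, preferring the newer accessor and
  // falling back to the older deprecated one when running against pre-2.7.
  public static long getRssMemory(ResourceCalculatorProcessTree tree) {
    try {
      Method newApi = tree.getClass().getMethod("getRssMemorySize");
      return (Long) newApi.invoke(tree);
    } catch (NoSuchMethodException e) {
      try {
        Method oldApi = tree.getClass().getMethod("getCumulativeRssmem");
        return (Long) oldApi.invoke(tree);
      } catch (Exception inner) {
        throw new IllegalStateException("No usable RSS accessor found", inner);
      }
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }
}
{code}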
[jira] [Created] (YARN-5738) Allow services to release/kill specific containers
Siddharth Seth created YARN-5738: Summary: Allow services to release/kill specific containers Key: YARN-5738 URL: https://issues.apache.org/jira/browse/YARN-5738 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth There are occasions on which specific containers may not be required by a service. It would be useful to have support for returning these to YARN. Slider flex doesn't give this control. cc [~gsaha], [~vinodkv] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5418) When partial log aggregation is enabled, display the list of aggregated files on the container log page
Siddharth Seth created YARN-5418: Summary: When partial log aggregation is enabled, display the list of aggregated files on the container log page Key: YARN-5418 URL: https://issues.apache.org/jira/browse/YARN-5418 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth The container log page lists all files. However, as soon as a file gets aggregated - it's no longer available on this listing page. It will be useful to list aggregated files as well as the current set of files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5297) Avoid printing a stack trace when recovering an app after the RM restarts
Siddharth Seth created YARN-5297: Summary: Avoid printing a stack trace when recovering an app after the RM restarts Key: YARN-5297 URL: https://issues.apache.org/jira/browse/YARN-5297 Project: Hadoop YARN Issue Type: Task Reporter: Siddharth Seth The exception trace is unnecessary, and can cause confusion.
{code}
2016-06-16 22:02:54,262 INFO ipc.Server (Server.java:logException(2401)) - IPC Server handler 0 on 8030, call org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.22.79.149:42698 Call#2241 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException: AM is not registered for known application attempt: appattempt_1466112179488_0001_01 or RM had restarted after AM registered . AM should re-register.
        at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:454)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
        at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
{code}
cc [~djp] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
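One way Hadoop services typically suppress stack traces for expected exceptions is to register them as "terse" on the IPC server; whether that is the approach eventually taken for this JIRA is not stated here. A minimal sketch, assuming access to the org.apache.hadoop.ipc.Server instance that backs the AM-RM protocol:
{code}
import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;

public final class TerseExceptionExample {
  private TerseExceptionExample() {}

  // Registers the exception as "terse": the IPC server then logs only the
  // exception message for it, instead of the full stack trace.
  public static void suppressStackTrace(Server amRmServer) {
    amRmServer.addTerseExceptions(ApplicationMasterNotRegisteredException.class);
  }
}
{code}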
[jira] [Commented] (YARN-5270) Solve miscellaneous issues caused by YARN-4844
[ https://issues.apache.org/jira/browse/YARN-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340336#comment-15340336 ] Siddharth Seth commented on YARN-5270: -- Not sure why the NotImplementedYet exceptions are required. Is this to handle cases where some projects may have implemented Resource? Anyway - if the exception has to stay - the message should be improved to avoid confusion, indicating that the method is implemented in the actual implementation classes. > Solve miscellaneous issues caused by YARN-4844 > -- > > Key: YARN-5270 > URL: https://issues.apache.org/jira/browse/YARN-5270 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-5270-branch-2.001.patch, > YARN-5270-branch-2.8.001.patch > > > Such as javac warnings reported by YARN-5077 and type converting issues in > Resources class. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5224) Logs for a completed container are not available in the yarn logs output for a live application
Siddharth Seth created YARN-5224: Summary: Logs for a completed container are not available in the yarn logs output for a live application Key: YARN-5224 URL: https://issues.apache.org/jira/browse/YARN-5224 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.9.0 Reporter: Siddharth Seth This affects 'short' jobs like MapReduce and Tez more than long running apps. Related: YARN-5193 (but that only covers long running apps) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5205) yarn logs for live applications does not provide log files which may have already been aggregated
Siddharth Seth created YARN-5205: Summary: yarn logs for live applications does not provide log files which may have already been aggregated Key: YARN-5205 URL: https://issues.apache.org/jira/browse/YARN-5205 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.9.0 Reporter: Siddharth Seth With periodic aggregation enabled, the logs which have been partially aggregated are not always displayed by the yarn logs command. If the file exists in the log dir for a container - all previously aggregated files with the same name, along with the current file, will be part of the yarn log output. Files which have been previously aggregated, for which a file with the same name does not exist in the container log dir, do not show up in the output. After the app completes, all logs are available. cc [~xgong] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5194) Avoid adding yarn-site to all Configuration instances created by the JVM
[ https://issues.apache.org/jira/browse/YARN-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316909#comment-15316909 ] Siddharth Seth commented on YARN-5194: -- This will likely break a bunch of things - hence targeted at 3.0. Could you please elaborate on HDFS getConf ? If there's enough interest to reduce the size of config objects in memory / serialized size - this can be taken up for a 3.x release. > Avoid adding yarn-site to all Configuration instances created by the JVM > > > Key: YARN-5194 > URL: https://issues.apache.org/jira/browse/YARN-5194 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Seth > > {code} > static { > addDeprecatedKeys(); > Configuration.addDefaultResource(YARN_DEFAULT_CONFIGURATION_FILE); > Configuration.addDefaultResource(YARN_SITE_CONFIGURATION_FILE); > } > {code} > This puts the contents of yarn-default and yarn-site into every configuration > instance created in the VM after YarnConfiguration has been initialized. > This should be changed to a local addResource for the specific > YarnConfiguration instance, instead of polluting every Configuration instance. > Incompatible change. Have set the target version to 3.x. > The same applies to HdfsConfiguration (hdfs-site.xml), and Configuration > (core-site.xml etc). > core-site may be worth including everywhere, however it would be better to > expect users to explicitly add the relevant resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes
[ https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313121#comment-15313121 ] Siddharth Seth commented on YARN-5193: -- bq. I don't think long-running necessarily means low container churn, although I'm sure it does for the use-case you have in mind. For example, an app-as-service that farms out work as containers on YARN and runs forever. High load with short work duration for such a service = high container churn but it never exits. Fair point. I'm guessing this would end up getting implemented as a parameter in the API, rather than a blanket 'long-running=aggregate after container complete' bq. Periodic aggregation would be more palatable for such a use-case. Also log-aggregation duration is not guaranteed. Even if we aggregate as the container completes there's no guarantee how long it will take, so any client that wants to see the logs in HDFS just as containers complete has to handle fetching it from the nodes in the worst-case scenario or retrying until it's available. There would definitely still be the time window where the container has completed, and the log hasn't yet been aggregated. It'll likely be a little shorter than a specific time window - if that's worth anything. The main problem seems to be discovering these dead containers, and where they ran. ATS/AHS would have been ideal, but can't really be enabled on a reasonably sized cluster to log container information. Maybe log-aggregation can write out indexing information up front - so that the CLI can at least find all containers / the node where containers ran. > For long running services, aggregate logs when a container completes instead > of when the app completes > -- > > Key: YARN-5193 > URL: https://issues.apache.org/jira/browse/YARN-5193 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Seth > > For a long running service, containers will typically not complete very > often. However, when a container completes - it would be useful to aggregate > the logs right then, instead of waiting for the app to complete. > This will allow the command line log tool to lookup containers for an app > from the log file index itself, instead of having to go and talk to YARN. > Talking to YARN really only works if ATS is enabled, and YARN is configured > to publish container information to ATS (That may not always be the case - > since this can overload ATS quite fast). > There's some added benefits like cleaning out local disk space early, instead > of waiting till the app completes. (There's probably a separate jira > somewhere about cleanup of container for long running services anyway) > cc [~vinodkv], [~xgong] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes
[ https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312822#comment-15312822 ] Siddharth Seth commented on YARN-5193: -- Log rolling should help. I'm yet to try it out. Do you happen to know how it works when a container dies - will the logs be aggregated immediately, or after the time window? bq. Main thing to watch out for here is additional load to the namenode. Yes. The original change to aggregate at the end was required for shorter running jobs with more container churn. For a longer running service - containers will likely not go down very often, and it should be OK to upload logs occasionally (without keeping connections open). > For long running services, aggregate logs when a container completes instead > of when the app completes > -- > > Key: YARN-5193 > URL: https://issues.apache.org/jira/browse/YARN-5193 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Siddharth Seth > > For a long running service, containers will typically not complete very > often. However, when a container completes - it would be useful to aggregate > the logs right then, instead of waiting for the app to complete. > This will allow the command line log tool to lookup containers for an app > from the log file index itself, instead of having to go and talk to YARN. > Talking to YARN really only works if ATS is enabled, and YARN is configured > to publish container information to ATS (That may not always be the case - > since this can overload ATS quite fast). > There's some added benefits like cleaning out local disk space early, instead > of waiting till the app completes. (There's probably a separate jira > somewhere about cleanup of container for long running services anyway) > cc [~vinodkv], [~xgong] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-5194) Avoid adding yarn-site to all Configuration instances created by the JVM
Siddharth Seth created YARN-5194: Summary: Avoid adding yarn-site to all Configuration instances created by the JVM Key: YARN-5194 URL: https://issues.apache.org/jira/browse/YARN-5194 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth
{code}
  static {
    addDeprecatedKeys();
    Configuration.addDefaultResource(YARN_DEFAULT_CONFIGURATION_FILE);
    Configuration.addDefaultResource(YARN_SITE_CONFIGURATION_FILE);
  }
{code}
This puts the contents of yarn-default and yarn-site into every configuration instance created in the VM after YarnConfiguration has been initialized. This should be changed to a local addResource for the specific YarnConfiguration instance, instead of polluting every Configuration instance. Incompatible change. Have set the target version to 3.x. The same applies to HdfsConfiguration (hdfs-site.xml), and Configuration (core-site.xml etc). core-site may be worth including everywhere, however it would be better to expect users to explicitly add the relevant resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
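An illustrative sketch (not the actual patch) of the proposed direction: load yarn-default.xml and yarn-site.xml only into a specific Configuration instance, instead of registering them as JVM-wide default resources via Configuration.addDefaultResource. The class name is made up; the addResource calls are standard Configuration API:
{code}
import org.apache.hadoop.conf.Configuration;

// Hypothetical variant of YarnConfiguration: resources are added only to this
// instance, so plain Configuration objects elsewhere in the JVM stay untouched.
public class LocalYarnConfiguration extends Configuration {
  public LocalYarnConfiguration() {
    super();
    addResource("yarn-default.xml");
    addResource("yarn-site.xml");
  }
}
{code}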
[jira] [Created] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes
Siddharth Seth created YARN-5193: Summary: For long running services, aggregate logs when a container completes instead of when the app completes Key: YARN-5193 URL: https://issues.apache.org/jira/browse/YARN-5193 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth For a long running service, containers will typically not complete very often. However, when a container completes - it would be useful to aggregate the logs right then, instead of waiting for the app to complete. This will allow the command line log tool to look up containers for an app from the log file index itself, instead of having to go and talk to YARN. Talking to YARN really only works if ATS is enabled, and YARN is configured to publish container information to ATS (That may not always be the case - since this can overload ATS quite fast). There are some added benefits like cleaning out local disk space early, instead of waiting till the app completes. (There's probably a separate jira somewhere about cleanup of containers for long running services anyway) cc [~vinodkv], [~xgong] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned YARN-4816: Assignee: Siddharth Seth > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194673#comment-15194673 ] Siddharth Seth commented on YARN-4816: -- Thanks for the review [~kasha] - committing to master and branch-2. > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4816) SystemClock API broken in 2.9.0
[ https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-4816: - Attachment: YARN-4816.1.txt Trivial patch. Re-introduces the public constructor and marks it as deprecated. [~kasha] - please review. > SystemClock API broken in 2.9.0 > --- > > Key: YARN-4816 > URL: https://issues.apache.org/jira/browse/YARN-4816 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0 >Reporter: Siddharth Seth > Attachments: YARN-4816.1.txt > > > https://issues.apache.org/jira/browse/YARN-4526 removed the public > constructor on SystemClock - making it an incompatible change. > cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4816) SystemClock API broken in 2.9.0
Siddharth Seth created YARN-4816: Summary: SystemClock API broken in 2.9.0 Key: YARN-4816 URL: https://issues.apache.org/jira/browse/YARN-4816 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.9.0 Reporter: Siddharth Seth https://issues.apache.org/jira/browse/YARN-4526 removed the public constructor on SystemClock - making it an incompatible change. cc [~kasha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts
[ https://issues.apache.org/jira/browse/YARN-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089794#comment-15089794 ] Siddharth Seth commented on YARN-4554: -- Please go ahead. Separating the diagnostics per appAttempt in the main report would be useful. Something like "[appAttempt1=...], [appAttempt2=...]" > ApplicationReport.getDiagnostics does not return diagnostics from individual > attempts > - > > Key: YARN-4554 > URL: https://issues.apache.org/jira/browse/YARN-4554 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Siddharth Seth >Assignee: Sunil G > > For an Application with ApplicationReport.getFinalApplicationStatus=FAILED > and ApplicationReport.getYarnApplicationState=FINISHED - > ApplicationReport.getDiagnostics returns an empty string. > Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, > followed by getApplicationAttemptReport to get diagnostics for the attempt - > which contained the information I had used to unregister the app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts
Siddharth Seth created YARN-4554: Summary: ApplicationReport.getDiagnostics does not return diagnostics from individual attempts Key: YARN-4554 URL: https://issues.apache.org/jira/browse/YARN-4554 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Siddharth Seth For an Application with ApplicationReport.getFinalApplicationStatus=FAILED and ApplicationReport.getYarnApplicationState=FINISHED - ApplicationReport.getDiagnostics returns an empty string. Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, followed by getApplicationAttemptReport to get diagnostics for the attempt - which contained the information I had used to unregister the app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
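A minimal sketch of the workaround described above, using YarnClient; the helper class and variable names are illustrative, and error handling is omitted:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public final class AttemptDiagnostics {
  private AttemptDiagnostics() {}

  public static String getDiagnostics(ApplicationId appId) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      // report.getDiagnostics() may be empty for a FAILED final status;
      // fall back to the current attempt's diagnostics.
      ApplicationAttemptReport attempt = client.getApplicationAttemptReport(
          report.getCurrentApplicationAttemptId());
      return attempt.getDiagnostics();
    } finally {
      client.stop();
    }
  }
}
{code}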
[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060611#comment-15060611 ] Siddharth Seth commented on YARN-4207: -- +1. This looks good. Thanks [~rhaase] > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Rich Haase > Labels: trivial > Attachments: YARN-4207.patch > > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955334#comment-14955334 ] Siddharth Seth commented on YARN-4207: -- [~rhaase] - thanks for taking this up. Along with the change to FinalApplicationStatus, a change is also required to the proto definition (yarn_protos.proto). There'll be a set of converter methods which translate between the proto and FinalApplicationStatus which will also need to be changed. Other than that, I believe adding this additional value is a safe change. > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin > Labels: trivial > Attachments: YARN-4207.patch > > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
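For reference, a hedged sketch of the converter convention the comment refers to. The class and constant names below approximate the PB record utilities from memory rather than quoting YARN code verbatim; the point is that a proto enum value following the existing APP_ prefix convention (e.g. APP_ENDED for a new ENDED value in yarn_protos.proto) keeps the name-based mapping working:
{code}
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.proto.YarnProtos.FinalApplicationStatusProto;

public final class FinalStatusConverterSketch {
  private FinalStatusConverterSketch() {}

  private static final String PREFIX = "APP_";

  // Name-based mapping: FinalApplicationStatus.ENDED <-> APP_ENDED, etc.
  public static FinalApplicationStatusProto toProto(FinalApplicationStatus status) {
    return FinalApplicationStatusProto.valueOf(PREFIX + status.name());
  }

  public static FinalApplicationStatus fromProto(FinalApplicationStatusProto proto) {
    return FinalApplicationStatus.valueOf(proto.name().replace(PREFIX, ""));
  }
}
{code}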
[jira] [Updated] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-4207: - Assignee: Rich Haase > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Rich Haase > Labels: trivial > Attachments: YARN-4207.patch > > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-162) nodemanager log aggregation has scaling issues with namenode
[ https://issues.apache.org/jira/browse/YARN-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933751#comment-14933751 ] Siddharth Seth commented on YARN-162: - Go ahead. > nodemanager log aggregation has scaling issues with namenode > > > Key: YARN-162 > URL: https://issues.apache.org/jira/browse/YARN-162 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 0.23.3 >Reporter: Nathan Roberts >Assignee: Siddharth Seth >Priority: Critical > Attachments: YARN-162.txt, YARN-162_WIP.txt, YARN-162_v2.txt, > YARN-162_v2.txt > > > Log aggregation causes fd explosion on the namenode. On large clusters this > can exhaust FDs to the point where datanodes can't check-in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4208) Support additional values for FinalApplicationStatus
[ https://issues.apache.org/jira/browse/YARN-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved YARN-4208. -- Resolution: Duplicate Looks like [~sershe] already filed YARN-4207. Closing this as a dupe. > Support additional values for FinalApplicationStatus > > > Key: YARN-4208 > URL: https://issues.apache.org/jira/browse/YARN-4208 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Siddharth Seth > > FinalApplicationStatus currently supports SUCCEEDED, FAILED and KILLED as > values after an application completes. > While these are sufficient for jobs like MR where a single job maps to a > single job, these values are not very useful for longer running applications. > It does actually lead to confusion when users end up interpreting this value > as the exit status of a job which may be one of many running as part of a > single application. > A more generic FinalAppStatus status such as 'COMPLETED' would be useful to > have. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-4207: - Issue Type: Improvement (was: Bug) > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4208) Support additional values for FinalApplicationStatus
Siddharth Seth created YARN-4208: Summary: Support additional values for FinalApplicationStatus Key: YARN-4208 URL: https://issues.apache.org/jira/browse/YARN-4208 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Siddharth Seth FinalApplicationStatus currently supports SUCCEEDED, FAILED and KILLED as values after an application completes. While these are sufficient for jobs like MR where a single job maps to a single application, these values are not very useful for longer running applications. It does actually lead to confusion when users end up interpreting this value as the exit status of a job which may be one of many running as part of a single application. A more generic FinalApplicationStatus value such as 'COMPLETED' would be useful to have. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status
[ https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907140#comment-14907140 ] Siddharth Seth commented on YARN-4207: -- cc [~vinodkv] > Add a non-judgemental YARN app completion status > > > Key: YARN-4207 > URL: https://issues.apache.org/jira/browse/YARN-4207 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Sergey Shelukhin > > For certain applications, it doesn't make sense to have SUCCEEDED or FAILED > end state. For example, Tez sessions may include multiple DAGs, some of which > have succeeded and some have failed; there's no clear status for the session > both logically and from user perspective (users are confused either way). > There needs to be a status not implying success or failure, such as > "done"/"ended"/"finished". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588847#comment-14588847 ] Siddharth Seth commented on YARN-1197: -- bq. I would argue that waiting for an NM-RM heartbeat is much worse than waiting for an AM-RM heartbeat. With continuous scheduling, the RM can make decisions in millisecond time, and the AM can regulate its heartbeats according to the application's needs to get fast responses. If an NM-RM heartbeat is involved, the application is at the mercy of the cluster settings, which should be in the multi-second range for large clusters. I tend to agree with Sandy's arguments about option a being better in terms of latency - and that we shouldn't be architecting this in a manner which would limit it to the seconds range rather than milliseconds / hundreds of milliseconds when possible. It's already possible to get fast allocations - low 100s of milliseconds via a scheduler loop which is delinked from NM heartbeats and a variable AM-RM heartbeat interval, which is under user control rather than being a cluster property. There are going to be improvements to the performance of various protocols in YARN. HADOOP-11552 opens up one such option which allows AMs to know about allocations as soon as the scheduler has made the decision, without a requirement to poll. Of course - there's plenty of work to be done before that can actually be used :) That said, callbacks on the RPC can be applied at various levels - including NM-RM communication, which can make option b work fast as well. However, it will incur the cost of additional RPC roundtrips. Option a, however, can be fast from the get go with tuning, and also gets better with future enhancements. I don't think it's possible for the AM to start using the additional allocation till the NM has updated all its state - including writing out recovery information for work preserving restart (Thanks Vinod for pointing this out). Seems like the poll/callback will be required - unless the plan is to route this information via the RM. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, YARN-1197_Design.pdf The current YARN resource management logic assumes resource allocated to a container is fixed during the lifetime of it. When users want to change a resource of an allocated container the only way is releasing it and allocating a new container with expected size. Allowing run-time changing resources of an allocated container will give us better control of resource usage in application side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs
[ https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569648#comment-14569648 ] Siddharth Seth commented on YARN-1462: -- ApplicationReport.newInstance is used by mapreduce and Tez, and potentially other applications which may be modeled along the same lines as these AMs. It'll be useful to make the API change here compatible. This is along the lines of newInstances being used for various constructs like ContainerId, AppId, etc. With the change, I don't believe MR 2.6 will work with a 2.8 cluster - depending on how the classpath is set up. AHS API and other AHS changes to handle tags for completed MR jobs -- Key: YARN-1462 URL: https://issues.apache.org/jira/browse/YARN-1462 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Xuan Gong Fix For: 2.8.0 Attachments: YARN-1462-branch-2.7-1.2.patch, YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, YARN-1462.3.patch AHS related work for tags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3674) YARN application disappears from view
[ https://issues.apache.org/jira/browse/YARN-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549858#comment-14549858 ] Siddharth Seth commented on YARN-3674: -- Clicking on a specific queue on the scheduler page, followed by a click on the 'Applications' / 'RUNNING' / etc links - ends up on a page which shows no information that a queue has been selected. Ends up looking like the cluster isn't RUNNING anything or hasn't run anything if the queue isn't used. For [~sershe] - this was worse. Going back and selecting the default queue made no difference to the apps listing. YARN application disappears from view - Key: YARN-3674 URL: https://issues.apache.org/jira/browse/YARN-3674 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.8.0 Reporter: Sergey Shelukhin I have 2 tabs open at exact same URL with RUNNING applications view. There is an application that is, in fact, running, that is visible in one tab but not the other. This persists across refreshes. If I open new tab from the tab where the application is not visible, in that tab it shows up ok. I didn't change scheduler/queue settings before this behavior happened; on [~sseth]'s advice I went and tried to click the root node of the scheduler on scheduler page; the app still does not become visible. Something got stuck somewhere... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-886) make APPLICATION_STOP consistent with APPLICATION_INIT
[ https://issues.apache.org/jira/browse/YARN-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524210#comment-14524210 ] Siddharth Seth commented on YARN-886: - [~djp] - this looks like it's still valid. START is sent to the service that the app specified. STOP is sent to all AuxServices. make APPLICATION_STOP consistent with APPLICATION_INIT -- Key: YARN-886 URL: https://issues.apache.org/jira/browse/YARN-886 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch Currently, there is inconsistency between the start/stop behaviour. See Siddharth's comment in MAPREDUCE-5329: The start/stop behaviour should be consistent. We shouldn't send the stop to all service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1503) Support making additional 'LocalResources' available to running containers
[ https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1503: - Assignee: (was: Siddharth Seth) Support making additional 'LocalResources' available to running containers -- Key: YARN-1503 URL: https://issues.apache.org/jira/browse/YARN-1503 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth We have a use case, where additional resources (jars, libraries etc) need to be made available to an already running container. Ideally, we'd like this to be done via YARN (instead of having potentially multiple containers per node download resources on their own). Proposal: NM to support an additional API where a list of resources can be specified. Something like localizeResource(ContainerId, Map<String, LocalResource>) NM would also require an additional API to get state for these resources - getLocalizationState(ContainerId) - which returns the current state of all local resources for the specified container(s). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
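A sketch of the proposed NM-side API shape described above. None of these interfaces or methods exist in YARN; the names and types are purely illustrative of the proposal:
{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.LocalResource;

// Hypothetical NM-side API; nothing here exists in YARN today.
public interface ContainerLocalizer {

  enum LocalizationState { PENDING, LOCALIZING, LOCALIZED, FAILED }

  // Ask the NM to localize additional resources for a running container.
  void localizeResources(ContainerId containerId, Map<String, LocalResource> resources);

  // Current localization state of all resources known for the container.
  Map<String, LocalizationState> getLocalizationState(ContainerId containerId);
}
{code}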
[jira] [Resolved] (YARN-575) ContainerManager APIs should be user accessible
[ https://issues.apache.org/jira/browse/YARN-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved YARN-575. - Resolution: Won't Fix Assignee: (was: Vinod Kumar Vavilapalli) Closing as Won't Fix based on the comments. ContainerManager APIs should be user accessible --- Key: YARN-575 URL: https://issues.apache.org/jira/browse/YARN-575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Priority: Critical Auth for ContainerManager is based on the containerId being accessed - since this is what is used to launch containers (There's likely another jira somewhere to change this to not be containerId based). What this also means is the API is effectively not usable with kerberos credentials. Also, it should be possible to use this API with some generic tokens (RMDelegation?), instead of with Container specific tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-670) Add an Exception to indicate 'Maintenance' for NMs and add this to the JavaDoc for appropriate protocols
[ https://issues.apache.org/jira/browse/YARN-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-670: Assignee: (was: Siddharth Seth) Add an Exception to indicate 'Maintenance' for NMs and add this to the JavaDoc for appropriate protocols Key: YARN-670 URL: https://issues.apache.org/jira/browse/YARN-670 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3310) MR-279: Log info about the location of dist cache
[ https://issues.apache.org/jira/browse/YARN-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-3310: - Assignee: (was: Siddharth Seth) MR-279: Log info about the location of dist cache - Key: YARN-3310 URL: https://issues.apache.org/jira/browse/YARN-3310 Project: Hadoop YARN Issue Type: Improvement Reporter: Ramya Sunil Priority: Minor Currently, there is no log info available about the actual location of the file/archive in dist cache being used by the task except for the ln command in task.sh. We need to log this information to help in debugging esp in those cases where there are more than one archive with the same name. In 0.20.x, in task logs, one could find log info such as the following: INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: distcache location/archive - mapred.local.dir/archive -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-197) Add a separate log server
[ https://issues.apache.org/jira/browse/YARN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved YARN-197. - Resolution: Won't Fix Resolving due to the presence of additional services which can be used for serving logs. Add a separate log server - Key: YARN-197 URL: https://issues.apache.org/jira/browse/YARN-197 Project: Hadoop YARN Issue Type: New Feature Reporter: Siddharth Seth Currently, the job history server is being used for log serving. A separate log server can be added which can deal with serving logs, along with other functionality like log retention, merging, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-197) Add a separate log server
[ https://issues.apache.org/jira/browse/YARN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481525#comment-14481525 ] Siddharth Seth commented on YARN-197: - Yes, as long as the logs are being served out by a sub-system other than the MapReduce history server. Add a separate log server - Key: YARN-197 URL: https://issues.apache.org/jira/browse/YARN-197 Project: Hadoop YARN Issue Type: New Feature Reporter: Siddharth Seth Currently, the job history server is being used for log serving. A separate log server can be added which can deal with serving logs, along with other functionality like log retention, merging, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-671) Add an interface on the RM to move NMs into a maintenance state
[ https://issues.apache.org/jira/browse/YARN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-671: Assignee: (was: Siddharth Seth) Add an interface on the RM to move NMs into a maintenance state --- Key: YARN-671 URL: https://issues.apache.org/jira/browse/YARN-671 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-671) Add an interface on the RM to move NMs into a maintenance state
[ https://issues.apache.org/jira/browse/YARN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312917#comment-14312917 ] Siddharth Seth commented on YARN-671: - The intent was to have an interface to decommission a cluster via the RM, instead of talking to NMs. I think that's going to be the case in YARN-914 - so yep, this can be closed. Add an interface on the RM to move NMs into a maintenance state --- Key: YARN-671 URL: https://issues.apache.org/jira/browse/YARN-671 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304189#comment-14304189 ] Siddharth Seth commented on YARN-1723: -- +1. The patch looks good. Will commit after jenkins comes back. AMRMClientAsync missing blacklist addition and removal functionality Key: YARN-1723 URL: https://issues.apache.org/jira/browse/YARN-1723 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Bikas Saha Assignee: Bartosz Ługowski Fix For: 2.7.0 Attachments: YARN-1723.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1723: - Assignee: Bartosz Ługowski AMRMClientAsync missing blacklist addition and removal functionality Key: YARN-1723 URL: https://issues.apache.org/jira/browse/YARN-1723 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Bikas Saha Assignee: Bartosz Ługowski Fix For: 2.7.0 Attachments: YARN-1723.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2830) Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode
[ https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202758#comment-14202758 ] Siddharth Seth commented on YARN-2830: -- +1 for retaining the old newInstance method. One concern about the patch though - ContainerId will end up with two very similar methods: newContainerId(AppAttemptId, int), which is deprecated, and newContainerId(AppAttemptId, long). It's very easy to get these incorrect within YARN itself - which can introduce some tough-to-debug issues. Instead, I think it'll be a lot safer to rename the new method - and retain the old one, with its old signature, for compatibility. Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode -- Key: YARN-2830 URL: https://issues.apache.org/jira/browse/YARN-2830 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Blocker Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch YARN-2229 modified the private unstable api for constructing. Tez uses this api (shouldn't, but does) for use with Tez Local Mode. This causes a NoSuchMethod error when using Tez compiled against pre-2.6. Instead I propose we add the backwards compatible api since overflow is not a problem in tez local mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
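A sketch of the compatibility shape suggested in the comment: keep the old int-based factory, deprecated and delegating, while the long-based factory carries a distinct name (newContainerId, as referenced above). The wrapper class below is illustrative only, not the actual ContainerId code:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public final class ContainerIdFactorySketch {
  private ContainerIdFactorySketch() {}

  // Old, widely used int-based signature: kept, deprecated, and delegating.
  @Deprecated
  public static ContainerId newInstance(ApplicationAttemptId attemptId, int id) {
    return newContainerId(attemptId, id); // the int widens to long
  }

  // Distinctly named long-based factory, so the two cannot be mixed up.
  public static ContainerId newContainerId(ApplicationAttemptId attemptId, long id) {
    return ContainerId.newContainerId(attemptId, id);
  }
}
{code}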
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191562#comment-14191562 ] Siddharth Seth commented on YARN-2698: -- FWIW, this can break downstream components which may have unit tests making use of NodeReport. The API is annotated private, however it would be useful to have some kind of stable mocks for entities which are likely to be used for testing downstream projects. Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Fix For: 2.6.0 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch, YARN-2698-20141030-1.patch YARN AdminService should have write API only, for other read APIs, they should be located at RM ClientService. Include, 1) getClusterNodeLabels 2) getNodeToLabels 3) getNodeReport should contains labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191569#comment-14191569 ] Siddharth Seth commented on YARN-2698: -- Actually, it'll help downstream projects if the old method is left in place, and deprecated - instead of removing it altogether. Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Fix For: 2.6.0 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch, YARN-2698-20141030-1.patch YARN AdminService should have write API only, for other read APIs, they should be located at RM ClientService. Include, 1) getClusterNodeLabels 2) getNodeToLabels 3) getNodeReport should contains labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2789) Re-instate the NodeReport.newInstance API modified in YARN-2698
Siddharth Seth created YARN-2789: Summary: Re-instate the NodeReport.newInstance API modified in YARN-2698 Key: YARN-2789 URL: https://issues.apache.org/jira/browse/YARN-2789 Project: Hadoop YARN Issue Type: Task Reporter: Siddharth Seth Priority: Critical Even though this is a private API, it will be used by downstream projects for testing. It'll be useful for this to be re-instated, maybe with a deprecated annotation, so that older versions of downstream projects can build against Hadoop 2.6. create() being private is a problem for multiple other classes - ContainerId, AppId etc, Container, NodeId ... Most classes on the client-facing YARN APIs are likely to be required for testing in downstream projects. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192203#comment-14192203 ] Siddharth Seth commented on YARN-2698: -- Created YARN-2789 Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Fix For: 2.6.0 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch, YARN-2698-20141030-1.patch YARN AdminService should have write API only, for other read APIs, they should be located at RM ClientService. Include, 1) getClusterNodeLabels 2) getNodeToLabels 3) getNodeReport should contains labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects
Siddharth Seth created YARN-2464: Summary: Provide Hadoop as a local resource (on HDFS) which can be used by other projects Key: YARN-2464 URL: https://issues.apache.org/jira/browse/YARN-2464 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN projects to set up their AM / task classpaths if they have a dependency on Hadoop libraries. It'll be useful to provide similar access to a Hadoop tarball (Hadoop libs, native libraries) etc, which could be used instead - for applications which do not want to rely upon Hadoop versions from a cluster node. This would also require functionality to update the classpath/env for the apps based on the structure of the tar. As an example, MR has support for a full tar (for rolling upgrades). Similarly, Tez ships Hadoop libraries along with its build. I'm not sure about the Spark / Storm / HBase model for this - but using a common copy instead of everyone localizing Hadoop libraries would be useful. -- This message was sent by Atlassian JIRA (v6.2#6252)
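For context, a minimal sketch of how the classpath constants mentioned above are typically consumed when building an AM/task environment. The helper class and method names are made up, and real applications do more (platform handling, user-supplied additions, etc.):
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class ClasspathSetup {
  private ClasspathSetup() {}

  public static Map<String, String> buildEnv(YarnConfiguration conf) {
    Map<String, String> env = new HashMap<String, String>();
    StringBuilder classpath = new StringBuilder(Environment.PWD.$())
        .append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("*");
    // Fall back to the defaults when yarn.application.classpath is unset.
    for (String entry : conf.getTrimmedStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      classpath.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(entry);
    }
    env.put(Environment.CLASSPATH.name(), classpath.toString());
    return env;
  }
}
{code}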
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072356#comment-14072356 ] Siddharth Seth commented on YARN-2229: -- [~ozawa] - I was primarily looking at this from a backward compatibility perspective. Will leave the decision to go with the current approach or adding a hidden field to you, Jian and Zhijie. ContainerId can overflow with RM restart Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch On YARN-2052, we changed containerId format: upper 10 bits are for epoch, lower 22 bits are for sequence number of Ids. This is for preserving semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM restarts 1024 times. To avoid the problem, its better to make containerId long. We need to define the new format of container Id with preserving backward compatibility on this JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2264) Race in DrainDispatcher can cause random test failures
Siddharth Seth created YARN-2264: Summary: Race in DrainDispatcher can cause random test failures Key: YARN-2264 URL: https://issues.apache.org/jira/browse/YARN-2264 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth This is the potential race. DrainDispatcher is started via serviceStart(). As a last step, this starts the actual dispatcher thread (eventHandlingThread.start()) and returns immediately - which means the thread may or may not have started up by the time start returns. Event sequence:
UserThread: calls dispatcher.getEventHandler().handle(). This sets drained = false, and a context switch happens.
DispatcherThread: starts running. drained = queue.isEmpty(); - This sets drained to true, since the user thread yielded before putting anything into the queue.
UserThread: actual.handle(event) - which puts the event in the queue for the dispatcher thread to process, and returns control.
UserThread: dispatcher.await() - Since drained is true, this returns immediately - even though there is a pending event to process. -- This message was sent by Atlassian JIRA (v6.2#6252)
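A self-contained illustration of the interleaving described above - this is not the actual DrainDispatcher code, just a stripped-down model with made-up names:
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DrainRaceSketch {
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();
  private volatile boolean drained = true;

  // User thread.
  public void handle(Object event) {
    drained = false;      // (1) clear the flag...
    // <-- a context switch here lets the dispatcher run step (2) below,
    //     which sees an empty queue and sets drained back to true
    queue.add(event);     // (3) ...then enqueue the event
  }

  // Dispatcher thread.
  public void dispatchLoop() throws InterruptedException {
    while (true) {
      drained = queue.isEmpty();   // (2) may run between (1) and (3)
      Object event = queue.take();
      // process event ...
    }
  }

  // User thread: returns immediately if (2) ran at the wrong moment,
  // even though an event is still pending.
  public void await() throws InterruptedException {
    while (!drained) {
      Thread.sleep(50);
    }
  }
}
{code}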
[jira] [Commented] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path
[ https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974450#comment-13974450 ] Siddharth Seth commented on YARN-766: - [~djp], The 2.x patch is only required to fix a difference in formatting between trunk and branch-2. Up to you on whether to fix the trunk formatting in this jira or whenever the code is touched next. TestNodeManagerShutdown should use Shell to form the output path Key: YARN-766 URL: https://issues.apache.org/jira/browse/YARN-766 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Minor Attachments: YARN-766.branch-2.txt, YARN-766.trunk.txt, YARN-766.txt File scriptFile = new File(tmpDir, "scriptFile.sh"); should be replaced with File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile"); to match trunk. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1892) Excessive logging in RM
Siddharth Seth created YARN-1892: Summary: Excessive logging in RM Key: YARN-1892 URL: https://issues.apache.org/jira/browse/YARN-1892 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth Priority: Minor Mostly in the CS I believe {code} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1395435468498_0011 reserved container container_1395435468498_0011_01_000213 on node host: #containers=5 available=4096 used=20960, currently has 1 at priority 4; currentReservation 4096 {code} {code} INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: hive2 usedResources: memory:20480, vCores:5 clusterResources: memory:81920, vCores:16 currentCapacity 0.25 required memory:4096, vCores:1 potentialNewCapacity: 0.255 ( max-capacity: 0.25) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1664) Add a utility to retrieve the RM Principal (renewer for tokens)
Siddharth Seth created YARN-1664: Summary: Add a utility to retrieve the RM Principal (renewer for tokens) Key: YARN-1664 URL: https://issues.apache.org/jira/browse/YARN-1664 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Currently the logic to retrieve the renewer to be used while retrieving HDFS tokens resides in MapReduce. This should ideally be a utility in YARN since it's likely to be required by other applications as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
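As context for what such a utility might do, here is a minimal sketch assuming the renewer is derived from yarn.resourcemanager.principal and the RM address; the class and method names are illustrative, not a proposed API:
{code}
import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class RMPrincipalUtil {
  private RMPrincipalUtil() {}

  /** Returns the Kerberos principal to use as the token renewer (the RM). */
  public static String getRMPrincipal(Configuration conf) throws IOException {
    String principal = conf.get(YarnConfiguration.RM_PRINCIPAL, "");
    if (principal.isEmpty()) {
      return principal; // security disabled
    }
    // Resolve the _HOST placeholder against the configured RM address.
    InetSocketAddress rmAddress = conf.getSocketAddr(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT);
    return SecurityUtil.getServerPrincipal(principal, rmAddress.getHostName());
  }
}
{code}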
[jira] [Updated] (YARN-1517) AMFilterInitializer with configurable AMIpFilter
[ https://issues.apache.org/jira/browse/YARN-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1517: - Assignee: Pramod Immaneni AMFilterInitializer with configurable AMIpFilter Key: YARN-1517 URL: https://issues.apache.org/jira/browse/YARN-1517 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Pramod Immaneni Assignee: Pramod Immaneni We need to implement custom logic in a filter for our webservice, similar to org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, and it would be convenient if we extended AmIpFilter as the proxy locations are already available. We would need to specify a filter initializer for this filter. The initializer would be the same as AmFilterInitializer except that it would add our filter instead of AmIpFilter, and it would be better if we could reuse AmFilterInitializer. Can AmFilterInitializer be updated to specify a filter name and filter class? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
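One possible shape for this, sketched under the assumption that the initializer simply reads the filter name and class from configuration; the config keys here are invented for illustration, and the proxy parameters the real AmFilterInitializer computes are omitted:
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.FilterContainer;
import org.apache.hadoop.http.FilterInitializer;

public class ConfigurableAmFilterInitializer extends FilterInitializer {
  // Hypothetical keys, for illustration only.
  private static final String FILTER_NAME_KEY = "example.am.filter.name";
  private static final String FILTER_CLASS_KEY = "example.am.filter.class";

  @Override
  public void initFilter(FilterContainer container, Configuration conf) {
    Map<String, String> params = new HashMap<String, String>();
    // The real AmFilterInitializer would populate proxy host/URI params here.
    String name = conf.get(FILTER_NAME_KEY, "AM_PROXY_FILTER");
    String filterClass = conf.get(FILTER_CLASS_KEY,
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter");
    container.addFilter(name, filterClass, params);
  }
}
{code}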
[jira] [Commented] (YARN-1503) Support making additional 'LocalResources' available to running containers
[ https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848111#comment-13848111 ] Siddharth Seth commented on YARN-1503: -- bq. A slightly more detailed explanation of the use-case so everyone can understand? And why something like YARN-1040 is not enough. YARN-1040 talks about launching multiple processes within the same container. This requirement is for a single running process - we want to avoid re-launching the process due to the cost involved with starting a new Java process. The specific use case is running different tasks within the same JVM - where one task may need some additional jars (Hive UDFs for example). Support making additional 'LocalResources' available to running containers -- Key: YARN-1503 URL: https://issues.apache.org/jira/browse/YARN-1503 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth We have a use case where additional resources (jars, libraries, etc.) need to be made available to an already running container. Ideally, we'd like this to be done via YARN (instead of having potentially multiple containers per node download resources on their own). Proposal: NM to support an additional API where a list of resources can be specified. Something like localizeResource(ContainerId, Map<String, LocalResource>). The NM would also require an additional API to get state for these resources - getLocalizationState(ContainerId) - which returns the current state of all local resources for the specified container(s). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
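To make the proposed API shape above concrete, a rough sketch; the interface and method names are illustrative only, taken from the wording of the description rather than any actual NM interface:
{code}
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.LocalResource;

/** Illustrative only: the NM-side API shape sketched in this JIRA. */
public interface AdditionalResourceLocalization {

  /** Ask the NM to localize additional resources for a running container. */
  void localizeResource(ContainerId containerId,
      Map<String, LocalResource> resources);

  /** Hypothetical per-resource localization state for the given container. */
  Map<String, String> getLocalizationState(ContainerId containerId);
}
{code}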
[jira] [Created] (YARN-1503) Support making additional 'LocalResources' available to running containers
Siddharth Seth created YARN-1503: Summary: Support making additional 'LocalResources' available to running containers Key: YARN-1503 URL: https://issues.apache.org/jira/browse/YARN-1503 Project: Hadoop YARN Issue Type: Improvement Reporter: Siddharth Seth Assignee: Siddharth Seth We have a use case where additional resources (jars, libraries, etc.) need to be made available to an already running container. Ideally, we'd like this to be done via YARN (instead of having potentially multiple containers per node download resources on their own). Proposal: NM to support an additional API where a list of resources can be specified. Something like localizeResource(ContainerId, Map<String, LocalResource>). The NM would also require an additional API to get state for these resources - getLocalizationState(ContainerId) - which returns the current state of all local resources for the specified container(s). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize
[ https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1274: - Attachment: YARN-1274.1.txt Updated launch_container to create the app-level local and log directories. Verified dir permissions on a secure cluster. LCE fails to run containers that don't have resources to localize - Key: YARN-1274 URL: https://issues.apache.org/jira/browse/YARN-1274 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Alejandro Abdelnur Assignee: Siddharth Seth Priority: Blocker Attachments: YARN-1274.1.txt LCE container launch assumes the usercache/USER directory exists and it is owned by the user running the container process. But the directory is created only if there are resources to localize by the LCE localization command; if there are no resources to localize, LCE localization never executes and launching fails reporting 255 exit code, and the NM logs have something like: {code} 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04 - Permission denied {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize
[ https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1274: - Attachment: YARN-1274.trunk.1.txt Patch for trunk and branch-2. The previous patch applies to branch-2.1. LCE fails to run containers that don't have resources to localize - Key: YARN-1274 URL: https://issues.apache.org/jira/browse/YARN-1274 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Alejandro Abdelnur Assignee: Siddharth Seth Priority: Blocker Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt LCE container launch assumes the usercache/USER directory exists and it is owned by the user running the container process. But the directory is created only if there are resources to localize by the LCE localization command; if there are no resources to localize, LCE localization never executes and launching fails reporting 255 exit code, and the NM logs have something like: {code} 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04 - Permission denied {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize
[ https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786910#comment-13786910 ] Siddharth Seth commented on YARN-1274: -- I'm in favour of changing the LCE as well. It looks like the log dirs may need to be created with the correct permissions as well. LCE fails to run containers that don't have resources to localize - Key: YARN-1274 URL: https://issues.apache.org/jira/browse/YARN-1274 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Blocker LCE container launch assumes the usercache/USER directory exists and it is owned by the user running the container process. But the directory is created only if there are resources to localize by the LCE localization command; if there are no resources to localize, LCE localization never executes and launching fails reporting 255 exit code, and the NM logs have something like: {code} 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04 - Permission denied {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1274) LCE fails to run containers that don't have resources to localize
[ https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth reassigned YARN-1274: Assignee: Siddharth Seth (was: Alejandro Abdelnur) LCE fails to run containers that don't have resources to localize - Key: YARN-1274 URL: https://issues.apache.org/jira/browse/YARN-1274 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Alejandro Abdelnur Assignee: Siddharth Seth Priority: Blocker LCE container launch assumes the usercache/USER directory exists and it is owned by the user running the container process. But the directory is created only if there are resources to localize by the LCE localization command; if there are no resources to localize, LCE localization never executes and launching fails reporting 255 exit code, and the NM logs have something like: {code} 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 1 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is llama 2013-10-04 14:07:56,425 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create directory llama in /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04 - Permission denied {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784850#comment-13784850 ] Siddharth Seth commented on YARN-1131: -- Will open the followup jiras. Running this through jenkins again. Haven't seen the specific test fail or timeout on my local runs. $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt, YARN-1131.2.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784864#comment-13784864 ] Siddharth Seth commented on YARN-890: - +1. Resources should not be rounded up. Is there a similar round up in the actual allocation code, which may cause additional containers to be allocated to a queue? Should the CS be allowing nodes to register if the nm-memory.mb is not a multiple of minimum-allocation-mb, or should it just be rounding down at registration? The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see the following values: <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>4192</value> </property> <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>1024</value> </property> However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
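For context on where the misleading number comes from: 4192 MB rounded up to the next multiple of the 1024 MB minimum allocation is 5120 MB, which is presumably the "5" (GB) the UI reports. A quick sketch of that arithmetic, assuming the UI rounds up to the next minimum-allocation multiple:
{code}
int nodeMemoryMb = 4192;  // yarn.nodemanager.resource.memory-mb
int minAllocMb = 1024;    // yarn.scheduler.minimum-allocation-mb

// Round up to the next multiple of the minimum allocation.
int roundedUpMb = ((nodeMemoryMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
System.out.println(roundedUpMb); // 5120 MB, i.e. roughly the "5" shown on the UI
{code}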
[jira] [Updated] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1131: - Attachment: YARN-1131.3.txt Updated the patch to get the tests working, also added one more test for when an app is not known by the RM. $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785718#comment-13785718 ] Siddharth Seth commented on YARN-1131: -- If another state does get added to the YarnApplicationState - we don't know if this is a final state or not. I'd prefer falling back to trying to find the logs on disk, which is what happens right now. $yarn logs command should return an appropriate error message if YARN application is still running -- Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784613#comment-13784613 ] Siddharth Seth commented on YARN-1131: -- [~djp], if you don't mind, I'd like to take this over - would be good to get it into the next release. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1131: - Attachment: YARN-1131.1.txt Changes in the patch:
- Adds a YARN application status check based on the ApplicationId, to log a correct message if the application is running.
- If an application is not found in the RM, the CLI tool will continue to search for the files on hdfs (RM not running, or RM restarted).
- Fixes the exception in case of an invalid applicationId.
There's still a case, right after an app completes but before aggregation is complete, where an empty output is returned. That should be a separate jira though. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
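As a rough illustration of the status check described above (a sketch using the public YarnClient API; the actual patch's flow, messages, and error handling will differ):
{code}
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class LogFetchCheck {
  // Decide whether it makes sense to look for aggregated logs yet.
  static boolean shouldLookForAggregatedLogs(Configuration conf, ApplicationId appId)
      throws IOException {
    EnumSet<YarnApplicationState> finished = EnumSet.of(
        YarnApplicationState.FINISHED,
        YarnApplicationState.FAILED,
        YarnApplicationState.KILLED);
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      YarnApplicationState state =
          client.getApplicationReport(appId).getYarnApplicationState();
      if (!finished.contains(state)) {
        System.err.println("Application " + appId
            + " has not completed; log aggregation may still be in progress.");
        return false;
      }
      return true;
    } catch (YarnException e) {
      // Unknown to the RM (RM restarted / not running) - still try the filesystem.
      return true;
    } finally {
      client.stop();
    }
  }
}
{code}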
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784841#comment-13784841 ] Siddharth Seth commented on YARN-1131: -- Thanks for the review.
bq. why not use Option.setRequired for the applicationId param - this will allow removal of the appIdStr == null check.
Will look into using this.
bq. is a YarnApplicationState check enough to guarantee that the user receives the correct error message in case logs are tried to be retrieved when log aggregation is still in process just after the app completes?
Had mentioned this in my last comment. Not targeting for this jira.
bq. There's still a case, right after an app completes, but before aggregation is complete where an empty output is returned. That should be a separate jira though.
bq. typo in function name dumpAContainersLogs or is it meant to read dump a container's logs? Maybe just dumpContainerLogs?
I believe it was meant to be this. The diff, unfortunately, is a lot bigger than it should be, since the files had to be moved between packages.
bq. containerIdStr and nodeAddressStr could be parsed for correct format to error out earlier before invoking the actual log reader functionality.
bq. missing test for when container id specified but node address is not ( and vice versa ) ?
Only targeting the specific issue mentioned in the jira. I'm sure there's more - but applicationId is likely to be the most common case. The rest can be a single or multiple separate jiras. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId <app ID> while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread "main" java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776072#comment-13776072 ] Siddharth Seth commented on YARN-1229: -- I'm in favour of renaming the shuffle service id as well, and enforcing constraints on the names. Shell parameters apparently have name restrictions - http://stackoverflow.com/questions/2821043/allowed-characters-in-linux-environment-variable-names has some links to standards. Setting aux-service name restrictions based on shell name restrictions seems ok to me. This is an incompatible change though. Sites which have Hadoop 2 (or 0.23) deployed would need to change their configs to reflect the shuffle service name update. (The shuffleService isn't started when using the default hadoop configuration files). An alternate could be to use base32 encoding for the service name - but would prefer not going there. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.1-beta I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776865#comment-13776865 ] Siddharth Seth commented on YARN-1229: -- Just looked at the patch, it'd be nice to include underscores as well - provides for a separator in the allowed character set. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776915#comment-13776915 ] Siddharth Seth commented on YARN-1229: -- Took a quick look. - Can you please rename MapreduceShuffle to mapreduce_shuffle (closer to the old name) - The check can be regex based, rather than walking through all the characters. - Include an empty check along with the null check Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
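A sketch of what a regex-based name check could look like; the exact pattern and where it lives are assumptions for illustration, not the committed patch:
{code}
import java.util.regex.Pattern;

public class AuxServiceNameCheck {
  // Letters, digits and underscores, not starting with a digit, so the name
  // survives being embedded in a shell environment variable.
  private static final Pattern VALID_SERVICE_NAME =
      Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

  static void validateServiceName(String name) {
    if (name == null || name.isEmpty()
        || !VALID_SERVICE_NAME.matcher(name).matches()) {
      throw new IllegalArgumentException("Invalid auxiliary service name: " + name);
    }
  }
}
{code}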
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776945#comment-13776945 ] Siddharth Seth commented on YARN-1229: -- Patch looks good. Missed this earlier, but there's several references to mapreduce.shuffle in documentation which need to be updated. Also, since it's being updated - can you make the Pattern final. Thanks Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start
[ https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776979#comment-13776979 ] Siddharth Seth commented on YARN-1229: -- +1. Committing. Shell$ExitCodeException could happen if AM fails to start - Key: YARN-1229 URL: https://issues.apache.org/jira/browse/YARN-1229 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Xuan Gong Priority: Blocker Fix For: 2.1.2-beta Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch I run sleep job. If AM fails to start, this exception could occur: 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with state FAILED due to: Application application_1379673267098_0020 failed 1 times due to AM Container for appattempt_1379673267098_0020_01 exited with exitCode: 1 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh: line 12: export: `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA= ': not a valid identifier at org.apache.hadoop.util.Shell.runCommand(Shell.java:464) at org.apache.hadoop.util.Shell.run(Shell.java:379) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) .Failing this attempt.. Failing the application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-886) make APPLICATION_STOP consistent with APPLICATION_INIT
[ https://issues.apache.org/jira/browse/YARN-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753074#comment-13753074 ] Siddharth Seth commented on YARN-886: - Essentially, APPLICATION_INIT should only be sent to Auxiliary services specified by the user in the startContainer requests. Similarly APPLICATION_STOP should only be sent to Auxiliary services specified by the user during the startContainer call. make APPLICATION_STOP consistent with APPLICATION_INIT -- Key: YARN-886 URL: https://issues.apache.org/jira/browse/YARN-886 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch Currently, there is inconsistency between the start/stop behaviour. See Siddharth's comment in MAPREDUCE-5329: The start/stop behaviour should be consistent. We shouldn't send the stop to all service. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1045: - Hadoop Flags: Reviewed Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Fix For: 2.1.1-beta Attachments: YARN-1045.1.patch, YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1045: - Fix Version/s: (was: 2.1.1-beta) 2.1.0-beta Committed to branch-2.1.0-beta. Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Fix For: 2.1.0-beta Attachments: YARN-1045.1.patch, YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1067) AMRMClient heartbeat interval should not be static
Siddharth Seth created YARN-1067: Summary: AMRMClient heartbeat interval should not be static Key: YARN-1067 URL: https://issues.apache.org/jira/browse/YARN-1067 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Siddharth Seth The heartbeat interval can be modified dynamically - more often when there are pending requests, and toned down when the heartbeat is serving no purpose other than a ping. There are a couple of jiras which are trying to change the scheduling loop - at which point this becomes useful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
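Illustration only: one way the interval could adapt, heartbeating faster while the AM has outstanding container requests and backing off to a slow ping otherwise. The pendingRequests counter is assumed to be maintained by the AM around its container request handling:
{code}
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.client.api.AMRMClient;

public class AdaptiveHeartbeat {
  static void heartbeatLoop(AMRMClient<?> amRmClient, AtomicInteger pendingRequests)
      throws Exception {
    final long fastIntervalMs = 100;
    final long slowIntervalMs = 1000;
    while (!Thread.currentThread().isInterrupted()) {
      AllocateResponse response = amRmClient.allocate(0.0f);
      // Count down outstanding requests as allocations arrive (simplified).
      pendingRequests.addAndGet(-response.getAllocatedContainers().size());
      Thread.sleep(pendingRequests.get() > 0 ? fastIntervalMs : slowIntervalMs);
    }
  }
}
{code}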
[jira] [Commented] (YARN-1045) Improve toString implementation for PBImpls
[ https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733934#comment-13733934 ] Siddharth Seth commented on YARN-1045: -- Thanks for taking this up Jian. Did you get a chance to run all MR and YARN unit tests locally - in case we're relying on the toString format anywhere? Improve toString implementation for PBImpls --- Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Assignee: Jian He Attachments: YARN-1045.patch The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-899) Get queue administration ACLs working
[ https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734040#comment-13734040 ] Siddharth Seth commented on YARN-899: - bq. With this in mind, I think who has access should be based on a union of ACLs Agree. AMs get ACLs from the RM when they register. That could be a combined list along with the queue ACLs. It's up to the AMs to enforce these. Maybe the RM proxy could do some of this as well. The MR JobHistoryServer gets ACLs from the AM - again it's up to this to enforce them. The RM AppHistoryServer will need to do the union though. Don't have experience with JT ACLs, but it does look like that's doing a union as well. View vs Modify ACLs for queues makes sense to me. Get queue administration ACLs working - Key: YARN-899 URL: https://issues.apache.org/jira/browse/YARN-899 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Xuan Gong Attachments: YARN-899.1.patch The Capacity Scheduler documents the yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option for controlling who can administer a queue, but it is not hooked up to anything. The Fair Scheduler could make use of a similar option as well. This is a feature-parity regression from MR1. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1045) Improve toString implementation for PBImpls
Siddharth Seth created YARN-1045: Summary: Improve toString implementation for PBImpls Key: YARN-1045 URL: https://issues.apache.org/jira/browse/YARN-1045 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth The generic toString implementation that is used in most of the PBImpls {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code} is rather inefficient - replacing \n and \s to generate a one line string. Instead, we can use {code}TextFormat.shortDebugString(getProto());{code}. If we can get this into 2.1.0 - great, otherwise the next release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
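A small sketch contrasting the two approaches named in the description, using ApplicationIdProto as an example message (any YARN protobuf message would do):
{code}
import com.google.protobuf.TextFormat;
import org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto;

public class ToStringComparison {
  public static void main(String[] args) {
    ApplicationIdProto proto = ApplicationIdProto.newBuilder()
        .setClusterTimestamp(1380853306301L)
        .setId(4)
        .build();

    // Current pattern: multi-line text format flattened with regex replaces.
    String flattened =
        proto.toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");

    // Proposed: single-line debug string straight from protobuf.
    String brief = TextFormat.shortDebugString(proto);

    System.out.println(flattened);
    System.out.println(brief);
  }
}
{code}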
[jira] [Commented] (YARN-855) YarnClient.init should ensure that yarn parameters are present
[ https://issues.apache.org/jira/browse/YARN-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725969#comment-13725969 ] Siddharth Seth commented on YARN-855: - The simplest would be to check the configuration type - which keeps the API stable. The reason I mentioned parameters is that apps that use YarnClient may have their own configuration type - e.g. JobConf or a HiveConf. Type information ends up getting lost even if these apps have created their configurations based on a YarnConfiguration. YarnClient.init should ensure that yarn parameters are present -- Key: YARN-855 URL: https://issues.apache.org/jira/browse/YARN-855 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Siddharth Seth Assignee: Abhishek Kapoor It currently accepts a Configuration object in init and doesn't check whether it contains yarn parameters or is a YarnConfiguration. Should either accept YarnConfiguration, check existence of parameters or create a YarnConfiguration based on the configuration passed to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
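A minimal sketch of the simplest option mentioned above (checking the configuration type), applied to a generic service-style client: whatever Configuration subtype the caller passes (a JobConf, a HiveConf, ...) is normalised to a YarnConfiguration so the yarn defaults get loaded. The class name is illustrative, not the actual YarnClient code:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ExampleYarnClient extends AbstractService {
  public ExampleYarnClient() {
    super("ExampleYarnClient");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Keep the API accepting a plain Configuration, but make sure the yarn
    // parameters are present before anything else uses it.
    Configuration yarnConf = (conf instanceof YarnConfiguration)
        ? conf : new YarnConfiguration(conf);
    super.serviceInit(yarnConf);
  }
}
{code}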
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723968#comment-13723968 ] Siddharth Seth commented on YARN-896: - bq. Robert Joseph Evans Applications may connect to other services such as HBase or issue tokens for communication between its own containers. All of these would require renewal. The RM takes care of renewing tokens for HDFS - it can do this since the HDFS token renewer class is in the RM's classpath. For other applications - Hive for example - this isn't possible. I believe Hive ends up issuing tokens which are valid for a longer duration to get around the renewal problem. I won't necessarily link this to long running YARN though - other than the bit about the token max-age. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-710) Add to ser/deser methods to RecordFactory
[ https://issues.apache.org/jira/browse/YARN-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698232#comment-13698232 ] Siddharth Seth commented on YARN-710: - In the unit test, the setters on the ApplicationId aren't meant to be used (will end up throwing exceptions - this is replaced by newInstance in ApplicationId). Don't think getProto() needs to be changed at all in RecordFactoryPBImpl - instead a new getBuilder method should be sufficient. Somewhere along the flow, it looks like the default proto ends up being created - possibly linked to the getProto changes. Add to ser/deser methods to RecordFactory - Key: YARN-710 URL: https://issues.apache.org/jira/browse/YARN-710 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-710.patch, YARN-710.patch, YARN-710-wip.patch In order to do things like AMs failover and checkpointing I need to serialize app IDs, app attempt IDs, containers and/or IDs, resource requests, etc. Because we are wrapping/hiding the PB implementation from the APIs, we are hiding the built-in PB ser/deser capabilities. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-855) YarnClient.init should ensure that yarn parameters are present
Siddharth Seth created YARN-855: --- Summary: YarnClient.init should ensure that yarn parameters are present Key: YARN-855 URL: https://issues.apache.org/jira/browse/YARN-855 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Siddharth Seth It currently accepts a Configuration object in init and doesn't check whether it contains yarn parameters or is a YarnConfiguration. Should either accept YarnConfiguration, check existence of parameters or create a YarnConfiguration based on the configuration passed to it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-848) Nodemanager does not register with RM using the fully qualified hostname
[ https://issues.apache.org/jira/browse/YARN-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687248#comment-13687248 ] Siddharth Seth commented on YARN-848: - +1. Simple enough patch. Hitesh, could you please rebase this on top of YARN-694 - which should go in soon. Nodemanager does not register with RM using the fully qualified hostname Key: YARN-848 URL: https://issues.apache.org/jira/browse/YARN-848 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-848.1.patch If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only foo. This can create problems if DNS cannot resolve the hostname properly. Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-848) Nodemanager does not register with RM using the fully qualified hostname
[ https://issues.apache.org/jira/browse/YARN-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687388#comment-13687388 ] Siddharth Seth commented on YARN-848: - +1. Thanks Hitesh. Committing this. Nodemanager does not register with RM using the fully qualified hostname Key: YARN-848 URL: https://issues.apache.org/jira/browse/YARN-848 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Assignee: Hitesh Shah Attachments: YARN-848.1.patch, YARN-848.3.patch If the hostname is misconfigured to not be fully qualified ( i.e. hostname returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering with the RM using only foo. This can create problems if DNS cannot resolve the hostname properly. Furthermore, HDFS uses fully qualified hostnames which can end up affecting locality matches when allocating containers based on block locations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
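A quick illustration of the mismatch the description relies on: on a host whose configured name is not fully qualified, getHostName() may return just "foo" while getCanonicalHostName() resolves to "foo.bar.xyz" (the names here are examples, matching the ones in the description):
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameCheck {
  public static void main(String[] args) throws UnknownHostException {
    InetAddress addr = InetAddress.getLocalHost();
    System.out.println("short name: " + addr.getHostName());           // e.g. foo
    System.out.println("canonical : " + addr.getCanonicalHostName());  // e.g. foo.bar.xyz
  }
}
{code}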
[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations
[ https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685106#comment-13685106 ] Siddharth Seth commented on YARN-805: - Would be good if others take a look at the patch as well, since it documents long term API support.
- Do the allocate APIs need to be marked as Evolving? Should be looked at by someone who's been tracking YARN-397 closely.
- Is this jira also supposed to add comments to the .proto files, or will that be a separate jira?
- Protocols vs Client libraries as the public interface.
- Some methods are annotated private, unstable - others just private. Is there a reason for this?
- Are the QueueInfo APIs stable?
- GetAllApplicationRequest is being changed in YARN-727 - so the annotations may need to change depending on what happens when that gets committed.
- The methods related to renew / cancellation of delegation tokens should continue to stay Private. The RM controls this for now.
- Annotations missing on StartContainerRequestPBImpl.
- AMCommand - should this be marked as Evolving since additional commands may be added in the future? Similarly for NodeState. This likely needs to make it to the rolling upgrades jira as well - handling enumerations returned by method calls.
- ApplicationAttemptId, ApplicationId - appAttemptIdStrPrefix, appIdPrefix should be marked Private.
- ApplicationReport.getCurrentApplicationAttemptId - should this be stable?
- ApplicationReport.getOriginalTrackingUrl - private? Meant for proxy use only.
- ApplicationResourceUsageReport.getNumReservedContainers - not sure what numReservedContainers means in the context of multiple resources. Should probably be removed or marked private. Similarly NodeReport.getNumContainers.
- ApplicationSubmissionContext.setPriority - don't think this is used by any scheduler yet. Should it be private for now?
- ApplicationSubmissionContext.setCancelTokensWhenComplete - evolving?
- ApplicationSubmissionContext.setResource needs javadoc.
- ContainerStatus.getExitStatus seems a little ambiguous. Evolving?
- NodeId, AppId, AppAttemptId don't need annotations on their protected setters.
- ResourceRequest/ResourceBlacklistRequest - update javadoc to say resource name instead of resource.
- YarnRuntimeException - I believe this is meant for internal exceptions within YARN. private/LimitedPrivate(mapreduce) since this leaks all over MR code.
Other - Should RegisterApplicationMasterRequest have a hostname-only newInstance()? All apps won't necessarily have an rpc port and tracking url. Fix yarn-api javadoc annotations Key: YARN-805 URL: https://issues.apache.org/jira/browse/YARN-805 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority: Blocker Attachments: YARN-805.1.patch, YARN-805.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
[ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685110#comment-13685110 ] Siddharth Seth commented on YARN-802: - bq. If I understand correctly, in order to simultaneously support multiple jobs of multiple users that each can contact different Shuffle provider we must have all providers in the air in parallel. Multiple providers can be run by the NodeManager in parallel. An application chooses which provider(s) it wants to use when it starts a container on a NodeManager. bq. This data for this map arrive from the APPLICATION_INIT event. Hence, all AuxServices that serve as ShuffleProviders need to get APPLICATION_INIT events. The data in the APPLICATION_INIT event is from the startContainer request (the serviceData in the ContainerLaunchContext). If the application wants the INIT event to go to multiple providers, it can set the service data accordingly. The MapReduce AM hardcodes this to the default SHUFFLE_PROVIDER which is why only that one gets the init event. There may be auxiliary services which are not responsible for shuffle, or are in general incompatible with the shuffle consumer configured by the job. I don't think they need to get an INIT event. APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler --- Key: YARN-802 URL: https://issues.apache.org/jira/browse/YARN-802 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler. This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT enables the AuxiliaryService to map jobId-userId. This is needed for properly finding the MOFs of a job per reducers' requests. NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to hard-coded expression in hadoop code. The current TaskAttemptImpl.java code explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party AuxillaryService will never get APPLICATION_INIT events. I think a solution can be in one of two ways: 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register each of them, by calling serviceData.put (…) in loop. 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668 APPLICATION_STOP is never sent to AuxServices. This means that in case the 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux Services regardless of the value in event.getServiceID(). I prefer the 2nd solution. I am welcoming any ideas. I can provide the needed patch for any option that people like. See [Pluggable Shuffle in Hadoop documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
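As a rough sketch of the routing described above (not the actual MapReduce code path): an AM names the aux service it wants in the container's serviceData map, and the NodeManager delivers APPLICATION_INIT to that service. The helper class and the key "my_shuffle_provider" are placeholders; the key has to match the aux service name configured on the NodeManagers.
{code:java}
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

// Hypothetical helper, for illustration only.
public final class ShuffleServiceDataExample {
  private ShuffleServiceDataExample() {}

  public static ContainerLaunchContext buildLaunchContext(
      Map<String, LocalResource> localResources,
      Map<String, String> environment,
      List<String> commands,
      ByteBuffer credentials,
      Map<ApplicationAccessType, String> acls,
      ByteBuffer shuffleSecret) {

    // The key selects which auxiliary service receives APPLICATION_INIT for this app.
    Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
    serviceData.put("my_shuffle_provider", shuffleSecret);

    return ContainerLaunchContext.newInstance(
        localResources, environment, commands, serviceData, credentials, acls);
  }
}
{code}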
[jira] [Created] (YARN-841) Annotate and document AuxService APIs
Siddharth Seth created YARN-841: --- Summary: Annotate and document AuxService APIs Key: YARN-841 URL: https://issues.apache.org/jira/browse/YARN-841 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Siddharth Seth For users writing their own AuxServices, these APIs should be annotated and need better documentation. Also, the classes may need to move out of the NodeManager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-844) Move PBImpls from yarn-api to yarn-common
Siddharth Seth created YARN-844: --- Summary: Move PBImpls from yarn-api to yarn-common Key: YARN-844 URL: https://issues.apache.org/jira/browse/YARN-844 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686102#comment-13686102 ] Siddharth Seth commented on YARN-666: - TBD - handling of Enum fields like AMCommand, NodeAction. This may be possible by forcing defaults if a new value needs to be added; alternatively, define a new Enum which is used by newer clients. [Umbrella] Support rolling upgrades in YARN --- Key: YARN-666 URL: https://issues.apache.org/jira/browse/YARN-666 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Siddharth Seth Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf Jira to track changes required in YARN to allow rolling upgrades, including documentation and possible upgrade routes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
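To make the enum concern concrete, a hedged sketch of the "force a default" idea; EnumCompat and the fallback policy are assumptions for illustration, not an existing YARN utility:
{code:java}
// Hypothetical helper: an older client that receives an enum name added by a
// newer server falls back to a designated default instead of failing.
public final class EnumCompat {
  private EnumCompat() {}

  public static <E extends Enum<E>> E valueOfOrDefault(Class<E> type, String name, E fallback) {
    try {
      return Enum.valueOf(type, name);
    } catch (IllegalArgumentException unknownValue) {
      // The peer is newer and sent a value this client does not know about yet.
      return fallback;
    }
  }
}
{code}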
[jira] [Updated] (YARN-171) NodeManager should serve logs directly if log-aggregation is not enabled
[ https://issues.apache.org/jira/browse/YARN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-171: Attachment: YARN-171_3.txt Uploading a newer, but still very dated, version of the patch which I had sitting around on my system. In case someone is taking over this jira, it could be used as a starting point, or not. NodeManager should serve logs directly if log-aggregation is not enabled Key: YARN-171 URL: https://issues.apache.org/jira/browse/YARN-171 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.3 Reporter: Vinod Kumar Vavilapalli Assignee: Siddharth Seth Attachments: YARN-171_3.txt, YARN171_WIP.txt NodeManagers never serve logs for completed applications. If log-aggregation is not enabled, in the interim, due to bugs like YARN-162, this is a serious problem for users as logs are necessarily not available. We should let nodes serve logs directly if YarnConfiguration.LOG_AGGREGATION_ENABLED is set. This should be okay as NonAggregatingLogHandler can retain logs up to YarnConfiguration.NM_LOG_RETAIN_SECONDS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
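The two configuration keys referenced in the description are real YarnConfiguration constants; below is a small sketch of how an NM-side check might read them. The class and method are illustrative only and are not part of the attached patch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative only: decides whether the NM may still serve a completed app's logs locally.
public final class LocalLogServingCheck {
  private LocalLogServingCheck() {}

  public static boolean canServeLocally(Configuration conf) {
    boolean aggregationEnabled = conf.getBoolean(
        YarnConfiguration.LOG_AGGREGATION_ENABLED,
        YarnConfiguration.DEFAULT_LOG_AGGREGATION_ENABLED);
    long retainSeconds = conf.getLong(
        YarnConfiguration.NM_LOG_RETAIN_SECONDS,
        YarnConfiguration.DEFAULT_NM_LOG_RETAIN_SECONDS);
    // With aggregation off, NonAggregatingLogHandler keeps logs for retainSeconds,
    // so the NM can serve them directly within that window.
    return !aggregationEnabled && retainSeconds > 0;
  }
}
{code}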
[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations
[ https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686284#comment-13686284 ] Siddharth Seth commented on YARN-805: - Thanks for the updated patch Jian. Couple more comments. - The latest patch changes *ProtocolPB to Public. These should remain private - implementation detail. - ApplicationSubmissionContext.setResource still needs javadoc. The patch added it to getResource. - RegisterApplicationMaster.newInstance - looking at this again, I think it's better to add another newInstance method. The current documentation says set host to an empty string - that's not really correct since the RM won't set it either. Also, instead of empty string - the defaults should be null. If we add a new newInstance method - we can control the default values. Follow up jira - YARN should figure out the hostname instead of expecting it in the Register call (may not be possible for unmanaged AM). Otherwise, the changes look good. Fix yarn-api javadoc annotations Key: YARN-805 URL: https://issues.apache.org/jira/browse/YARN-805 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority: Blocker Attachments: YARN-805.1.patch, YARN-805.2.patch, YARN-805.3.patch, YARN-805.4.patch, YARN-805.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
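For context on the newInstance discussion, the existing factory method forces callers to supply all three values; the host, port, and URL below are invented placeholders, included only to show why a host-only variant would be convenient:
{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;

// Illustrative only: an AM with no RPC endpoint or tracking UI still has to pass
// placeholder values with the current newInstance variant.
public final class RegisterRequestExample {
  private RegisterRequestExample() {}

  public static RegisterApplicationMasterRequest build() {
    return RegisterApplicationMasterRequest.newInstance("am-host.example.com", -1, "");
  }
}
{code}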
[jira] [Commented] (YARN-841) Annotate and document AuxService APIs
[ https://issues.apache.org/jira/browse/YARN-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686311#comment-13686311 ] Siddharth Seth commented on YARN-841: - Looks good. Needs a small change to the javadoc though - initApplication is only sent to the AuxService specified by the application. We should probably do the same for the stopApplication as well. Annotate and document AuxService APIs - Key: YARN-841 URL: https://issues.apache.org/jira/browse/YARN-841 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.5-alpha Reporter: Siddharth Seth Assignee: Vinod Kumar Vavilapalli Attachments: YARN-841-20130617.1.txt, YARN-841-20130617.2.txt, YARN-841-20130617.txt For users writing their own AuxServices, these APIs should be annotated and need better documentation. Also, the classes may need to move out of the NodeManager. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations
[ https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686335#comment-13686335 ] Siddharth Seth commented on YARN-805: - +1. This looks good. Thanks Jian. Fix yarn-api javadoc annotations Key: YARN-805 URL: https://issues.apache.org/jira/browse/YARN-805 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority: Blocker Attachments: YARN-805.1.patch, YARN-805.2.patch, YARN-805.3.patch, YARN-805.4.patch, YARN-805.5.patch, YARN-805.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-844) Move PBImpls from yarn-api to yarn-common
[ https://issues.apache.org/jira/browse/YARN-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth resolved YARN-844. - Resolution: Duplicate Move PBImpls from yarn-api to yarn-common - Key: YARN-844 URL: https://issues.apache.org/jira/browse/YARN-844 Project: Hadoop YARN Issue Type: Bug Reporter: Siddharth Seth -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-825) Fix yarn-common javadoc annotations
[ https://issues.apache.org/jira/browse/YARN-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684883#comment-13684883 ] Siddharth Seth commented on YARN-825: - bq. Overall, I felt that it isn't enough to just mark the package as private where applicable, so I went ahead and added annotations for individual classes. Agree. A lot of the public classes need Javadoc. I think a follow-up jira can be used for this, which shouldn't block 2.1.0 (assuming 2.1.1 will be soon after). Also, there's a bunch of non-annotated classes in yarn-api as well - YarnException, YarnRuntimeException, YarnConfiguration, RecordFactory* being some of the important ones. Separate jira for this as well, I think. Unrelated: should the PBImpls be moved from yarn-api to yarn-common (they're private anyway)? Some stuff which may need to be changed:
- AggregatedLogsDeletionService - Private to LimitedPrivate. Used in the MR history server since a Yarn log/app history server does not exist. I don't mind leaving this as Private as well though, since its use in MR is temporary.
- Should ClientToAMTokenSecretManager be final, or do you think there are use cases where users may want to extend this?
- Should ServiceStateModel be private?
- ApplicationClassLoader - leave as Unstable?
- Until Apps, ConverterUtils etc. are cleaned up - mark them as private? Apps.addToEnvironment should be public though.
- ResourceCalculatorPlugin and related classes - public Unstable or LimitedPrivate. This is already used in MapReduce
- Similarly for RackResolver
- Unrelated: should ApplicationTokenIdentifier be renamed to something like AMTokenIdentifier?
Fix yarn-common javadoc annotations --- Key: YARN-825 URL: https://issues.apache.org/jira/browse/YARN-825 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: YARN-825-20130615.1.txt, YARN-825-20130615.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
[ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683579#comment-13683579 ] Siddharth Seth commented on YARN-802: - With YARN, a new AM (Application) is started per job. The initApp in the NM is per app - so each job/app can choose which shuffle provider it wants to use. The shuffle service configured for an AM will be specific to a single job only. From MAPREDUCE-4049 bq. A shuffle consumer instance will only contact one of the shuffle providers and will request its desired files only from from this provider. I'm assuming a single job will only use one shuffle provider - or do you see a situation where multiple shuffle providers can serve data to a single job ? In case of multiple jobs being run by a single AM - this gets more complicated, and we may need to initialize multiple providers. APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler --- Key: YARN-802 URL: https://issues.apache.org/jira/browse/YARN-802 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler. This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT enables the AuxiliaryService to map jobId-userId. This is needed for properly finding the MOFs of a job per reducers' requests. NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to hard-coded expression in hadoop code. The current TaskAttemptImpl.java code explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party AuxillaryService will never get APPLICATION_INIT events. I think a solution can be in one of two ways: 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register each of them, by calling serviceData.put (…) in loop. 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668 APPLICATION_STOP is never sent to AuxServices. This means that in case the 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux Services regardless of the value in event.getServiceID(). I prefer the 2nd solution. I am welcoming any ideas. I can provide the needed patch for any option that people like. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-804) mark AbstractService init/start/stop methods as final
[ https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683584#comment-13683584 ] Siddharth Seth commented on YARN-804: - Are the same semantics as AbstractService enforced for the CompositeService as well? Are users expected to call super.init / super.serviceInit to take care of all the Services which are part of the composite service, or will CompositeService just take care of this? Otherwise, it may make sense to re-open YARN-811. mark AbstractService init/start/stop methods as final - Key: YARN-804 URL: https://issues.apache.org/jira/browse/YARN-804 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.1.0-beta Reporter: Steve Loughran Assignee: Vinod Kumar Vavilapalli Attachments: YARN-804-001.patch Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public AbstractService init/start/stop methods as final. Why? It puts the lifecycle check and error handling around the subclass code, ensuring no lifecycle method gets called in the wrong state or gets called more than once. When a {{serviceInit(), serviceStart(), serviceStop()}} method throws an exception, it's caught and auto-triggers stop. Marking the methods as final forces service implementations to move to the stricter lifecycle. It has one side effect: some of the mocking tests play up - I'll need some assistance here -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
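A minimal sketch of the lifecycle contract under discussion; WorkerService and ParentService are made-up names, and whether a CompositeService subclass must call super.serviceInit(conf) itself is exactly the question raised above:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

// Illustrative only: subclasses override serviceInit/serviceStart/serviceStop, while the
// init/start/stop entry points in AbstractService wrap them with state checks and error handling.
class WorkerService extends AbstractService {
  WorkerService() {
    super("WorkerService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // per-service initialization goes here
    super.serviceInit(conf);
  }
}

class ParentService extends CompositeService {
  ParentService() {
    super("ParentService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    addService(new WorkerService());
    // Initializes the child services added above.
    super.serviceInit(conf);
  }
}
{code}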
[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
[ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682488#comment-13682488 ] Siddharth Seth commented on YARN-802: - Can the MR AM specify the service to be used via configuration, and set the service data accordingly. APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler --- Key: YARN-802 URL: https://issues.apache.org/jira/browse/YARN-802 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler. This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT enables the AuxiliaryService to map jobId-userId. This is needed for properly finding the MOFs of a job per reducers' requests. NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to hard-coded expression in hadoop code. The current TaskAttemptImpl.java code explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party AuxillaryService will never get APPLICATION_INIT events. I think a solution can be in one of two ways: 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register each of them, by calling serviceData.put (…) in loop. 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668 APPLICATION_STOP is never sent to AuxServices. This means that in case the 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux Services regardless of the value in event.getServiceID(). I prefer the 2nd solution. I am welcoming any ideas. I can provide the needed patch for any option that people like. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler
[ https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682724#comment-13682724 ] Siddharth Seth commented on YARN-802: - It should be possible to configure the MR AM with the shuffle service that needs to be used, in which case the MR AM sets up the service id correctly (in TaskAttemptImpl), and the NodeManager can send the init event to the correct service. We should probably change the stop to behave the same way. APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler --- Key: YARN-802 URL: https://issues.apache.org/jira/browse/YARN-802 Project: Hadoop YARN Issue Type: Bug Components: applications, nodemanager Affects Versions: 2.0.4-alpha Reporter: Avner BenHanoch APPLICATION_INIT is never sent to AuxServices other than the built-in ShuffleHandler. This means that 3rd party ShuffleProvider(s) will not be able to function, because APPLICATION_INIT enables the AuxiliaryService to map jobId-userId. This is needed for properly finding the MOFs of a job per reducers' requests. NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to hard-coded expression in hadoop code. The current TaskAttemptImpl.java code explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, ...) and ignores any additional AuxiliaryService. As a result, only the built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party AuxillaryService will never get APPLICATION_INIT events. I think a solution can be in one of two ways: 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register each of them, by calling serviceData.put (…) in loop. 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668 APPLICATION_STOP is never sent to AuxServices. This means that in case the 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux Services regardless of the value in event.getServiceID(). I prefer the 2nd solution. I am welcoming any ideas. I can provide the needed patch for any option that people like. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
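A purely hypothetical sketch of the configurable-service-id idea floated above; the config key used here does not exist in the MapReduce code base, and the default id is passed in rather than hard-coded since the built-in handler's service name is defined elsewhere:
{code:java}
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

// Hypothetical only: lets a job pick which aux service receives its shuffle secret.
public final class ConfigurableShuffleServiceId {
  private ConfigurableShuffleServiceId() {}

  static void addShuffleServiceData(Configuration conf, String defaultServiceId,
      Map<String, ByteBuffer> serviceData, ByteBuffer shuffleSecret) {
    // "mapreduce.job.shuffle.provider.service" is an invented key, for illustration only.
    String serviceId = conf.get("mapreduce.job.shuffle.provider.service", defaultServiceId);
    serviceData.put(serviceId, shuffleSecret);
  }
}
{code}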
[jira] [Created] (YARN-811) Add a set of final _init/_start/_stop methods to CompositeService
Siddharth Seth created YARN-811: --- Summary: Add a set of final _init/_start/_stop methods to CompositeService Key: YARN-811 URL: https://issues.apache.org/jira/browse/YARN-811 Project: Hadoop YARN Issue Type: Sub-task Reporter: Siddharth Seth Classes which implement AbstractService no longer need to make a super.init, start, stop call. The same could be done for CompositeService as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira