[jira] [Commented] (YARN-6509) Add a size threshold beyond which yarn logs will require a force option

2017-04-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981703#comment-15981703
 ] 

Siddharth Seth commented on YARN-6509:
--

Is the current proposal to change the default to fetch the last 4K?
Can we please not make this change? It is definitely incompatible, and I'd
argue that it's not very useful.

The intent of this jira is to protect users against log downloads that could
otherwise take hours and fill up the local fs - i.e. apps which generate large logs.
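
A rough sketch of the kind of guard being described - the class and parameter names here are hypothetical and not from the attached patch:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not the actual patch: total the aggregated log files for an app
// and let the CLI refuse to download past a threshold unless a force option is set.
final class LogSizeGuard {
  static boolean exceedsThreshold(FileSystem fs, Path appLogDir, long thresholdBytes)
      throws IOException {
    long total = 0;
    for (FileStatus status : fs.listStatus(appLogDir)) {
      total += status.getLen();
    }
    return total > thresholdBytes;
  }
}
{code}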

> Add a size threshold beyond which yarn logs will require a force option
> ---
>
> Key: YARN-6509
> URL: https://issues.apache.org/jira/browse/YARN-6509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Xuan Gong
> Fix For: 2.9.0
>
> Attachments: YARN-6509.1.patch
>
>
> An accidental fetch for a long running application can lead to a scenario in
> which the large log size fills up a disk.






[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree

2017-03-29 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947786#comment-15947786
 ] 

Siddharth Seth commented on YARN-3427:
--

From a Tez perspective, I would prefer that the methods were left in place. If
this had been fixed in 2.6, it would have been easier to work with. Since the new
methods only appeared in 2.7, Tez will need to introduce a shim for this.
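
Loosely, the shim would probe for the newer accessors and fall back to the deprecated ones. A reflection-based sketch - the method names getRssMemorySize / getCumulativeRssmem are assumptions here and should be checked against the actual releases:
{code}
import java.lang.reflect.Method;
import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;

// Sketch of a reflection-based shim: prefer the 2.7+ accessor and fall back to the
// deprecated pre-2.7 one. Method names are assumptions, not verified against a release.
final class ProcessTreeShim {
  static long getRssMemory(ResourceCalculatorProcessTree tree) {
    try {
      Method m = tree.getClass().getMethod("getRssMemorySize");
      return (Long) m.invoke(tree);
    } catch (ReflectiveOperationException e) {
      try {
        Method m = tree.getClass().getMethod("getCumulativeRssmem");
        return (Long) m.invoke(tree);
      } catch (ReflectiveOperationException e2) {
        throw new IllegalStateException("No known RSS accessor on " + tree.getClass(), e2);
      }
    }
  }
}
{code}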

> Remove deprecated methods from ResourceCalculatorProcessTree
> 
>
> Key: YARN-3427
> URL: https://issues.apache.org/jira/browse/YARN-3427
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Blocker
> Attachments: YARN-3427.000.patch, YARN-3427.001.patch
>
>
> In 2.7, we made ResourceCalculatorProcessTree Public and exposed some 
> existing ill-formed methods as deprecated ones for use by Tez.
> We should remove them in 3.0.0, considering that the methods have been
> deprecated for all the 2.x.y releases in which the class is marked Public.






[jira] [Created] (YARN-5738) Allow services to release/kill specific containers

2016-10-13 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5738:


 Summary: Allow services to release/kill specific containers
 Key: YARN-5738
 URL: https://issues.apache.org/jira/browse/YARN-5738
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth


There are occasions on which specific containers may no longer be required by a
service. It would be useful to have support for returning these to YARN.
Slider flex doesn't give this control.

cc [~gsaha], [~vinodkv]






[jira] [Created] (YARN-5418) When partial log aggregation is enabled, display the list of aggregated files on the container log page

2016-07-21 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5418:


 Summary: When partial log aggregation is enabled, display the list 
of aggregated files on the container log page
 Key: YARN-5418
 URL: https://issues.apache.org/jira/browse/YARN-5418
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth


The container log page lists all files. However, as soon as a file gets
aggregated, it is no longer available on this listing page.
It would be useful to list aggregated files as well as the current set of files.






[jira] [Created] (YARN-5297) Avoid printing a stack trace when recovering an app after the RM restarts

2016-06-27 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5297:


 Summary: Avoid printing a stack trace when recovering an app after 
the RM restarts
 Key: YARN-5297
 URL: https://issues.apache.org/jira/browse/YARN-5297
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siddharth Seth


The exception trace is unnecessary, and can cause confusion.
{code}
2016-06-16 22:02:54,262 INFO  ipc.Server (Server.java:logException(2401)) - IPC 
Server handler 0 on 8030, call 
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 
172.22.79.149:42698 Call#2241 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException: AM 
is not registered for known application attempt: 
appattempt_1466112179488_0001_01 or RM had restarted after AM registered . 
AM should re-register.
  at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:454)
  at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
  at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
  at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
{code}

cc [~djp]
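
One common Hadoop pattern for this kind of cleanup - a sketch only, not necessarily the fix that will be taken here - is to register the exception as "terse" on the IPC server, so the RPC layer logs just the message instead of the full trace:
{code}
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;

// Sketch: mark the exception as terse so the IPC server logs only its message.
// Assumes `server` is the RPC.Server backing the ApplicationMasterService.
final class TerseExceptionSetup {
  static void register(RPC.Server server) {
    server.addTerseExceptions(ApplicationMasterNotRegisteredException.class);
  }
}
{code}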






[jira] [Commented] (YARN-5270) Solve miscellaneous issues caused by YARN-4844

2016-06-20 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340336#comment-15340336
 ] 

Siddharth Seth commented on YARN-5270:
--

Not sure why the NotImplementedYet exceptions are required. Is this to handle
cases where some projects may have implemented Resource?
Anyway - if the exception has to stay, the message should be clearer to avoid
confusion: indicate that the method is implemented in the concrete implementation.

> Solve miscellaneous issues caused by YARN-4844
> --
>
> Key: YARN-5270
> URL: https://issues.apache.org/jira/browse/YARN-5270
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Attachments: YARN-5270-branch-2.001.patch, 
> YARN-5270-branch-2.8.001.patch
>
>
> Such as javac warnings reported by YARN-5077 and type-conversion issues in the
> Resources class.






[jira] [Created] (YARN-5224) Logs for a completed container are not available in the yarn logs output for a live application

2016-06-09 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5224:


 Summary: Logs for a completed container are not available in the 
yarn logs output for a live application
 Key: YARN-5224
 URL: https://issues.apache.org/jira/browse/YARN-5224
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Siddharth Seth


This affects 'short' jobs like MapReduce and Tez more than long running apps.
Related: YARN-5193 (but that only covers long running apps)






[jira] [Created] (YARN-5205) yarn logs for live applications does not provide log files which may have already been aggregated

2016-06-06 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5205:


 Summary: yarn logs for live applications does not provide log 
files which may have already been aggregated
 Key: YARN-5205
 URL: https://issues.apache.org/jira/browse/YARN-5205
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Siddharth Seth


With periodic aggregation enabled, the logs which have been partially
aggregated are not always displayed by the yarn logs command.

If a file exists in the log dir for a container, all previously aggregated files
with the same name, along with the current file, will be part of the yarn logs
output.
Files which have been previously aggregated, but for which a file with the same
name no longer exists in the container log dir, do not show up in the output.

After the app completes, all logs are available.

cc [~xgong]






[jira] [Commented] (YARN-5194) Avoid adding yarn-site to all Configuration instances created by the JVM

2016-06-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316909#comment-15316909
 ] 

Siddharth Seth commented on YARN-5194:
--

This will likely break a bunch of things - hence targeted at 3.0. Could you
please elaborate on HDFS getConf?
If there's enough interest in reducing the size of config objects in memory /
serialized size, this can be taken up for a 3.x release.

> Avoid adding yarn-site to all Configuration instances created by the JVM
> 
>
> Key: YARN-5194
> URL: https://issues.apache.org/jira/browse/YARN-5194
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>
> {code}
> static {
> addDeprecatedKeys();
> Configuration.addDefaultResource(YARN_DEFAULT_CONFIGURATION_FILE);
> Configuration.addDefaultResource(YARN_SITE_CONFIGURATION_FILE);
>   }
> {code}
> This puts the contents of yarn-default and yarn-site into every configuration 
> instance created in the VM after YarnConfiguration has been initialized.
> This should be changed to a local addResource for the specific 
> YarnConfiguration instance, instead of polluting every Configuration instance.
> Incompatible change. Have set the target version to 3.x. 
> The same applies to HdfsConfiguration (hdfs-site.xml), and Configuration 
> (core-site.xml etc).
> core-site may be worth including everywhere; however, it would be better to
> expect users to explicitly add the relevant resources.






[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes

2016-06-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313121#comment-15313121
 ] 

Siddharth Seth commented on YARN-5193:
--

bq. I don't think long-running necessarily means low container churn, although 
I'm sure it does for the use-case you have in mind. For example, an 
app-as-service that farms out work as containers on YARN and runs forever. High 
load with short work duration for such a service = high container churn but it 
never exits.
Fair point. I'm guessing this would end up getting implemented as a parameter
in the API, rather than a blanket 'long-running = aggregate after container
completes'.

bq. Periodic aggregation would be more palatable for such a use-case. Also 
log-aggregation duration is not guaranteed. Even if we aggregate as the 
container completes there's no guarantee how long it will take, so any client 
that wants to see the logs in HDFS just as containers complete has to handle 
fetching it from the nodes in the worst-case scenario or retrying until it's 
available.
There would definitely still be the time window where the container has 
completed, and the log hasn't yet been aggregated. It'll likely be a little 
shorter than a specific time window - if that's worth anything.

The main problem seems to be discovering these dead containers, and where they
ran. ATS/AHS would have been ideal, but they can't really be enabled to log
container information on a reasonably sized cluster.
Maybe log-aggregation can write out indexing information up front, so that the
CLI can at least find all containers and the nodes where they ran.
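
Purely as an illustration of what such index information would need to carry - nothing like this exists today, and the field names are made up:
{code}
// Hypothetical per-container index entry for the idea above; field names are made up.
final class ContainerLogIndexEntry {
  final String containerId;  // which container the aggregated logs belong to
  final String nodeId;       // the node where the container ran
  final long offset;         // offset of the container's section in the aggregated file
  final long length;         // length of that section

  ContainerLogIndexEntry(String containerId, String nodeId, long offset, long length) {
    this.containerId = containerId;
    this.nodeId = nodeId;
    this.offset = offset;
    this.length = length;
  }
}
{code}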

> For long running services, aggregate logs when a container completes instead 
> of when the app completes
> --
>
> Key: YARN-5193
> URL: https://issues.apache.org/jira/browse/YARN-5193
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>
> For a long running service, containers will typically not complete very 
> often. However, when a container completes - it would be useful to aggregate 
> the logs right then, instead of waiting for the app to complete.
> This will allow the command line log tool to lookup containers for an app 
> from the log file index itself, instead of having to go and talk to YARN. 
> Talking to YARN really only works if ATS is enabled, and YARN is configured 
> to publish container information to ATS (That may not always be the case - 
> since this can overload ATS quite fast).
> There's some added benefits like cleaning out local disk space early, instead 
> of waiting till the app completes. (There's probably a separate jira 
> somewhere about cleanup of container for long running services anyway)
> cc [~vinodkv], [~xgong]






[jira] [Commented] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes

2016-06-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312822#comment-15312822
 ] 

Siddharth Seth commented on YARN-5193:
--

Log rolling should help. I have yet to try it out. Do you happen to know how it
works when a container dies - will the logs be aggregated immediately, or after
the time window?

bq. Main thing to watch out for here is additional load to the namenode.
Yes. The original change to aggregate at the end was required for shorter
running jobs with more container churn. For a longer running service,
containers will likely not go down very often, and it should be OK to upload
logs occasionally (without keeping connections open).

> For long running services, aggregate logs when a container completes instead 
> of when the app completes
> --
>
> Key: YARN-5193
> URL: https://issues.apache.org/jira/browse/YARN-5193
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>
> For a long running service, containers will typically not complete very 
> often. However, when a container completes - it would be useful to aggregate 
> the logs right then, instead of waiting for the app to complete.
> This will allow the command line log tool to lookup containers for an app 
> from the log file index itself, instead of having to go and talk to YARN. 
> Talking to YARN really only works if ATS is enabled, and YARN is configured 
> to publish container information to ATS (That may not always be the case - 
> since this can overload ATS quite fast).
> There's some added benefits like cleaning out local disk space early, instead 
> of waiting till the app completes. (There's probably a separate jira 
> somewhere about cleanup of container for long running services anyway)
> cc [~vinodkv], [~xgong]






[jira] [Created] (YARN-5194) Avoid adding yarn-site to all Configuration instances created by the JVM

2016-06-02 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5194:


 Summary: Avoid adding yarn-site to all Configuration instances 
created by the JVM
 Key: YARN-5194
 URL: https://issues.apache.org/jira/browse/YARN-5194
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth


{code}
static {
addDeprecatedKeys();
Configuration.addDefaultResource(YARN_DEFAULT_CONFIGURATION_FILE);
Configuration.addDefaultResource(YARN_SITE_CONFIGURATION_FILE);
  }
{code}

This puts the contents of yarn-default and yarn-site into every configuration 
instance created in the VM after YarnConfiguration has been initialized.

This should be changed to a local addResource for the specific 
YarnConfiguration instance, instead of polluting every Configuration instance.

Incompatible change. Have set the target version to 3.x. 

The same applies to HdfsConfiguration (hdfs-site.xml), and Configuration 
(core-site.xml etc).
core-site may be worth including everywhere; however, it would be better to
expect users to explicitly add the relevant resources.
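
For illustration only - a sketch of the direction described above, not an actual patch - the resources would be loaded into the specific instance instead of being registered JVM-wide:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: load yarn-default.xml / yarn-site.xml into this instance rather than
// registering them as default resources for every Configuration created in the JVM.
public class YarnConfiguration extends Configuration {
  public static final String YARN_DEFAULT_CONFIGURATION_FILE = "yarn-default.xml";
  public static final String YARN_SITE_CONFIGURATION_FILE = "yarn-site.xml";

  public YarnConfiguration() {
    super();
    addResource(YARN_DEFAULT_CONFIGURATION_FILE);
    addResource(YARN_SITE_CONFIGURATION_FILE);
  }
}
{code}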






[jira] [Created] (YARN-5193) For long running services, aggregate logs when a container completes instead of when the app completes

2016-06-02 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-5193:


 Summary: For long running services, aggregate logs when a 
container completes instead of when the app completes
 Key: YARN-5193
 URL: https://issues.apache.org/jira/browse/YARN-5193
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth


For a long running service, containers will typically not complete very often.
However, when a container completes, it would be useful to aggregate the logs
right then, instead of waiting for the app to complete.
This will allow the command line log tool to look up containers for an app from
the log file index itself, instead of having to go and talk to YARN. Talking to
YARN really only works if ATS is enabled, and YARN is configured to publish
container information to ATS (that may not always be the case, since this can
overload ATS quite fast).

There are some added benefits, like cleaning out local disk space early, instead
of waiting till the app completes. (There's probably a separate jira somewhere
about cleanup of containers for long running services anyway.)

cc [~vinodkv], [~xgong]






[jira] [Assigned] (YARN-4816) SystemClock API broken in 2.9.0

2016-03-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned YARN-4816:


Assignee: Siddharth Seth

> SystemClock API broken in 2.9.0
> ---
>
> Key: YARN-4816
> URL: https://issues.apache.org/jira/browse/YARN-4816
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: YARN-4816.1.txt
>
>
> https://issues.apache.org/jira/browse/YARN-4526 removed the public 
> constructor on SystemClock - making it an incompatible change.
> cc [~kasha]





[jira] [Commented] (YARN-4816) SystemClock API broken in 2.9.0

2016-03-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194673#comment-15194673
 ] 

Siddharth Seth commented on YARN-4816:
--

Thanks for the review [~kasha] - committing to master and branch-2.

> SystemClock API broken in 2.9.0
> ---
>
> Key: YARN-4816
> URL: https://issues.apache.org/jira/browse/YARN-4816
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Siddharth Seth
> Attachments: YARN-4816.1.txt
>
>
> https://issues.apache.org/jira/browse/YARN-4526 removed the public 
> constructor on SystemClock - making it an incompatible change.
> cc [~kasha]





[jira] [Updated] (YARN-4816) SystemClock API broken in 2.9.0

2016-03-14 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-4816:
-
Attachment: YARN-4816.1.txt

Trivial patch. Re-introduces the public constructor and marks it as deprecated.

[~kasha] - please review.
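
For reference, the shape of the change is roughly the following; this is a sketch, and the attached patch is the authoritative version:
{code}
import org.apache.hadoop.yarn.util.Clock;

// Rough sketch: keep the getInstance() singleton introduced by YARN-4526, and
// re-introduce the public constructor as a deprecated member so existing callers
// continue to compile.
public class SystemClock implements Clock {
  private static final SystemClock INSTANCE = new SystemClock();

  public static SystemClock getInstance() {
    return INSTANCE;
  }

  @Deprecated
  public SystemClock() {
    // retained for compatibility with code that calls new SystemClock()
  }

  @Override
  public long getTime() {
    return System.currentTimeMillis();
  }
}
{code}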

> SystemClock API broken in 2.9.0
> ---
>
> Key: YARN-4816
> URL: https://issues.apache.org/jira/browse/YARN-4816
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Siddharth Seth
> Attachments: YARN-4816.1.txt
>
>
> https://issues.apache.org/jira/browse/YARN-4526 removed the public 
> constructor on SystemClock - making it an incompatible change.
> cc [~kasha]





[jira] [Created] (YARN-4816) SystemClock API broken in 2.9.0

2016-03-14 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-4816:


 Summary: SystemClock API broken in 2.9.0
 Key: YARN-4816
 URL: https://issues.apache.org/jira/browse/YARN-4816
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Siddharth Seth


https://issues.apache.org/jira/browse/YARN-4526 removed the public constructor 
on SystemClock - making it an incompatible change.

cc [~kasha]





[jira] [Commented] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts

2016-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089794#comment-15089794
 ] 

Siddharth Seth commented on YARN-4554:
--

Please go ahead. Separating the diagnostics per appAttempt in the main report 
would be useful. Something like "[appAttempt1=...], [appAttempt2=...]"

> ApplicationReport.getDiagnostics does not return diagnostics from individual 
> attempts
> -
>
> Key: YARN-4554
> URL: https://issues.apache.org/jira/browse/YARN-4554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Siddharth Seth
>Assignee: Sunil G
>
> For an Application with ApplicationReport.getFinalApplicationStatus=FAILED 
> and ApplicationReport.getYarnApplicationState=FINISHED - 
> ApplicationReport.getDiagnostics returns an empty string.
> Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, 
> followed by getApplicationAttemptReport to get diagnostics for the attempt - 
> which contained the information I had used to unregister the app.





[jira] [Created] (YARN-4554) ApplicationReport.getDiagnostics does not return diagnostics from individual attempts

2016-01-06 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-4554:


 Summary: ApplicationReport.getDiagnostics does not return 
diagnostics from individual attempts
 Key: YARN-4554
 URL: https://issues.apache.org/jira/browse/YARN-4554
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Siddharth Seth


For an Application with ApplicationReport.getFinalApplicationStatus=FAILED and 
ApplicationReport.getYarnApplicationState=FINISHED - 
ApplicationReport.getDiagnostics returns an empty string.

Instead I had to use ApplicationReport.getCurrentApplicationAttemptId, followed 
by getApplicationAttemptReport to get diagnostics for the attempt - which 
contained the information I had used to unregister the app.
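
A sketch of the workaround described above, using the public YarnClient API (illustrative only):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptReport;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Illustrative sketch of the workaround: fall back to the current attempt's diagnostics
// when the application-level diagnostics string is empty.
final class DiagnosticsHelper {
  static String getDiagnostics(YarnClient client, ApplicationId appId) throws Exception {
    ApplicationReport report = client.getApplicationReport(appId);
    String diagnostics = report.getDiagnostics();
    if (diagnostics == null || diagnostics.isEmpty()) {
      ApplicationAttemptReport attempt =
          client.getApplicationAttemptReport(report.getCurrentApplicationAttemptId());
      diagnostics = attempt.getDiagnostics();
    }
    return diagnostics;
  }
}
{code}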






[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-12-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060611#comment-15060611
 ] 

Siddharth Seth commented on YARN-4207:
--

+1. This looks good. Thanks [~rhaase]

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Rich Haase
>  Labels: trivial
> Attachments: YARN-4207.patch
>
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".





[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-10-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955334#comment-14955334
 ] 

Siddharth Seth commented on YARN-4207:
--

[~rhaase] - thanks for taking this up.
Along with the change to FinalApplicationStatus, a change is also required to
the proto definition (yarn_protos.proto). There is a set of converter methods
that translate between the proto and FinalApplicationStatus; these will also
need to be changed.

Other than that, I believe adding this additional value is a safe change.
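
A sketch of the enum side of the change; the value name here is just a placeholder and the patch may pick a different one:
{code}
// Sketch only: add a neutral terminal status to the public enum. The matching value also
// has to be added to FinalApplicationStatusProto in yarn_protos.proto, and the converter
// methods between the two need to handle it.
public enum FinalApplicationStatus {
  UNDEFINED,
  SUCCEEDED,
  FAILED,
  KILLED,
  ENDED   // placeholder name: finished without implying success or failure
}
{code}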

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>  Labels: trivial
> Attachments: YARN-4207.patch
>
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".





[jira] [Updated] (YARN-4207) Add a non-judgemental YARN app completion status

2015-10-13 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-4207:
-
Assignee: Rich Haase

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Rich Haase
>  Labels: trivial
> Attachments: YARN-4207.patch
>
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".





[jira] [Commented] (YARN-162) nodemanager log aggregation has scaling issues with namenode

2015-09-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933751#comment-14933751
 ] 

Siddharth Seth commented on YARN-162:
-

Go ahead.

> nodemanager log aggregation has scaling issues with namenode
> 
>
> Key: YARN-162
> URL: https://issues.apache.org/jira/browse/YARN-162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 0.23.3
>Reporter: Nathan Roberts
>Assignee: Siddharth Seth
>Priority: Critical
> Attachments: YARN-162.txt, YARN-162_WIP.txt, YARN-162_v2.txt, 
> YARN-162_v2.txt
>
>
> Log aggregation causes fd explosion on the namenode. On large clusters this 
> can exhaust FDs to the point where datanodes can't check-in.





[jira] [Resolved] (YARN-4208) Support additional values for FinalApplicationStatus

2015-09-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved YARN-4208.
--
Resolution: Duplicate

Looks like [~sershe] already filed YARN-4207. Closing this as a dupe.

> Support additional values for FinalApplicationStatus
> 
>
> Key: YARN-4208
> URL: https://issues.apache.org/jira/browse/YARN-4208
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Siddharth Seth
>
> FinalApplicationStatus currently supports SUCCEEDED, FAILED and KILLED as 
> values after an application completes.
> While these are sufficient for jobs like MR, where a single job maps to a 
> single application, these values are not very useful for longer running applications. 
> It does actually lead to confusion when users end up interpreting this value 
> as the exit status of a job which may be one of many running as part of a 
> single application.
> A more generic FinalAppStatus status such as 'COMPLETED' would be useful to 
> have.





[jira] [Updated] (YARN-4207) Add a non-judgemental YARN app completion status

2015-09-24 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-4207:
-
Issue Type: Improvement  (was: Bug)

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".





[jira] [Created] (YARN-4208) Support additional values for FinalApplicationStatus

2015-09-24 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-4208:


 Summary: Support additional values for FinalApplicationStatus
 Key: YARN-4208
 URL: https://issues.apache.org/jira/browse/YARN-4208
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Siddharth Seth


FinalApplicationStatus currently supports SUCCEEDED, FAILED and KILLED as 
values after an application completes.

While these are sufficient for jobs like MR, where a single job maps to a single
application, these values are not very useful for longer running applications. It
does actually lead to confusion when users end up interpreting this value as the
exit status of a job which may be one of many running as part of a single
application.

A more generic FinalAppStatus status such as 'COMPLETED' would be useful to 
have.





[jira] [Commented] (YARN-4207) Add a non-judgemental YARN app completion status

2015-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907140#comment-14907140
 ] 

Siddharth Seth commented on YARN-4207:
--

cc [~vinodkv]

> Add a non-judgemental YARN app completion status
> 
>
> Key: YARN-4207
> URL: https://issues.apache.org/jira/browse/YARN-4207
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>
> For certain applications, it doesn't make sense to have SUCCEEDED or FAILED 
> end state. For example, Tez sessions may include multiple DAGs, some of which 
> have succeeded and some have failed; there's no clear status for the session 
> both logically and from user perspective (users are confused either way). 
> There needs to be a status not implying success or failure, such as 
> "done"/"ended"/"finished".





[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2015-06-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588847#comment-14588847
 ] 

Siddharth Seth commented on YARN-1197:
--

bq. I would argue that waiting for an NM-RM heartbeat is much worse than 
waiting for an AM-RM heartbeat. With continuous scheduling, the RM can make 
decisions in millisecond time, and the AM can regulate its heartbeats according 
to the application's needs to get fast responses. If an NM-RM heartbeat is 
involved, the application is at the mercy of the cluster settings, which should 
be in the multi-second range for large clusters.
I tend to agree with Sandy's arguments about option a being better in terms of 
latency - and that we shouldn't be architecting this in a manner which would 
limit it to the seconds range rather than milliseconds / hundreds of 
milliseconds when possible.

It's already possible to get fast allocations - low 100s of milliseconds via a 
scheduler loop which is delinked from NM heartbeats and a variable AM-RM 
heartbeat interval, which is under user control rather than being a cluster 
property.

There are going to be improvements to the performance of various protocols in
YARN. HADOOP-11552 opens up one such option, which allows AMs to know about
allocations as soon as the scheduler has made the decision, without a
requirement to poll. Of course, there's plenty of work to be done before that
can actually be used :)

That said, callbacks on the RPC can be applied at various levels - including 
NM-RM communication, which can make option b work fast as well. However, it 
will incur the cost of additional RPC roundtrips. Option a, however, can be 
fast from the get go with tuning, and also gets better with future enhancements.

I don't think it's possible for the AM to start using the additional allocation
till the NM has updated all its state - including writing out recovery
information for work preserving restart (thanks Vinod for pointing this out).
Seems like that poll/callback will be required - unless the plan is to route
this information via the RM.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, 
 YARN-1197_Design.pdf


 The current YARN resource management logic assumes the resource allocated to a 
 container is fixed during its lifetime. When users want to change the resources 
 of an allocated container, the only way is to release it and allocate a new 
 container with the expected size.
 Allowing run-time changes to the resources of an allocated container will give us 
 better control of resource usage on the application side.





[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569648#comment-14569648
 ] 

Siddharth Seth commented on YARN-1462:
--

ApplicationReport.newInstance is used by mapreduce and Tez, and potentially
other applications which may be modeled along the same lines as these AMs. It'll
be useful to make the API change here compatible. This is along the lines of
the newInstance methods being used for various constructs like ContainerId, AppId, etc.
With the change, I don't believe MR 2.6 will work with a 2.8 cluster - depending
on how the classpath is set up.

 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch


 AHS related work for tags. 





[jira] [Commented] (YARN-3674) YARN application disappears from view

2015-05-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549858#comment-14549858
 ] 

Siddharth Seth commented on YARN-3674:
--

Clicking on a specific queue on the scheduler page, followed by a click on the
'Applications' / 'RUNNING' / etc. links, ends up on a page which shows no
information that a queue has been selected. It ends up looking like the cluster
isn't RUNNING anything, or hasn't run anything, if the queue isn't used.

For [~sershe] - this was worse. Going back and selecting the default queue made 
no difference to the apps listing.

 YARN application disappears from view
 -

 Key: YARN-3674
 URL: https://issues.apache.org/jira/browse/YARN-3674
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Sergey Shelukhin

 I have 2 tabs open at exact same URL with RUNNING applications view. There is 
 an application that is, in fact, running, that is visible in one tab but not 
 the other. This persists across refreshes. If I open new tab from the tab 
 where the application is not visible, in that tab it shows up ok.
 I didn't change scheduler/queue settings before this behavior happened; on 
 [~sseth]'s advice I went and tried to click the root node of the scheduler on 
 scheduler page; the app still does not become visible.
 Something got stuck somewhere...





[jira] [Commented] (YARN-886) make APPLICATION_STOP consistent with APPLICATION_INIT

2015-05-01 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524210#comment-14524210
 ] 

Siddharth Seth commented on YARN-886:
-

[~djp] - this looks like it's still valid. START is sent to the service that 
the app specified. STOP is sent to all AuxServices.

 make APPLICATION_STOP consistent with APPLICATION_INIT
 --

 Key: YARN-886
 URL: https://issues.apache.org/jira/browse/YARN-886
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 Currently, there is inconsistency between the start/stop behaviour.
 See Siddharth's comment in MAPREDUCE-5329: the start/stop behaviour should 
 be consistent. We shouldn't send the stop to all services.





[jira] [Updated] (YARN-1503) Support making additional 'LocalResources' available to running containers

2015-05-01 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1503:
-
Assignee: (was: Siddharth Seth)

 Support making additional 'LocalResources' available to running containers
 --

 Key: YARN-1503
 URL: https://issues.apache.org/jira/browse/YARN-1503
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth

 We have a use case where additional resources (jars, libraries, etc.) need to 
 be made available to an already running container. Ideally, we'd like this to 
 be done via YARN (instead of having potentially multiple containers per node 
 download resources on their own).
 Proposal:
   NM to support an additional API where a list of resources can be specified. 
 Something like localizeResource(ContainerId, Map<String, LocalResource>).
   NM would also require an additional API to get state for these resources - 
 getLocalizationState(ContainerId) - which returns the current state of all 
 local resources for the specified container(s).
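
 A hypothetical shape for the proposed NM-side API, mirroring the names in the proposal above (nothing like this exists in YARN today):
{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.LocalResource;

// Hypothetical interface sketch for the proposal above; names and states are made up.
interface ContainerLocalizer {
  // ask the NM to localize additional resources for an already running container
  void localizeResource(ContainerId containerId, Map<String, LocalResource> resources);

  // poll the current localization state of the container's resources
  Map<String, LocalizationState> getLocalizationState(ContainerId containerId);

  enum LocalizationState { PENDING, LOCALIZED, FAILED }
}
{code}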





[jira] [Resolved] (YARN-575) ContainerManager APIs should be user accessible

2015-05-01 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved YARN-575.
-
Resolution: Won't Fix
  Assignee: (was: Vinod Kumar Vavilapalli)

Closing as Won't Fix based on the comments.

 ContainerManager APIs should be user accessible
 ---

 Key: YARN-575
 URL: https://issues.apache.org/jira/browse/YARN-575
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth
Priority: Critical

 Auth for ContainerManager is based on the containerId being accessed - since 
 this is what is used to launch containers (There's likely another jira 
 somewhere to change this to not be containerId based).
 What this also means is the API is effectively not usable with kerberos 
 credentials.
 Also, it should be possible to use this API with some generic tokens 
 (RMDelegation?), instead of with Container specific tokens.





[jira] [Updated] (YARN-670) Add an Exception to indicate 'Maintenance' for NMs and add this to the JavaDoc for appropriate protocols

2015-05-01 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-670:

Assignee: (was: Siddharth Seth)

 Add an Exception to indicate 'Maintenance' for NMs and add this to the 
 JavaDoc for appropriate protocols
 

 Key: YARN-670
 URL: https://issues.apache.org/jira/browse/YARN-670
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth







[jira] [Updated] (YARN-3310) MR-279: Log info about the location of dist cache

2015-05-01 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-3310:
-
Assignee: (was: Siddharth Seth)

 MR-279: Log info about the location of dist cache
 -

 Key: YARN-3310
 URL: https://issues.apache.org/jira/browse/YARN-3310
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ramya Sunil
Priority: Minor

 Currently, there is no log info available about the actual location of the 
 file/archive in the dist cache being used by the task, except for the ln command 
 in task.sh. We need to log this information to help in debugging, especially in 
 those cases where there is more than one archive with the same name. 
 In 0.20.x, in task logs, one could find log info such as the following:
 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: <distcache 
 location>/archive - <mapred.local.dir>/archive 





[jira] [Resolved] (YARN-197) Add a separate log server

2015-05-01 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved YARN-197.
-
Resolution: Won't Fix

Resolving due to the presence of additional services which can be used for 
serving logs.

 Add a separate log server
 -

 Key: YARN-197
 URL: https://issues.apache.org/jira/browse/YARN-197
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Siddharth Seth

 Currently, the job history server is being used for log serving. A separate 
 log server can be added which can deal with serving logs, along with other 
 functionality like log retention, merging, etc.





[jira] [Commented] (YARN-197) Add a separate log server

2015-04-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481525#comment-14481525
 ] 

Siddharth Seth commented on YARN-197:
-

Yes, as long as the logs are being served out by a sub-system other than the 
MapReduce history server.

 Add a separate log server
 -

 Key: YARN-197
 URL: https://issues.apache.org/jira/browse/YARN-197
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Siddharth Seth

 Currently, the job history server is being used for log serving. A separate 
 log server can be added which can deal with serving logs, along with other 
 functionality like log retention, merging, etc.





[jira] [Updated] (YARN-671) Add an interface on the RM to move NMs into a maintenance state

2015-02-09 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-671:

Assignee: (was: Siddharth Seth)

 Add an interface on the RM to move NMs into a maintenance state
 ---

 Key: YARN-671
 URL: https://issues.apache.org/jira/browse/YARN-671
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth







[jira] [Commented] (YARN-671) Add an interface on the RM to move NMs into a maintenance state

2015-02-09 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312917#comment-14312917
 ] 

Siddharth Seth commented on YARN-671:
-

The intent was to have an interface to decommission a cluster via the RM, 
instead of talking to NMs. I think that's going to be the case in YARN-914 - so 
yep, this can be closed.

 Add an interface on the RM to move NMs into a maintenance state
 ---

 Key: YARN-671
 URL: https://issues.apache.org/jira/browse/YARN-671
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth







[jira] [Commented] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2015-02-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304189#comment-14304189
 ] 

Siddharth Seth commented on YARN-1723:
--

+1. The patch looks good. Will commit after jenkins comes back.

 AMRMClientAsync missing blacklist addition and removal functionality
 

 Key: YARN-1723
 URL: https://issues.apache.org/jira/browse/YARN-1723
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Bikas Saha
Assignee: Bartosz Ługowski
 Fix For: 2.7.0

 Attachments: YARN-1723.1.patch








[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2015-02-03 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1723:
-
Assignee: Bartosz Ługowski

 AMRMClientAsync missing blacklist addition and removal functionality
 

 Key: YARN-1723
 URL: https://issues.apache.org/jira/browse/YARN-1723
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Bikas Saha
Assignee: Bartosz Ługowski
 Fix For: 2.7.0

 Attachments: YARN-1723.1.patch








[jira] [Commented] (YARN-2830) Add backwards compatible ContainerId.newInstance constructor for use within Tez Local Mode

2014-11-07 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202758#comment-14202758
 ] 

Siddharth Seth commented on YARN-2830:
--

+1 for retaining the old newInstance method.
One concern about the patch though - ContainerId will end up with two very 
similar methods.
newContainerId(AppAttemptId, int) | Deprecated
newContainerId(AppAttemptId, long)
It's very easy to get these confused within YARN itself - which can introduce
some tough-to-debug issues.

Instead, I think it'll be a lot safer to rename the new method and retain the
old one as-is for compatibility.
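
A tiny self-contained example of why two overloads differing only in int vs long are risky (illustrative, not YARN code): an integer literal silently binds to the deprecated int overload.
{code}
// Illustrative only: integer literals bind to the int overload, so callers can pick the
// deprecated method without noticing.
final class OverloadDemo {
  @Deprecated
  static String newId(String attempt, int id) { return attempt + "_int_" + id; }

  static String newId(String attempt, long id) { return attempt + "_long_" + id; }

  public static void main(String[] args) {
    System.out.println(newId("attempt_1", 5));   // calls the deprecated int overload
    System.out.println(newId("attempt_1", 5L));  // the long overload needs an explicit L
  }
}
{code}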

 Add backwards compatible ContainerId.newInstance constructor for use within 
 Tez Local Mode
 --

 Key: YARN-2830
 URL: https://issues.apache.org/jira/browse/YARN-2830
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Blocker
 Attachments: YARN-2830-v1.patch, YARN-2830-v2.patch


 YARN-2229 modified the private unstable api for constructing. Tez uses this 
 api (shouldn't, but does) for use with Tez Local Mode. This causes a 
 NoSuchMethodError when using Tez compiled against pre-2.6. Instead I propose 
 we add the backwards compatible api since overflow is not a problem in Tez 
 local mode.





[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService

2014-10-31 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191562#comment-14191562
 ] 

Siddharth Seth commented on YARN-2698:
--

FWIW, this can break downstream components which may have unit tests making use
of NodeReport. The API is annotated private; however, it would be useful to have
some kind of stable mocks for entities which are likely to be used when testing
downstream projects.

 Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of 
 AdminService
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, 
 YARN-2698-20141029-2.patch, YARN-2698-20141030-1.patch


 YARN AdminService should have write APIs only; other read APIs should be 
 located in the RM ClientService. These include:
 1) getClusterNodeLabels
 2) getNodeToLabels
 3) getNodeReport should contain labels





[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService

2014-10-31 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191569#comment-14191569
 ] 

Siddharth Seth commented on YARN-2698:
--

Actually, it'll help downstream projects if the old method is left in place, 
and deprecated - instead of removing it altogether. 

 Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of 
 AdminService
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, 
 YARN-2698-20141029-2.patch, YARN-2698-20141030-1.patch


 YARN AdminService should have write APIs only; other read APIs should be 
 located in the RM ClientService. These include:
 1) getClusterNodeLabels
 2) getNodeToLabels
 3) getNodeReport should contain labels





[jira] [Created] (YARN-2789) Re-instate the NodeReport.newInstance API modified in YARN-2698

2014-10-31 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-2789:


 Summary: Re-instate the NodeReport.newInstance API modified in 
YARN-2698
 Key: YARN-2789
 URL: https://issues.apache.org/jira/browse/YARN-2789
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Siddharth Seth
Priority: Critical


Even though this is a private API, it will be used by downstream projects for 
testing. It'll be useful for this to be re-instated, maybe with a deprecated 
annotation, so that older versions of downstream projects can build against 
Hadoop 2.6.

create() being private is a problem for multiple other classes - ContainerId,
AppId, Container, NodeId, etc. Most classes in the client-facing YARN APIs are
likely to be required for testing in downstream projects.





[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of AdminService

2014-10-31 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192203#comment-14192203
 ] 

Siddharth Seth commented on YARN-2698:
--

Created YARN-2789

 Move getClusterNodeLabels and getNodeToLabels to YarnClient instead of 
 AdminService
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, 
 YARN-2698-20141029-2.patch, YARN-2698-20141030-1.patch


 YARN AdminService should have write APIs only; other read APIs should be 
 located in the RM ClientService. These include:
 1) getClusterNodeLabels
 2) getNodeToLabels
 3) getNodeReport should contain labels





[jira] [Created] (YARN-2464) Provide Hadoop as a local resource (on HDFS) which can be used by other projects

2014-08-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-2464:


 Summary: Provide Hadoop as a local resource (on HDFS) which can be 
used by other projects
 Key: YARN-2464
 URL: https://issues.apache.org/jira/browse/YARN-2464
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth


DEFAULT_YARN_APPLICATION_CLASSPATH is used by YARN applications to set up their
AM / task classpaths if they have a dependency on Hadoop libraries.

It'll be useful to provide similar access to a Hadoop tarball (Hadoop libs, 
native libraries) etc, which could be used instead - for applications which do 
not want to rely upon Hadoop versions from a cluster node. This would also 
require functionality to update the classpath/env for the apps based on the 
structure of the tar.

As an example, MR has support for a full tar (for rolling upgrades). Similarly,
Tez ships Hadoop libraries along with its build. I'm not sure about the Spark /
Storm / HBase model for this - but using a common copy instead of everyone
localizing Hadoop libraries would be useful.
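
For context, this is roughly the per-application work that would be centralized - a sketch of localizing a tarball from HDFS as a PUBLIC archive (the path and the "hadoop" resource key are placeholders):
{code}
import java.io.IOException;
import java.util.Collections;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

// Sketch of what each application does for itself today; this jira is about providing a
// common, YARN-managed copy instead. The "hadoop" key and the path are placeholders.
final class HadoopTarballLocalizer {
  static void addTarball(FileSystem fs, Path tarball, ContainerLaunchContext ctx)
      throws IOException {
    FileStatus status = fs.getFileStatus(tarball);
    LocalResource resource = LocalResource.newInstance(
        ConverterUtils.getYarnUrlFromPath(fs.makeQualified(tarball)),
        LocalResourceType.ARCHIVE,            // extracted on the node
        LocalResourceVisibility.PUBLIC,       // shared across applications
        status.getLen(), status.getModificationTime());
    ctx.setLocalResources(Collections.singletonMap("hadoop", resource));
  }
}
{code}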





[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-23 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072356#comment-14072356
 ] 

Siddharth Seth commented on YARN-2229:
--

[~ozawa] - I was primarily looking at this from a backward compatibility 
perspective. Will leave the decision to go with the current approach or adding 
a hidden field to you, Jian and Zhijie.

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.
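
To make the bit layout above concrete, a small illustrative sketch (not the actual 
patch) of packing and unpacking the 32-bit id:

{code}
// Upper 10 bits = RM restart epoch, lower 22 bits = per-epoch container sequence.
final class ContainerIdBits {
  static int pack(int epoch, int sequence) {
    return (epoch << 22) | (sequence & 0x3FFFFF);   // 0x3FFFFF == (1 << 22) - 1
  }
  static int epochOf(int id)    { return id >>> 22; }
  static int sequenceOf(int id) { return id & 0x3FFFFF; }
}
{code}

The 1024-restart limit discussed here is simply the capacity of the 10-bit epoch 
field (2^10 = 1024).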



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2264) Race in DrainDispatcher can cause random test failures

2014-07-08 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-2264:


 Summary: Race in DrainDispatcher can cause random test failures
 Key: YARN-2264
 URL: https://issues.apache.org/jira/browse/YARN-2264
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth


This is the potential race.
DrainDispatcher is started via serviceStart(). As a last step, this starts the 
actual dispatcher thread (eventHandlingThread.start()) and returns immediately - 
which means the thread may or may not have started up by the time start returns.

Event sequence: 
1. UserThread: calls dispatcher.getEventHandler().handle(). This sets drained = 
false, and a context switch happens.
2. DispatcherThread: starts running.
3. DispatcherThread: drained = queue.isEmpty() - this sets drained to true, since 
the user thread yielded before putting anything into the queue.
4. UserThread: actual.handle(event) - which puts the event in the queue for the 
dispatcher thread to process, and returns control.
5. UserThread: dispatcher.await() - since drained is true, this returns 
immediately, even though there is a pending event to process.
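
A stripped-down model of the pattern (field names assumed, not the real class) 
makes the interleaving easier to see:

{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model of the race described above.
class MiniDrainDispatcher {
  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();
  private volatile boolean drained = false;

  // called on the user thread
  void handle(Object event) {
    drained = false;        // (1) flag cleared...
    // (2) ...a context switch here lets the dispatcher observe an empty queue
    queue.add(event);
  }

  // body of the dispatcher thread
  void dispatchLoop() throws InterruptedException {
    while (true) {
      drained = queue.isEmpty();   // (3) sets drained back to true too early
      Object event = queue.take();
      // process event...
    }
  }

  // called on the user thread
  void await() {
    while (!drained) { Thread.yield(); }   // (4) returns with an event still queued
  }
}
{code}

Any fix has to make the "clear drained + enqueue" step atomic with respect to the 
dispatcher's isEmpty() check (e.g. by doing both under one lock), or derive drained 
from the queue plus an in-flight counter instead of a separately written flag.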



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path

2014-04-18 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974450#comment-13974450
 ] 

Siddharth Seth commented on YARN-766:
-

[~djp], The 2.x patch is only required to fix a difference in formatting 
between trunk and branch-2. Up to you on whether to fix the trunk formatting in 
this jira or whenever the code is touched next.

 TestNodeManagerShutdown should use Shell to form the output path
 

 Key: YARN-766
 URL: https://issues.apache.org/jira/browse/YARN-766
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.1.0-beta
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-766.branch-2.txt, YARN-766.trunk.txt, YARN-766.txt


 File scriptFile = new File(tmpDir, "scriptFile.sh");
 should be replaced with
 File scriptFile = Shell.appendScriptExtension(tmpDir, "scriptFile");
 to match trunk.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1892) Excessive logging in RM

2014-03-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-1892:


 Summary: Excessive logging in RM
 Key: YARN-1892
 URL: https://issues.apache.org/jira/browse/YARN-1892
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth
Priority: Minor


Mostly in the CS I believe

{code}
 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
 Application application_1395435468498_0011 reserved container 
container_1395435468498_0011_01_000213 on node host:  #containers=5 
available=4096 used=20960, currently has 1 at priority 4; currentReservation 
4096
{code}

{code}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
hive2 usedResources: memory:20480, vCores:5 clusterResources: memory:81920, 
vCores:16 currentCapacity 0.25 required memory:4096, vCores:1 
potentialNewCapacity: 0.255 (  max-capacity: 0.25)
{code}
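
The usual remedy is to demote these per-allocation messages to debug and guard 
them; a minimal sketch assuming the commons-logging LOG already present in these 
classes (variable names illustrative):

{code}
// Sketch: hot-path scheduler messages moved from info to guarded debug.
if (LOG.isDebugEnabled()) {
  LOG.debug("Application " + applicationId + " reserved container " + containerId
      + " on node " + node + "; currently has " + reservedContainers
      + " at priority " + priority + "; currentReservation " + currentReservation);
}
{code}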





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1664) Add a utility to retrieve the RM Principal (renewer for tokens)

2014-01-28 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-1664:


 Summary: Add a utility to retrieve the RM Principal (renewer for 
tokens)
 Key: YARN-1664
 URL: https://issues.apache.org/jira/browse/YARN-1664
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth


Currently the logic to retrieve the renewer to be used while retrieving HDFS 
tokens resides in MapReduce. This should ideally be a utility in YARN since 
it's likely to be required by other applications as well.
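
A hedged sketch of what such a utility could look like - reading 
yarn.resourcemanager.principal and expanding the _HOST placeholder against the RM 
address. The class and method names are hypothetical; only Configuration, 
YarnConfiguration and SecurityUtil are existing APIs:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical utility; mirrors what the MapReduce client does internally today.
public final class RmPrincipalUtil {
  /** Returns the renewer principal to pass when fetching HDFS delegation tokens. */
  public static String getRmPrincipal(Configuration conf) throws IOException {
    String principal = conf.get(YarnConfiguration.RM_PRINCIPAL, "");
    String rmHost = conf.getSocketAddr(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_PORT).getHostName();
    return SecurityUtil.getServerPrincipal(principal, rmHost);
  }
}
{code}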



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (YARN-1517) AMFilterInitializer with configurable AMIpFilter

2013-12-18 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1517:
-

Assignee: Pramod Immaneni

 AMFilterInitializer with configurable AMIpFilter
 

 Key: YARN-1517
 URL: https://issues.apache.org/jira/browse/YARN-1517
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Pramod Immaneni
Assignee: Pramod Immaneni

 We need to implement custom logic in a filter for our webservice similar to 
 org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter and it would be 
  convenient if we extended AmIpFilter as the proxy locations are already 
  available. 
 We would need to specify a filter initializer for this filter. The 
  initializer would be the same as AmFilterInitializer except that it would add our 
  filter instead of AmIpFilter, and it would be better if we could reuse 
  AmFilterInitializer. Can AmFilterInitializer be updated to specify a filter 
  name and filter class?
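
A rough sketch of the shape being asked for - an initializer whose filter name and 
class are overridable. The getFilterName/getFilterClass hooks are hypothetical; 
FilterInitializer, FilterContainer and AmIpFilter are the existing Hadoop classes, 
and the proxy parameter population is elided:

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.FilterContainer;
import org.apache.hadoop.http.FilterInitializer;
import org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter;

// Sketch of a configurable AM filter initializer (hook methods are hypothetical).
public class ConfigurableAmFilterInitializer extends FilterInitializer {
  protected String getFilterName()  { return "AM_PROXY_FILTER"; }
  protected String getFilterClass() { return AmIpFilter.class.getName(); }

  @Override
  public void initFilter(FilterContainer container, Configuration conf) {
    Map<String, String> params = new HashMap<String, String>();
    // populate PROXY_HOST / PROXY_URI_BASE the same way AmFilterInitializer does
    container.addFilter(getFilterName(), getFilterClass(), params);
  }
}
{code}

A subclass would then only override the two hooks to install its own AmIpFilter 
extension.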



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1503) Support making additional 'LocalResources' available to running containers

2013-12-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848111#comment-13848111
 ] 

Siddharth Seth commented on YARN-1503:
--

bq. A slightly more detailed explanation of the use-case so everyone can 
understand? And why something like YARN-1040 is not enough.
YARN-1040 talks about launching multiple processes within the same container. 
This requirement is for a single running process - we want to avoid 
re-launching the process due to the cost involved with starting a new Java 
process. The specific use case is running different tasks within the same JVM - 
where one task may need some additional jars (Hive UDFs for example).

 Support making additional 'LocalResources' available to running containers
 --

 Key: YARN-1503
 URL: https://issues.apache.org/jira/browse/YARN-1503
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth

 We have a use case, where additional resources (jars, libraries etc) need to 
 be made available to an already running container. Ideally, we'd like this to 
 be done via YARN (instead of having potentially multiple containers per node 
 download resources on their own).
 Proposal:
   NM to support an additional API where a list of resources can be specified. 
  Something like localizeResource(ContainerId, Map<String, LocalResource>)
   NM would also require an additional API to get state for these resources - 
 getLocalizationState(ContainerId) - which returns the current state of all 
 local resources for the specified container(s).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1503) Support making additional 'LocalResources' available to running containers

2013-12-12 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-1503:


 Summary: Support making additional 'LocalResources' available to 
running containers
 Key: YARN-1503
 URL: https://issues.apache.org/jira/browse/YARN-1503
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Siddharth Seth
Assignee: Siddharth Seth


We have a use case, where additional resources (jars, libraries etc) need to be 
made available to an already running container. Ideally, we'd like this to be 
done via YARN (instead of having potentially multiple containers per node 
download resources on their own).

Proposal:
  NM to support an additional API where a list of resources can be specified. 
Something like localizeResource(ContainerId, Map<String, LocalResource>)
  NM would also require an additional API to get state for these resources - 
getLocalizationState(ContainerId) - which returns the current state of all 
local resources for the specified container(s).
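
Sketched as an interface, the proposal might look roughly like this; every name 
here is part of the proposal, nothing below exists in the NM API today:

{code}
// Hypothetical NM-facing API sketched from the proposal above.
public interface ContainerLocalizationService {

  enum LocalizationState { PENDING, LOCALIZED, FAILED }

  /** Ask the NM to localize additional resources for an already-running container. */
  void localizeResources(ContainerId containerId,
                         Map<String, LocalResource> resources);

  /** Current state of each named local resource for the given container. */
  Map<String, LocalizationState> getLocalizationState(ContainerId containerId);
}
{code}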




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1274:
-

Attachment: YARN-1274.1.txt

Updated launch_container to create the app level local and log directories. 
Verified dir permissions on a secure cluster.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt


 LCE container launch assumes the usercache/USER directory exists and it is 
 owned by the user running the container process.
 But the directory is created only if there are resources to localize by the 
 LCE localization command; if there are no resources to localize, LCE 
 localization never executes and launching fails reporting 255 exit code and 
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-05 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1274:
-

Attachment: YARN-1274.trunk.1.txt

Patch for trunk and branch-2. The previous patch applies to branch-2.1.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt


 LCE container launch assumes the usercache/USER directory exists and it is 
 owned by the user running the container process.
 But the directory is created only if there are resources to localize by the 
 LCE localization command; if there are no resources to localize, LCE 
 localization never executes and launching fails reporting 255 exit code and 
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-04 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13786910#comment-13786910
 ] 

Siddharth Seth commented on YARN-1274:
--

I'm in favour of changing the LCE as well. It looks like the log dirs may need 
to be created with the correct permissions as well.

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Blocker

 LCE container launch assumes the usercache/USER directory exists and it is 
 owned by the user running the container process.
 But the directory is created only if there are resources to localize by the 
 LCE localization command; if there are no resources to localize, LCE 
 localization never executes and launching fails reporting 255 exit code and 
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-04 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth reassigned YARN-1274:


Assignee: Siddharth Seth  (was: Alejandro Abdelnur)

 LCE fails to run containers that don't have resources to localize
 -

 Key: YARN-1274
 URL: https://issues.apache.org/jira/browse/YARN-1274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Alejandro Abdelnur
Assignee: Siddharth Seth
Priority: Blocker

 LCE container launch assumes the usercache/USER directory exists and it is 
 owned by the user running the container process.
 But the directory is created only if there are resources to localize by the 
 LCE localization command; if there are no resources to localize, LCE 
 localization never executes and launching fails reporting 255 exit code and 
 the NM logs have something like:
 {code}
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
 provided 1
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
 llama
 2013-10-04 14:07:56,425 INFO 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
 directory llama in 
 /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
  - Permission denied
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784850#comment-13784850
 ] 

Siddharth Seth commented on YARN-1131:
--

Will open the followup jiras. Running this through jenkins again. Haven't seen 
the specific test fail or timeout on my local runs.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading

2013-10-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784864#comment-13784864
 ] 

Siddharth Seth commented on YARN-890:
-

+1. Resources should not be rounded up.
Is there a similar round-up in the actual allocation code, which may cause 
additional containers to be allocated to a queue?
Should the CS be allowing nodes to register if the nm-memory.mb is not a 
multiple of minimum-allocation-mb, or should it just be rounding down at 
registration?
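
For context, the rounding in question is the usual round-up-to-the-minimum-allocation 
arithmetic; a quick worked sketch with the values from the description (not taken 
from the patch):

{code}
// Round a node's memory up to the next multiple of the scheduler minimum allocation.
static int roundUp(int memoryMb, int minAllocMb) {
  return ((memoryMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
}
// With the config in the description: roundUp(4192, 1024) == 5120 MB (~5 GB),
// even though the node only offers 4192 MB - hence the misleading number in the UI.
{code}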

 The roundup for memory values on resource manager UI is misleading
 --

 Key: YARN-890
 URL: https://issues.apache.org/jira/browse/YARN-890
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Trupti Dhavle
Assignee: Xuan Gong
 Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, 
 YARN-890.1.patch, YARN-890.2.patch


 From the yarn-site.xml, I see following values-
 property
 nameyarn.nodemanager.resource.memory-mb/name
 value4192/value
 /property
 property
 nameyarn.scheduler.maximum-allocation-mb/name
 value4192/value
 /property
 property
 nameyarn.scheduler.minimum-allocation-mb/name
 value1024/value
 /property
 However the resourcemanager UI shows total memory as 5MB 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1131:
-

Attachment: YARN-1131.3.txt

Updated the patch to get the tests working; also added one more test for when 
an app is not known by the RM.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $yarn logs command should return an appropriate error message if YARN application is still running

2013-10-03 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13785718#comment-13785718
 ] 

Siddharth Seth commented on YARN-1131:
--

If another state does get added to the YarnApplicationState - we don't know if 
this is a final state or not. I'd prefer falling back to trying to find the 
logs on disk, which is what happens right now.

 $yarn logs command should return an appropriate error message if YARN 
 application is still running
 --

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt, YARN-1131.2.txt, YARN-1131.3.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running

2013-10-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784613#comment-13784613
 ] 

Siddharth Seth commented on YARN-1131:
--

[~djp], if you don't mind, I'd like to take this over - would be good to get it 
into the next release.

 $ yarn logs should return a message log aggregation is during progress if 
 YARN application is running
 -

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Tassapol Athiapinya
Assignee: Junping Du
Priority: Minor
 Fix For: 2.1.2-beta


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running

2013-10-02 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1131:
-

Attachment: YARN-1131.1.txt

Changes in the patch:
- Adds a YARN application status check based on the ApplicationId, to log a 
correct message if the application is running. If an application is not found 
in the RM, the CLI tool will continue to search for the files on HDFS (RM not 
running, or RM restarted).
- Fixes the exception in case of an invalid applicationId.

There's still a case, right after an app completes, but before aggregation is 
complete where an empty output is returned. That should be a separate jira 
though.
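
A minimal sketch of the status check described above, assuming the CLI already has 
a YarnClient and the parsed ApplicationId at hand (the exact wiring in the patch 
may differ):

{code}
// Sketch: refuse to dump logs for an application that has not finished yet.
int verifyApplicationFinished(YarnClient yarnClient, ApplicationId appId)
    throws YarnException, IOException {
  ApplicationReport report;
  try {
    report = yarnClient.getApplicationReport(appId);
  } catch (ApplicationNotFoundException e) {
    return 0;  // RM does not know the app (restarted / not running): read HDFS as before
  }
  YarnApplicationState state = report.getYarnApplicationState();
  if (state != YarnApplicationState.FINISHED
      && state != YarnApplicationState.FAILED
      && state != YarnApplicationState.KILLED) {
    System.out.println("Application has not completed. Logs are only available after "
        + "an application completes (log aggregation may still be in progress).");
    return -1;
  }
  return 0;  // finished: go ahead and read the aggregated logs
}
{code}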


 $ yarn logs should return a message log aggregation is during progress if 
 YARN application is running
 -

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Fix For: 2.1.2-beta

 Attachments: YARN-1131.1.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running

2013-10-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784841#comment-13784841
 ] 

Siddharth Seth commented on YARN-1131:
--

Thanks for the review.

bq. why not use Option.setRequired for the applicationId param - this will 
allow removal of the appIdStr == null check.
Will look into using this.

bq. is a YarnApplicationState check enough to guarantee that the user receives 
the correct error message in case logs are tried to be retrieved when log 
aggregration is still in process just after the app completes?
Had mentioned this in my last comment. Not targeting for this jira.
bq. There's still a case, right after an app completes, but before aggregation 
is complete where an empty output is returned. That should be a separate jira 
though.

bq. typo in function name dumpAContainersLogs or is it meant to read dump a 
container's logs? Maybe just dumpContainerLogs?
I believe it was meant to be this. The diff, unfortunately, is a lot bigger 
than it should be, since the files had to be moved between packages.
bq. containerIdStr and nodeAddressStr could be parsed for correct format to 
error out earlier before invoking the actual log reader functionality.
bq. missing test for when container id specified but node address is not ( and 
vice versa ) ?
Only targeting the specific issue mentioned in the jira. I'm sure there's more 
- but applicationId is likely to be the most common case. The rest can be a 
single or multiple separate jiras.

 $ yarn logs should return a message log aggregation is during progress if 
 YARN application is running
 -

 Key: YARN-1131
 URL: https://issues.apache.org/jira/browse/YARN-1131
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Siddharth Seth
Priority: Minor
 Attachments: YARN-1131.1.txt


 In the case when log aggregation is enabled, if a user submits MapReduce job 
 and runs $ yarn logs -applicationId app ID while the YARN application is 
 running, the command will return no message and return user back to shell. It 
 is nice to tell the user that log aggregation is in progress.
 {code}
 -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002
 -bash-4.1$
 {code}
 At the same time, if invalid application ID is given, YARN CLI should say 
 that the application ID is incorrect rather than throwing 
 NoSuchElementException.
 {code}
 $ /usr/bin/yarn logs -applicationId application_0
 Exception in thread main java.util.NoSuchElementException
 at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124)
 at 
 org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110)
 at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776072#comment-13776072
 ] 

Siddharth Seth commented on YARN-1229:
--

I'm in favour of renaming the shuffle service id as well, and enforcing 
constraints on the names. Shell parameters apparently have name restrictions - 
http://stackoverflow.com/questions/2821043/allowed-characters-in-linux-environment-variable-names
 has some links to standards. Setting aux-service name restrictions based on 
shell name restrictions seems ok to me.

This is an incompatible change though. Sites which have Hadoop 2 (or 0.23) 
deployed would need to change their configs to reflect the shuffle service name 
update. (The shuffleService isn't started when using the default hadoop 
configuration files).

An alternative could be to use base32 encoding for the service name - but I'd 
prefer not going there.

 Shell$ExitCodeException could happen if AM fails to start
 -

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.1-beta


 I run sleep job. If AM fails to start, this exception could occur:
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776865#comment-13776865
 ] 

Siddharth Seth commented on YARN-1229:
--

Just looked at the patch; it'd be nice to include underscores as well - it 
provides for a separator in the allowed character set.

 Shell$ExitCodeException could happen if AM fails to start
 -

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch


 I run sleep job. If AM fails to start, this exception could occur:
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776915#comment-13776915
 ] 

Siddharth Seth commented on YARN-1229:
--

Took a quick look.
- Can you please rename MapreduceShuffle to mapreduce_shuffle (closer to the 
old name)
- The check can be regex based, rather than walking through all the characters (a sketch follows below).
- Include an empty check along with the null check
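
A minimal sketch of the regex-based check suggested in the second point above, 
using java.util.regex.Pattern; the constant name and error message are illustrative:

{code}
// Letters, digits and underscores only, not starting with a digit, so that
// NM_AUX_SERVICE_<name> is always a legal shell identifier.
private static final Pattern SERVICE_NAME_PATTERN =
    Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

static void checkServiceName(String name) {
  if (name == null || name.isEmpty() || !SERVICE_NAME_PATTERN.matcher(name).matches()) {
    throw new RuntimeException("The ServiceName '" + name
        + "' set in yarn.nodemanager.aux-services is invalid");
  }
}
{code}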

 Shell$ExitCodeException could happen if AM fails to start
 -

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
 YARN-1229.4.patch


 I run sleep job. If AM fails to start, this exception could occur:
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776945#comment-13776945
 ] 

Siddharth Seth commented on YARN-1229:
--

Patch looks good. Missed this earlier, but there are several references to 
mapreduce.shuffle in documentation which need to be updated.
Also, since it's being updated - can you make the Pattern final? Thanks.

 Shell$ExitCodeException could happen if AM fails to start
 -

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
 YARN-1229.4.patch, YARN-1229.5.patch


 I run sleep job. If AM fails to start, this exception could occur:
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1229) Shell$ExitCodeException could happen if AM fails to start

2013-09-24 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776979#comment-13776979
 ] 

Siddharth Seth commented on YARN-1229:
--

+1. Committing.

 Shell$ExitCodeException could happen if AM fails to start
 -

 Key: YARN-1229
 URL: https://issues.apache.org/jira/browse/YARN-1229
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.1.1-beta
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.1.2-beta

 Attachments: YARN-1229.1.patch, YARN-1229.2.patch, YARN-1229.3.patch, 
 YARN-1229.4.patch, YARN-1229.5.patch, YARN-1229.6.patch


 I run sleep job. If AM fails to start, this exception could occur:
 13/09/20 11:00:23 INFO mapreduce.Job: Job job_1379673267098_0020 failed with 
 state FAILED due to: Application application_1379673267098_0020 failed 1 
 times due to AM Container for appattempt_1379673267098_0020_01 exited 
 with  exitCode: 1 due to: Exception from container-launch:
 org.apache.hadoop.util.Shell$ExitCodeException: 
 /myappcache/application_1379673267098_0020/container_1379673267098_0020_01_01/launch_container.sh:
  line 12: export: 
 `NM_AUX_SERVICE_mapreduce.shuffle=AAA0+gA=
 ': not a valid identifier
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
 at org.apache.hadoop.util.Shell.run(Shell.java:379)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:270)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:78)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 .Failing this attempt.. Failing the application.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-886) make APPLICATION_STOP consistent with APPLICATION_INIT

2013-08-28 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753074#comment-13753074
 ] 

Siddharth Seth commented on YARN-886:
-

Essentially, APPLICATION_INIT should only be sent to Auxiliary services 
specified by the user in the startContainer requests. Similarly 
APPLICATION_STOP should only be sent to Auxiliary services specified by the 
user during the startContainer call.

 make APPLICATION_STOP consistent with APPLICATION_INIT
 --

 Key: YARN-886
 URL: https://issues.apache.org/jira/browse/YARN-886
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 Currently, there is inconsistency between the start/stop behaviour.
 See Siddharth's comment in MAPREDUCE-5329: The start/stop behaviour should 
 be consistent. We shouldn't send the stop to all service.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1045) Improve toString implementation for PBImpls

2013-08-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1045:
-

Hadoop Flags: Reviewed

 Improve toString implementation for PBImpls
 ---

 Key: YARN-1045
 URL: https://issues.apache.org/jira/browse/YARN-1045
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth
Assignee: Jian He
 Fix For: 2.1.1-beta

 Attachments: YARN-1045.1.patch, YARN-1045.patch


 The generic toString implementation that is used in most of the PBImpls 
 {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code}
 is rather inefficient - replacing "\n" and "\s" to generate a one-line string.
 Instead, we can use
 {code}TextFormat.shortDebugString(getProto());{code}.
 If we can get this into 2.1.0 - great, otherwise the next release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1045) Improve toString implementation for PBImpls

2013-08-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-1045:
-

Fix Version/s: (was: 2.1.1-beta)
   2.1.0-beta

Committed to branch-2.1.0-beta.

 Improve toString implementation for PBImpls
 ---

 Key: YARN-1045
 URL: https://issues.apache.org/jira/browse/YARN-1045
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth
Assignee: Jian He
 Fix For: 2.1.0-beta

 Attachments: YARN-1045.1.patch, YARN-1045.patch


 The generic toString implementation that is used in most of the PBImpls 
 {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code}
 is rather inefficient - replacing "\n" and "\s" to generate a one-line string.
 Instead, we can use
 {code}TextFormat.shortDebugString(getProto());{code}.
 If we can get this into 2.1.0 - great, otherwise the next release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1067) AMRMClient heartbeat interval should not be static

2013-08-15 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-1067:


 Summary: AMRMClient heartbeat interval should not be static
 Key: YARN-1067
 URL: https://issues.apache.org/jira/browse/YARN-1067
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.5-alpha
Reporter: Siddharth Seth


The heartbeat interval can be modified dynamically - more often when there are 
pending requests, and toned down when the heartbeat is serving no purpose other 
than a ping.
There's a couple of jiras which are trying to change the scheduling loop - at 
which point this becomes useful.
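
For what it's worth, a dynamic policy only needs a setter on the async client; a 
rough sketch, assuming AMRMClientAsync#setHeartbeatInterval is available in the 
version at hand and with thresholds picked arbitrarily:

{code}
// Sketch: poll aggressively while requests are outstanding, relax when idle.
void adjustHeartbeat(AMRMClientAsync<AMRMClient.ContainerRequest> client,
                     int pendingRequests) {
  if (pendingRequests > 0) {
    client.setHeartbeatInterval(250);    // ms - waiting for allocations
  } else {
    client.setHeartbeatInterval(5000);   // ms - little more than a liveness ping
  }
}
{code}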

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1045) Improve toString implementation for PBImpls

2013-08-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733934#comment-13733934
 ] 

Siddharth Seth commented on YARN-1045:
--

Thanks for taking this up Jian. Did you get a chance to run all MR and YARN 
unit tests locally - in case we're relying on the toString format anywhere?

 Improve toString implementation for PBImpls
 ---

 Key: YARN-1045
 URL: https://issues.apache.org/jira/browse/YARN-1045
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth
Assignee: Jian He
 Attachments: YARN-1045.patch


 The generic toString implementation that is used in most of the PBImpls 
 {code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code}
 is rather inefficient - replacing "\n" and "\s" to generate a one-line string.
 Instead, we can use
 {code}TextFormat.shortDebugString(getProto());{code}.
 If we can get this into 2.1.0 - great, otherwise the next release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-899) Get queue administration ACLs working

2013-08-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13734040#comment-13734040
 ] 

Siddharth Seth commented on YARN-899:
-

bq.  With this in mind, I think who has access should be based on a union of 
ACLs 
Agree. AMs get ACLs from the RM when they register. That could be a combined 
list along with the queue ACLs. It's up to the AMs to enforce these. Maybe the 
RM proxy could do some of this as well. The MR JobHistoryServer gets ACLs from 
the AM - again it's up to this to enforce them. The RM AppHistoryServer will 
need to do the union though.

Don't have experience with JT ACLs, but it does look like that's doing a union 
as well. View vs Modify ACLs for queues makes sense to me.

 Get queue administration ACLs working
 -

 Key: YARN-899
 URL: https://issues.apache.org/jira/browse/YARN-899
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Xuan Gong
 Attachments: YARN-899.1.patch


 The Capacity Scheduler documents the 
 yarn.scheduler.capacity.root.queue-path.acl_administer_queue config option 
 for controlling who can administer a queue, but it is not hooked up to 
 anything.  The Fair Scheduler could make use of a similar option as well.  
 This is a feature-parity regression from MR1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1045) Improve toString implementation for PBImpls

2013-08-07 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-1045:


 Summary: Improve toString implementation for PBImpls
 Key: YARN-1045
 URL: https://issues.apache.org/jira/browse/YARN-1045
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth


The generic toString implementation that is used in most of the PBImpls 
{code}getProto().toString().replaceAll("\\n", ", ").replaceAll("\\s+", " ");{code}
is rather inefficient - replacing "\n" and "\s" to generate a one-line string.
Instead, we can use
{code}TextFormat.shortDebugString(getProto());{code}.

If we can get this into 2.1.0 - great, otherwise the next release.
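
The change this implies in a typical PBImpl is a one-liner; a sketch with an 
illustrative class name:

{code}
import com.google.protobuf.Message;
import com.google.protobuf.TextFormat;

// Sketch of the replacement toString() pattern in a PBImpl (class name illustrative).
public class SomeRecordPBImpl {
  private final Message proto;
  SomeRecordPBImpl(Message proto) { this.proto = proto; }
  Message getProto() { return proto; }

  @Override
  public String toString() {
    // Single-line debug form, no regex passes over the protobuf text output.
    return TextFormat.shortDebugString(getProto());
  }
}
{code}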

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-855) YarnClient.init should ensure that yarn parameters are present

2013-07-31 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725969#comment-13725969
 ] 

Siddharth Seth commented on YARN-855:
-

The simplest would be to check the configuration type - which keeps the API 
stable.

The reason I mentioned parameters is that apps that use YarnClient may have 
their own configuration type - e.g. JobConf or a HiveConf. Type information 
ends up getting lost even if these apps have created their configurations based 
on a YarnConfiguration.
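
The "check the configuration type" option is essentially a one-line guard during 
service init; a sketch, with the placement assumed:

{code}
// Sketch: keep init(Configuration) stable, but make sure yarn-* resources are loaded.
@Override
protected void serviceInit(Configuration conf) throws Exception {
  Configuration yarnConf =
      (conf instanceof YarnConfiguration) ? conf : new YarnConfiguration(conf);
  super.serviceInit(yarnConf);
}
{code}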

 YarnClient.init should ensure that yarn parameters are present
 --

 Key: YARN-855
 URL: https://issues.apache.org/jira/browse/YARN-855
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siddharth Seth
Assignee: Abhishek Kapoor

 It currently accepts a Configuration object in init and doesn't check whether 
 it contains yarn parameters or is a YarnConfiguration. Should either accept 
 YarnConfiguration, check existence of parameters or create a 
 YarnConfiguration based on the configuration passed to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-896) Roll up for long lived YARN

2013-07-30 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723968#comment-13723968
 ] 

Siddharth Seth commented on YARN-896:
-

bq. Robert Joseph Evans Applications may connect to other services such as 
HBase or issue tokens for communication between its own containers. All of 
these would require renewal.
The RM takes care of renewing tokens for HDFS - it can do this since the HDFS 
token renewer class is in the RM's classpath. For other applications - Hive for 
example - this isn't possible. I believe Hive ends up issuing tokens which are 
valid for a longer duration to get around the renewal problem. I won't 
necessarily link this to long running YARN though - other than the bit about 
the token max-age.

 Roll up for long lived YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-710) Add to ser/deser methods to RecordFactory

2013-07-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698232#comment-13698232
 ] 

Siddharth Seth commented on YARN-710:
-

In the unit test, the setters on the ApplicationId aren't meant to be used 
(will end up throwing exceptions - this is replaced by newInstance in 
AppliactionId). Don't think getProto() needs to be changed at all in 
RecordFactoryPBImpl - instead a new getBuilder method should be sufficient. 
Somewhere along the flow, it looks like the default proto ends up being created 
- possibly linked to the getProto changes.
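For context, a rough sketch of what ser/deser through the underlying proto can 
look like today for ApplicationId - treat the casts and constructors as an 
assumption about the current PB impls rather than a proposed API:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationIdPBImpl;
import org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto;

public class AppIdSerDe {
  public static void main(String[] args) throws Exception {
    ApplicationId appId = ApplicationId.newInstance(1375833600000L, 7);

    // Serialize by reaching into the PB-backed implementation.
    byte[] bytes = ((ApplicationIdPBImpl) appId).getProto().toByteArray();

    // Deserialize by parsing the proto and wrapping it again.
    ApplicationId restored =
        new ApplicationIdPBImpl(ApplicationIdProto.parseFrom(bytes));

    System.out.println(appId.equals(restored));
  }
}
{code}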

 Add to ser/deser methods to RecordFactory
 -

 Key: YARN-710
 URL: https://issues.apache.org/jira/browse/YARN-710
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Affects Versions: 2.0.4-alpha
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: YARN-710.patch, YARN-710.patch, YARN-710-wip.patch


 In order to do things like AM failover and checkpointing I need to serialize 
 app IDs, app attempt IDs, containers and/or IDs,  resource requests, etc.
 Because we are wrapping/hiding the PB implementation from the APIs, we are 
 hiding the built in PB ser/deser capabilities.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-855) YarnClient.init should ensure that yarn parameters are present

2013-06-19 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-855:
---

 Summary: YarnClient.init should ensure that yarn parameters are 
present
 Key: YARN-855
 URL: https://issues.apache.org/jira/browse/YARN-855
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siddharth Seth


It currently accepts a Configuration object in init and doesn't check whether 
it contains yarn parameters or is a YarnConfiguration. Should either accept 
YarnConfiguration, check existence of parameters or create a YarnConfiguration 
based on the configuration passed to it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-848) Nodemanager does not register with RM using the fully qualified hostname

2013-06-18 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687248#comment-13687248
 ] 

Siddharth Seth commented on YARN-848:
-

+1. Simple enough patch. Hitesh, could you please rebase this on top of 
YARN-694 - which should go in soon.

 Nodemanager does not register with RM using the fully qualified hostname
 

 Key: YARN-848
 URL: https://issues.apache.org/jira/browse/YARN-848
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: YARN-848.1.patch


 If the hostname is misconfigured to not be fully qualified ( i.e. hostname 
 returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering 
 with the RM using only foo. This can create problems if DNS cannot resolve 
 the hostname properly. 
 Furthermore, HDFS uses fully qualified hostnames which can end up affecting 
 locality matches when allocating containers based on block locations. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-848) Nodemanager does not register with RM using the fully qualified hostname

2013-06-18 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687388#comment-13687388
 ] 

Siddharth Seth commented on YARN-848:
-

+1. Thanks Hitesh. Committing this.

 Nodemanager does not register with RM using the fully qualified hostname
 

 Key: YARN-848
 URL: https://issues.apache.org/jira/browse/YARN-848
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: YARN-848.1.patch, YARN-848.3.patch


 If the hostname is misconfigured to not be fully qualified ( i.e. hostname 
 returns foo and hostname -f returns foo.bar.xyz ), the NM ends up registering 
 with the RM using only foo. This can create problems if DNS cannot resolve 
 the hostname properly. 
 Furthermore, HDFS uses fully qualified hostnames which can end up affecting 
 locality matches when allocating containers based on block locations. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685106#comment-13685106
 ] 

Siddharth Seth commented on YARN-805:
-

Would be good if others take a look at the patch as well, since it documents 
long term API support.
- Do the allocate APIs need to be marked as Evolving ? This should be looked at 
by someone who's been tracking YARN-397 closely.
- Is this jira also supposed to add comments to the .proto files, or will that 
be a separate jira ?
- Protocols vs Client libraries as the public interface.
- Some methods are annotated private, unstable - others just private. Is there 
a reason for this ?
- Are the QueueInfo APIs stable ?
- GetAllApplicationRequest is being changed in YARN-727 - so the annotations 
may need to change depending on what happens on when that gets committed.
- The methods related to renew / cancellation of delegation tokens should 
continue to stay Private. The RM controls this for now.
- Annotations missing on StartContainerRequestPBImpl
- AMCommand - should this be marked as Evolving since additional commands may 
be added in the future. Similarly for NodeState. This likely needs to make it 
to the rolling upgrades jira as well - handling enumerations returned by method 
calls.
- ApplicationAttemptId, ApplicationId - appAttemptIdStrPrefix, appIdPrefix 
should be marked Private
- ApplicationReport.getCurrentApplicationAttemptId - should this be stable ?
- ApplicationReport.getOriginalTrackingUrl - private ? meant for proxy use only.
- ApplicationResourceUsageReport.getNumReservedContainers - not sure what 
numReservedContainers means in the context of multiple resources. Should 
probably be removed or marked private. Similarly NodeReport.getNumContainers
- ApplicationSubmissionContext.setPriority - don't think this is used by any 
scheduler yet. Should it be private for now ?
- ApplicationSubmissionContext.setCancelTokensWhenComplete - evolving ?
- ApplicationSubmissionContext.setResource needs javadoc
- ContainerStatus.getExitStatus seems a little ambiguous. Evolving ?
- NodeId, AppId, AppAttemptId don't need annotations on their protected setters
- ResourceRequest/ResourceBlacklistRequest - update javadoc to say resource 
name instead of resource.
- YarnRuntimeException - I believe this is meant for internal exceptions within 
YARN. private/LimitedPrivate(mapreduce) since this leaks all over MR code.
Other
- Should RegisterApplicationMasterRequest have a hostname-only newInstance() ? 
Not all apps will necessarily have an rpc port and tracking url.


 Fix yarn-api javadoc annotations
 

 Key: YARN-805
 URL: https://issues.apache.org/jira/browse/YARN-805
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-805.1.patch, YARN-805.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13685110#comment-13685110
 ] 

Siddharth Seth commented on YARN-802:
-

bq. If I understand correctly, in order to simultaneously support multiple jobs 
of multiple users that each can contact different Shuffle provider we must have 
all providers in the air in parallel.
Multiple providers can be run by the NodeManager in parallel. An application 
chooses which provider(s) it wants to use when it starts a container on a 
NodeManager.

bq. This data for this map arrive from the APPLICATION_INIT event. Hence, all 
AuxServices that serve as ShuffleProviders need to get APPLICATION_INIT events.
The data in the APPLICATION_INIT event is from the startContainer request (the 
serviceData in the ContainerLaunchContext). If the application wants the INIT 
event to go to multiple providers, it can set the service data accordingly. The 
MapReduce AM hardcodes this to the default SHUFFLE_PROVIDER which is why only 
that one gets the init event.

There may be auxiliary services which are not responsible for shuffle, or are 
in general incompatible with the shuffle consumer configured by the job. I 
don't think they need to get an INIT event.
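To make the routing concrete, here is a sketch of how an application can decide 
which provider(s) receive APPLICATION_INIT via serviceData - the service ids 
are hypothetical and must match the ids configured under 
yarn.nodemanager.aux-services:
{code}
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;

public class ServiceDataExample {
  // Builds a launch context whose serviceData determines which aux services
  // get the APPLICATION_INIT event for this application on the NodeManager.
  public static ContainerLaunchContext buildContext(byte[] shuffleSecret) {
    Map<String, ByteBuffer> serviceData = new HashMap<String, ByteBuffer>();
    serviceData.put("mapreduce_shuffle", ByteBuffer.wrap(shuffleSecret));
    // Add more entries only if the app really wants several providers inited:
    // serviceData.put("my_rdma_shuffle", ByteBuffer.wrap(shuffleSecret));

    return ContainerLaunchContext.newInstance(
        Collections.<String, LocalResource>emptyMap(),   // localResources
        Collections.<String, String>emptyMap(),          // environment
        Collections.<String>emptyList(),                 // commands
        serviceData,
        null,                                            // tokens
        Collections.<ApplicationAccessType, Boolean>emptyMap());
  }
}
{code}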

 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId->userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a 
 hard-coded expression in Hadoop code. The current TaskAttemptImpl.java code 
 explicitly calls: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party 
 AuxiliaryService will never get APPLICATION_INIT events.
 I think a solution can be in one of two ways:
 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
 each of them, by calling serviceData.put (…) in loop.
 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
 APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the 
 needed patch for any option that people like.
 See [Pluggable Shuffle in Hadoop 
 documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-841) Annotate and document AuxService APIs

2013-06-17 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-841:
---

 Summary: Annotate and document AuxService APIs
 Key: YARN-841
 URL: https://issues.apache.org/jira/browse/YARN-841
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siddharth Seth


For users writing their own AuxServices, these APIs should be annotated and 
need better documentation. Also, the classes may need to move out of the 
NodeManager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-844) Move PBImpls from yarn-api to yarn-common

2013-06-17 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-844:
---

 Summary: Move PBImpls from yarn-api to yarn-common
 Key: YARN-844
 URL: https://issues.apache.org/jira/browse/YARN-844
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686102#comment-13686102
 ] 

Siddharth Seth commented on YARN-666:
-

TBD - handling of Enum fields like AMCommand, NodeAction. This may be possible 
by forcing defaults if a new value needs to be added; alternatively, define a 
new Enum which is used by newer clients.
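A rough sketch of the force-a-default idea on the client side (purely 
illustrative - not existing YARN code):
{code}
// An old client that receives a command value it does not recognize maps it
// to a safe default instead of failing outright.
public enum AmCommandCompat {
  AM_RESYNC, AM_SHUTDOWN, UNKNOWN;

  public static AmCommandCompat fromWire(String wireName) {
    try {
      return valueOf(wireName);
    } catch (IllegalArgumentException e) {
      // A newer server sent a command that this client predates.
      return UNKNOWN;
    }
  }
}
{code}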

 [Umbrella] Support rolling upgrades in YARN
 ---

 Key: YARN-666
 URL: https://issues.apache.org/jira/browse/YARN-666
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Siddharth Seth
 Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf


 Jira to track changes required in YARN to allow rolling upgrades, including 
 documentation and possible upgrade routes. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-171) NodeManager should serve logs directly if log-aggregation is not enabled

2013-06-17 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-171:


Attachment: YARN-171_3.txt

Uploading a newer, but still very dated version of the patch - which I had 
sitting around on my system. In case someone is taking over this jira - this 
could be used as a starting point, or not.

 NodeManager should serve logs directly if log-aggregation is not enabled
 

 Key: YARN-171
 URL: https://issues.apache.org/jira/browse/YARN-171
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 0.23.3
Reporter: Vinod Kumar Vavilapalli
Assignee: Siddharth Seth
 Attachments: YARN-171_3.txt, YARN171_WIP.txt


 NodeManagers never serve logs for completed applications. If log-aggregation 
 is not enabled, in the interim, due to bugs like YARN-162, this is a serious 
 problem for users as logs are necessarily not available.
 We should let nodes serve logs directly if 
 YarnConfiguration.LOG_AGGREGATION_ENABLED is set. This should be okay as 
 NonAggregatingLogHandler can retain logs upto 
 YarnConfiguration.NM_LOG_RETAIN_SECONDS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686284#comment-13686284
 ] 

Siddharth Seth commented on YARN-805:
-

Thanks for the updated patch Jian. Couple more comments.
- The latest patch changes *ProtocolPB to Public. These should remain private 
- implementation detail.
- ApplicationSubmissionContext.setResource still needs javadoc. The patch added 
it to getResource.
- RegisterApplicationMaster.newInstance - looking at this again, I think it's 
better to add another newInstance method. The current documentation says set 
host to an empty string - that's not really correct since the RM won't set it 
either. Also, instead of empty string - the defaults should be null. If we add 
a new newInstance method - we can control the default values. Follow up jira - 
YARN should figure out the hostname instead of expecting it in the Register 
call (may not be possible for unmanaged AM).
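A sketch of the difference, with the extra overload shown as hypothetical (it 
does not exist in the API today):
{code}
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterRequest;

public class RegisterRequestExample {
  public static RegisterApplicationMasterRequest build() {
    // Today: an AM without an RPC endpoint or tracking UI still has to pass
    // placeholder values for the port and tracking URL.
    return RegisterApplicationMasterRequest
        .newInstance("am-host.example.com", -1, "");

    // Hypothetical hostname-only overload discussed above (not in the API):
    // return RegisterApplicationMasterRequest.newInstance("am-host.example.com");
  }
}
{code}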

Otherwise, the changes look good.

 Fix yarn-api javadoc annotations
 

 Key: YARN-805
 URL: https://issues.apache.org/jira/browse/YARN-805
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-805.1.patch, YARN-805.2.patch, YARN-805.3.patch, 
 YARN-805.4.patch, YARN-805.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-841) Annotate and document AuxService APIs

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686311#comment-13686311
 ] 

Siddharth Seth commented on YARN-841:
-

Looks good. Needs a small change to the javadoc though - initApplication is 
only sent to the AuxService specified by the application. We should probably do 
the same for the stopApplication as well.

 Annotate and document AuxService APIs
 -

 Key: YARN-841
 URL: https://issues.apache.org/jira/browse/YARN-841
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.0.5-alpha
Reporter: Siddharth Seth
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-841-20130617.1.txt, YARN-841-20130617.2.txt, 
 YARN-841-20130617.txt


 For users writing their own AuxServices, these APIs should be annotated and 
 need better documentation. Also, the classes may need to move out of the 
 NodeManager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-805) Fix yarn-api javadoc annotations

2013-06-17 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13686335#comment-13686335
 ] 

Siddharth Seth commented on YARN-805:
-

+1. This looks good. Thanks Jian.

 Fix yarn-api javadoc annotations
 

 Key: YARN-805
 URL: https://issues.apache.org/jira/browse/YARN-805
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
Priority: Blocker
 Attachments: YARN-805.1.patch, YARN-805.2.patch, YARN-805.3.patch, 
 YARN-805.4.patch, YARN-805.5.patch, YARN-805.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (YARN-844) Move PBImpls from yarn-api to yarn-common

2013-06-17 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth resolved YARN-844.
-

Resolution: Duplicate

 Move PBImpls from yarn-api to yarn-common
 -

 Key: YARN-844
 URL: https://issues.apache.org/jira/browse/YARN-844
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siddharth Seth



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-825) Fix yarn-common javadoc annotations

2013-06-16 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684883#comment-13684883
 ] 

Siddharth Seth commented on YARN-825:
-

bq. Overall, I felt that it isn't enough to just mark the package as private 
where applicable, so I went ahead and added annotations for individual classes. 
Agree.

A lot of the public classes need Javadoc. I think a follow-up jira can be used 
for this, which shouldn't block 2.1.0 (assuming 2.1.1 will be soon after). 
Also, there's a bunch of non-annotated classes in yarn-api as well - 
YarnException, YarnRuntimeException, YarnConfiguration, RecordFactory* being 
some of the important ones. Separate jira for this as well I think. Unrelated, 
should the PBImpls be moved from yarn-api to yarn-common (They're private 
anyway).
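For reference, the annotation style in question looks roughly like this - the 
audience/stability values below are examples, not decisions:
{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Example only: each class in the list below still needs its own call on
// audience and stability.
@InterfaceAudience.LimitedPrivate({ "MapReduce" })
@InterfaceStability.Unstable
public class ExampleAnnotatedClass {
  // class body unchanged; only the annotations are under review
}
{code}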

Some stuff which may need to be changed
- AggregatedLogsDeletionService - Private to LimitedPrivate. Used in the MR 
history server since a Yarn log/app history server does not exist. I don't mind 
leaving this as Private as well though - since its use in MR is temporary.
- Should ClientToAMTokenSecretManager be final, or do you think there's use 
cases where users may want to extend this.
- Should ServiceStateModel be private
- ApplicationClassLoader - leave as Unstable ?
- Until Apps, ConverterUtils etc are cleaned up - mark them as private ? 
Apps.addToEnvironment should be public though.
- ResourceCalculatorPlugin and related classes - public Unstable or 
LimitedPrivate. This is already used in MapReduce
- Similarly for RackResolver
- Unrelated, should ApplicationTokenIdentifier be renamed to something like 
AMTokenIdentifier ?


 Fix yarn-common javadoc annotations
 ---

 Key: YARN-825
 URL: https://issues.apache.org/jira/browse/YARN-825
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: YARN-825-20130615.1.txt, YARN-825-20130615.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683579#comment-13683579
 ] 

Siddharth Seth commented on YARN-802:
-

With YARN, a new AM (Application) is started per job. The initApp in the NM is 
per app - so each job/app can choose which shuffle provider it wants to use. 
The shuffle service configured for an AM will be specific to a single job only.
From MAPREDUCE-4049
bq.  A shuffle consumer instance will only contact one of the shuffle providers 
and will request its desired files only from this provider.

I'm assuming a single job will only use one shuffle provider - or do you see a 
situation where multiple shuffle providers can serve data to a single job ?

In case of multiple jobs being run by a single AM - this gets more complicated, 
and we may need to initialize multiple providers.

 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId->userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a 
 hard-coded expression in Hadoop code. The current TaskAttemptImpl.java code 
 explicitly calls: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party 
 AuxiliaryService will never get APPLICATION_INIT events.
 I think a solution can be in one of two ways:
 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
 each of them, by calling serviceData.put (…) in loop.
 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
 APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the 
 needed patch for any option that people like.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-804) mark AbstractService init/start/stop methods as final

2013-06-14 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13683584#comment-13683584
 ] 

Siddharth Seth commented on YARN-804:
-

Are the same semantics as AbstractService enforced for the CompositeService as 
well ? Are users expected to call super.init / super.serviceInit to take care 
of all the Services which are part of the composite service, or will 
CompositeService just take care of this ? Otherwise, it may make sense to re-open 
YARN-811.
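A minimal sketch of the pattern in question, assuming CompositeService keeps 
its current behaviour of initing/starting/stopping added children itself:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.service.CompositeService;

public class ExampleCompositeService extends CompositeService {
  public ExampleCompositeService() {
    super("ExampleCompositeService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Children are registered here; CompositeService's own lifecycle methods
    // then init/start/stop them, so the subclass only delegates to super.
    addService(new AbstractService("child") {
      // trivial child; real services would override the service* hooks
    });
    super.serviceInit(conf);
  }
}
{code}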

 mark AbstractService init/start/stop methods as final
 -

 Key: YARN-804
 URL: https://issues.apache.org/jira/browse/YARN-804
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.1.0-beta
Reporter: Steve Loughran
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-804-001.patch


 Now that YARN-117 and MAPREDUCE-5298 are checked in, we can mark the public 
 AbstractService init/start/stop methods as final.
 Why? It puts the lifecycle check and error handling around the subclass code, 
 ensuring no lifecycle method gets called in the wrong state or gets called 
 more than once. When a {{serviceInit(), serviceStart() & serviceStop()}} 
 method throws an exception, it's caught and auto-triggers stop. 
 Marking the methods as final forces service implementations to move to the 
 stricter lifecycle. It has one side effect: some of the mocking tests play up 
 - I'll need some assistance here.
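 A minimal sketch of a service written against the stricter lifecycle 
 (illustrative; not code from the patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

public class ExampleService extends AbstractService {
  public ExampleService() {
    super("ExampleService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // configuration-dependent setup goes here
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    // start threads/servers; a thrown exception auto-triggers serviceStop()
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    // must be safe to call even if start never completed
    super.serviceStop();
  }
}
{code}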

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682488#comment-13682488
 ] 

Siddharth Seth commented on YARN-802:
-

Can the MR AM specify the service to be used via configuration, and set the 
service data accordingly ?

 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId-userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to 
 hard-coded expression in hadoop code. The current TaskAttemptImpl.java code 
 explicitly call: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events.  Any 3rd party 
 AuxillaryService will never get APPLICATION_INIT events.
 I think a solution can be in one of two ways:
 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
 each of them, by calling serviceData.put (…) in loop.
 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
 APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the 
 needed patch for any option that people like.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-802) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-06-13 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13682724#comment-13682724
 ] 

Siddharth Seth commented on YARN-802:
-

It should be possible to configure the MR AM with the shuffle service that 
needs to be used, in which case the MR AM sets up the service id correctly (in 
TaskAttemptImpl), and the NodeManager can send the init event to the correct 
service. We should probably change the stop to behave the same way. 

 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: YARN-802
 URL: https://issues.apache.org/jira/browse/YARN-802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications, nodemanager
Affects Versions: 2.0.4-alpha
Reporter: Avner BenHanoch

 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId->userId. This is needed for properly finding the MOFs of a job per 
 reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a 
 hard-coded expression in Hadoop code. The current TaskAttemptImpl.java code 
 explicitly calls: serviceData.put (ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events. Any 3rd party 
 AuxiliaryService will never get APPLICATION_INIT events.
 I think a solution can be in one of two ways:
 1. Change TaskAttemptImpl.java to loop on all Auxiliary Services and register 
 each of them, by calling serviceData.put (…) in loop.
 2. Change AuxServices.java similar to the fix in: MAPREDUCE-2668  
 APPLICATION_STOP is never sent to AuxServices.  This means that in case the 
 'handle' method gets APPLICATION_INIT event it will demultiplex it to all Aux 
 Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I am welcoming any ideas.  I can provide the 
 needed patch for any option that people like.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-811) Add a set of final _init/_start/_stop methods to CompositeService

2013-06-13 Thread Siddharth Seth (JIRA)
Siddharth Seth created YARN-811:
---

 Summary: Add a set of final _init/_start/_stop methods to 
CompositeService
 Key: YARN-811
 URL: https://issues.apache.org/jira/browse/YARN-811
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Siddharth Seth


Classes which implement AbstractService no longer need to make a super.init, 
start, stop call. The same could be done for CompositeService as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

