[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943703#comment-14943703
 ] 

Sangjin Lee commented on YARN-4178:
---

+1 on consolidating WriterUtils and ReaderUtils.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is treated as a string. This will cause a problem with ordering when the 
> id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID

2015-10-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943741#comment-14943741
 ] 

Li Lu commented on YARN-4220:
-

Oh, thanks [~varun_saxena]. Having looked at the code, I can see we're checking 
the preconditions in ApplicationEntityReader. We obtain container-level 
entities from the generic reader but application-level entities from the 
application entity reader; I think that is what caused the problem. I thought 
this was a feature, but it turns out it looks like a bug :). 

> [Storage implementation] Support getEntities with only Application id but no 
> flow and flow run ID
> -
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Minor
>
> Currently we're enforcing flow and flowrun id to be non-null values on 
> {{getEntities}}. We can actually query the appToFlow table to figure out an 
> application's flow id and flowrun id if they're missing. This will simplify 
> normal queries. 
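
A hedged sketch of the fallback this JIRA proposes, with hypothetical names (FlowContext, lookupAppToFlow) that are not the actual reader classes: consult the app-to-flow mapping only when the caller did not supply the flow context.

{code}
/** Hypothetical holder for flow information, used only in this sketch. */
class FlowContext {
  final String flowId;
  final Long flowRunId;

  FlowContext(String flowId, Long flowRunId) {
    this.flowId = flowId;
    this.flowRunId = flowRunId;
  }
}

/** Sketch of resolving missing flow information before running a query. */
class FlowContextResolver {

  FlowContext resolve(String clusterId, String appId, String flowId, Long flowRunId) {
    if (flowId != null && flowRunId != null) {
      // Caller already supplied the flow context; no extra lookup needed.
      return new FlowContext(flowId, flowRunId);
    }
    // Otherwise fall back to the app-to-flow mapping (one extra read per query).
    return lookupAppToFlow(clusterId, appId);
  }

  // Placeholder: the real reader would issue a storage lookup against the appToFlow table here.
  FlowContext lookupAppToFlow(String clusterId, String appId) {
    throw new UnsupportedOperationException("backed by the appToFlow table in the real code");
  }
}
{code}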



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943702#comment-14943702
 ] 

Varun Saxena commented on YARN-3864:


[~sjlee0], thanks for the review. I will address your comments and update the 
patch shortly.

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in the 
> HBase reader implementation, as well as the REST API implementation for a 
> single app and for multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943700#comment-14943700
 ] 

Sangjin Lee commented on YARN-4220:
---

Hmm, for any application or generic entity queries, we do not require the flow 
or the flow run id (or, with YARN-4221, even the user id). They are already 
populated from the app-to-flow table. Is that what you're referring to? If so, 
it should already be working that way.

> [Storage implementation] Support getEntities with only Application id but no 
> flow and flow run ID
> -
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Minor
>
> Currently we're enforcing flow and flowrun id to be non-null values on 
> {{getEntities}}. We can actually query the appToFlow table to figure out an 
> application's flow id and flowrun id if they're missing. This will simplify 
> normal queries. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943760#comment-14943760
 ] 

Sangjin Lee commented on YARN-3367:
---

Sorry for my late reply.

{quote}
Are TimelineClient async calls only meant to ensure the client need not wait 
for the server response and can return immediately after requesting to post an 
entity, or do we also need to ensure something on the server side? Currently we 
are trying to send the async parameter to the server.
{quote}

I think, at a minimum, the flush should not be done on the server side. If the 
client is fine without the server response, it clearly implies the flush is not 
needed (we had this discussion on another JIRA).

{quote}
Is it important to maintain the order of events sent via sync and async? I.e., 
is it required to ensure all pending async events are also pushed along with 
the current sync event, or is it ok to send only the sync one? (The current 
patch just ensures async events are in order.)
{quote}

I'm not sure if it is a requirement that the timeline *client* has to ensure 
the order of events for both sync and async. First of all, the timestamp should 
be set for most of the entities, metrics, events, etc., and the server should 
rely on the timestamps to resolve ordering. Also, even if the client ensures a 
certain order, there are many situations under which the events will be 
received by the server out of order.

{quote}
Is it required to merge entities from multiple async calls, since they belong 
to the same application?
{quote}

I'm not really sure if this is something the client needs to do. If anything, 
this requirement should fall on the app level timeline collector. I don't see a 
whole lot of situations where the timeline client can do this easily and 
unambiguously. Thoughts, [~Naganarasimha]?

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
> Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we add a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to avoid 
> a potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It consumes many thread resources, which is unnecessary.
> 3. The sequence of events could end up out of order because each posting 
> thread exits the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.
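
A minimal sketch of the queue-plus-dispatcher pattern the description asks for. The names below are illustrative, not the TimelineClient API: putEntitiesAsync only enqueues and returns, while a single background thread drains the queue and performs the REST call, which also keeps the delivery order equal to the enqueue order.

{code}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EntityDispatchLoop {

  private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
  private volatile boolean stopped = false;

  private final Thread dispatcher = new Thread(() -> {
    while (!stopped) {
      try {
        Object entity = queue.take();   // blocks until an entity is available
        postToCollector(entity);        // single consumer => entities delivered in enqueue order
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    // A real client would also drain and deliver anything still queued before exiting.
  }, "TimelineEntityDispatcher");

  public void start() {
    dispatcher.start();
  }

  /** Non-blocking from the caller's point of view: enqueue and return immediately. */
  public void putEntitiesAsync(List<?> entities) {
    queue.addAll(entities);
  }

  public void stop() throws InterruptedException {
    stopped = true;
    dispatcher.interrupt();
    dispatcher.join();
  }

  // Placeholder for the actual REST call to the per-app collector.
  private void postToCollector(Object entity) {
    System.out.println("posting " + entity);
  }
}
{code}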



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943771#comment-14943771
 ] 

Sangjin Lee commented on YARN-3864:
---

The latest patch (v.4) LGTM. Unless there are additional comments, and with 
jenkins passing, I'll commit it soon.

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-YARN-2928.04.patch, YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in the 
> HBase reader implementation, as well as the REST API implementation for a 
> single app and for multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943773#comment-14943773
 ] 

Varun Saxena commented on YARN-4220:


This won't work even after YARN-3864. The /entities endpoint will not be 
treated as a single-entity read, but that is what we are trying to do here: 
read a single application entity.

Maybe we can handle it here, because passing the query params gets us the 
correct result.

> [Storage implementation] Support getEntities with only Application id but no 
> flow and flow run ID
> -
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Minor
>
> Currently we're enforcing flow and flowrun id to be non-null values on 
> {{getEntities}}. We can actually query the appToFlow table to figure out an 
> application's flow id and flowrun id if they're missing. This will simplify 
> normal queries. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943778#comment-14943778
 ] 

Vrushali C commented on YARN-4178:
--

bq. I think we can have a single class TimelineStorageUtils and remove 
TimelineWriterUtils and TimelineReaderUtils.
Sounds good. I guess it will be a big class but shouldn't be a problem. 

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is treated as a string. This will cause a problem with ordering when the 
> id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943781#comment-14943781
 ] 

Jason Lowe commented on YARN-4216:
--

The container logs should not be uploaded on NM stop if we are doing recovery.  
That is intentional.  Decommission + nm restart doesn't make sense to me.  
Either we are decommissioning a node and don't expect it to return, or we are 
going to restart it and expect it to return shortly.  For the former, we want 
the NM to linger a bit to try to finish log aggregation.  For the latter it 
should not.

If we are decommissioning the node, then context.getDecommissioned() in the 
boolean clause above should be true, which means shouldAbort would be false.  
That means it should not do the same thing as a shutdown under supervision. My 
apologies if I'm missing something.
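
For readers following along without the patch in front of them, a rough sketch of the kind of condition being discussed. The names (recoveryEnabled, underSupervision, shouldAbortLogAggregation) are assumptions for illustration, not the actual NodeManager code:

{code}
/** Illustration only -- these names are assumptions, not the NodeManager's actual fields. */
class LogAggregationStopPolicy {

  static boolean shouldAbortLogAggregation(boolean recoveryEnabled,
                                           boolean underSupervision,
                                           boolean decommissioned) {
    // Restart under supervision with recovery enabled: don't linger, the logs are
    // picked up after the NM comes back. A decommission is not an abort case, so
    // the NM lingers and tries to finish aggregation before going away.
    return recoveryEnabled && underSupervision && !decommissioned;
  }
}
{code}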

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers with NM recovery enabled
> # Submit a pi job with 20 maps
> # Once 5 maps get completed on NM 1, stop the NM (yarn daemon stop nodemanager)
> (Logs of all completed containers get aggregated to HDFS)
> # Now start NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during stop, it is with the name (localhost_38153)
> # On log aggregation after starting the NM, the newly assigned container logs get 
> uploaded with the name (localhost_38153.tmp)
> On the history server, the logs are not shown for the new task attempts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID

2015-10-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943721#comment-14943721
 ] 

Li Lu commented on YARN-4220:
-

Well, this may be the thing I'm missing now: I made a call to 
{{http://localhost:8188/ws/v2/timeline/entities/application_1443660447597_0001/YARN_APPLICATION?userid=llu=ALL}}
 and got the exception {{java.lang.NullPointerException: flowId shouldn't be 
null}}. I believe we do have the ability to handle this kind of request, but 
for some reason we're blocking it at some intermediate level. A call like 
{{http://localhost:8188/ws/v2/timeline/entities/application_1443660447597_0001/YARN_APPLICATION?userid=llu=ALL=flow_1443660447597_1=1}}
 would return the correct info, though. Am I calling it in the wrong way? 
Thanks! 

> [Storage implementation] Support getEntities with only Application id but no 
> flow and flow run ID
> -
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Minor
>
> Currently we're enforcing flow and flowrun id to be non-null values on 
> {{getEntities}}. We can actually query the appToFlow table to figure out an 
> application's flow id and flowrun id if they're missing. This will simplify 
> normal queries. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4162:

Attachment: YARN-4162.v2.002.patch

Hi [~wangda],
I have updated the patch with test cases and did sufficient testing of the UI 
with and without node labels; it seems to be fine. I hope you could also try it 
once.
Also, to ensure all the REST data related to the existing web UI is available, 
we also need to expose the Node Label Resource information. Do I need to add 
that in this jira as well?

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4183) Enabling generic application history forces every job to get a timeline service delegation token

2015-10-05 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943832#comment-14943832
 ] 

Jonathan Eagles commented on YARN-4183:
---

[~xgong], this issue is two-fold. 1) The web services publishing should trigger 
posting history based on generic history enablement, not timeline server 
enablement. 2) There is still no separation between timeline clients that 
require delegation tokens and those that don't (see YARN-3942). As a result, if 
the timeline service is enabled at the global level, then each yarn client will 
get a timeline delegation token, which makes the timeline service a live 
dependency: if the timeline service is down, the grid is down.

The patch above is a clean way to avoid enabling the timeline service for all 
YarnClients in the cluster.

> Enabling generic application history forces every job to get a timeline 
> service delegation token
> 
>
> Key: YARN-4183
> URL: https://issues.apache.org/jira/browse/YARN-4183
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-4183.1.patch
>
>
> When enabling just the Generic History Server and not the timeline server, 
> the system metrics publisher will not publish the events to the timeline 
> store as it checks if the timeline server and system metrics publisher are 
> enabled before creating a timeline client.
> To make it work, if the timeline service flag is turned on, it will force 
> every yarn application to get a delegation token.
> Instead of checking if timeline service is enabled, we should be checking if 
> application history server is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4178:
---
Attachment: YARN-4178-YARN-2928.03.patch

Updating a patch after merging Writer and Reader utils

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is treated as a string. This will cause a problem with ordering when the 
> id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3864:
---
Attachment: YARN-3864-YARN-2928.04.patch

Updating a patch addressing [~sjlee0]'s comments

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-YARN-2928.04.patch, YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in the 
> HBase reader implementation, as well as the REST API implementation for a 
> single app and for multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID

2015-10-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943785#comment-14943785
 ] 

Li Lu commented on YARN-4220:
-

Sure. We can fix it here. Right now I'm passing flow and flowrun ids in the web 
UI POC. Not a big deal. 

> [Storage implementation] Support getEntities with only Application id but no 
> flow and flow run ID
> -
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Minor
>
> Currently we're enforcing flow and flowrun id to be non-null values on 
> {{getEntities}}. We can actually query the appToFlow table to figure out an 
> application's flow id and flowrun id if they're missing. This will simplify 
> normal queries. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943791#comment-14943791
 ] 

Hadoop QA commented on YARN-3864:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  9s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  5s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 17s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  7s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 39s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 53s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 57s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  41m  3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765030/YARN-3864-YARN-2928.04.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / a95b8f5 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9350/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9350/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9350/console |


This message was automatically generated.

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-YARN-2928.04.patch, YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in the 
> HBase reader implementation, as well as the REST API implementation for a 
> single app and for multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-10-05 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943809#comment-14943809
 ] 

Anubhav Dhoot commented on YARN-4185:
-

I don't think option 2, where you restart from 1, makes sense. It's also not a 
goal to minimize the total wait time. The goal should be to minimize the time 
to recover from short intermittent failures while also waiting long enough for 
long failures before giving up. Would it be better for us to ramp up to 10 sec 
exponentially and then do the n retries at 10 sec, or to do a total of n 
retries including the ramp-up?
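
A small sketch of the capped exponential backoff under discussion; the base interval and retry count below are illustrative, not proposed defaults. The interval doubles until it hits the 10-second cap (today's fixed value), and any remaining retries use the cap:

{code}
public class CappedBackoff {

  /** Sleep in milliseconds before retry number 'attempt' (0-based). */
  static long backoffMs(int attempt, long baseMs, long capMs) {
    // Double on every attempt, but never exceed the cap; the shift is bounded
    // to avoid overflowing long for very large attempt numbers.
    long interval = baseMs << Math.min(attempt, 30);
    return Math.min(interval, capMs);
  }

  public static void main(String[] args) {
    long base = 100;      // illustrative starting interval: 100 ms
    long cap = 10_000;    // 10 s cap, matching today's fixed retry interval
    for (int attempt = 0; attempt < 8; attempt++) {
      System.out.println("retry " + attempt + " after " + backoffMs(attempt, base, cap) + " ms");
    }
  }
}
{code}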

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff with the same fixed 
> max limit. Today the retry interval is fixed at 10 sec, which can be 
> unnecessarily high, especially when NMs could complete a rolling restart 
> within a second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-10-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943849#comment-14943849
 ] 

Naganarasimha G R commented on YARN-3367:
-

Thanks for the feedback, [~sjlee0].
bq. If the client is fine without the server response, it clearly implies flush 
is not needed 
Yes, I agree with this, but what should the behavior of sync calls be? IMO, in 
the wake of YARN-4061 (fault tolerant writer for timeline v2), we need not 
worry about it either. Thoughts?
bq. First of all, the timestamp should be set for most of the entities, 
metrics, events, etc., and the server should rely on the timestamps to resolve 
ordering. 
Well, I can understand that if all the events are received and the timestamp is 
filled in at the client side, we need not worry about the order. But what about 
the case where the client goes down and sends some events out of order? For 
example, the containerFinished event gets published but log aggregation does 
not succeed. And from the client app side, I am not sure how important it is 
for the order to be maintained.

bq. I don't see a whole lot of situations where the timeline client can do 
this easily and unambiguously. 
Well, the approach I had in mind is to block publishing further entities/events 
through sync/async calls till the events in the timeline client queue are 
cleared. But I don't completely see the need for that unless it is really 
necessary to maintain the order. [~djp], can you please comment on this part, 
as in this jira's description you have targeted getting the events in order?





> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
> Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we add a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to avoid 
> a potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It consumes many thread resources, which is unnecessary.
> 3. The sequence of events could end up out of order because each posting 
> thread exits the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4170) AM need to be notified with priority in AllocateResponse

2015-10-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4170:
--
Attachment: 0003-YARN-4170.patch

Yes [~rohithsharma],
thank you for sharing your thoughts. Yes, we could send "null" when there are 
no changes in priority. Updated the patch with this change.

> AM need to be notified with priority in AllocateResponse 
> -
>
> Key: YARN-4170
> URL: https://issues.apache.org/jira/browse/YARN-4170
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4170.patch, 0002-YARN-4170.patch, 
> 0003-YARN-4170.patch
>
>
> As discussed in MAPREDUCE-5870, the Application Master needs to be notified 
> of the priority in the Allocate heartbeat. This will help the AM know the 
> priority and update the JobStatus when the client asks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4220) [Storage implementation] Support getEntities with only Application id but no flow and flow run ID

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943729#comment-14943729
 ] 

Varun Saxena commented on YARN-4220:


[~gtCarrera9]
Oh, I get it. I was also wondering what this JIRA was about.
I think this will be addressed by the patches in YARN-3864; there were some 
gaps in ApplicationEntityReader which I fixed while working on YARN-3864.
I will check this flow and confirm.

> [Storage implementation] Support getEntities with only Application id but no 
> flow and flow run ID
> -
>
> Key: YARN-4220
> URL: https://issues.apache.org/jira/browse/YARN-4220
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Minor
>
> Currently we're enforcing flow and flowrun id to be non-null values on 
> {{getEntities}}. We can actually query the appToFlow table to figure out an 
> application's flow id and flowrun id if they're missing. This will simplify 
> normal queries. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the REST interface to conform to current REST APIs' in YARN

2015-10-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943751#comment-14943751
 ] 

Allen Wittenauer commented on YARN-4224:


If this is an incompatible change, then this needs to be ws/v3.

> Change the REST interface to conform to current REST APIs' in YARN
> --
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4170) AM need to be notified with priority in AllocateResponse

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943797#comment-14943797
 ] 

Hadoop QA commented on YARN-4170:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 14s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 34s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 59s | The applied patch generated  1 
new checkstyle issues (total was 7, now 8). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 57s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common compilation is broken. |
| {color:red}-1{color} | findbugs |   2m 20s | Post-patch findbugs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 compilation is broken. |
| {color:green}+1{color} | findbugs |   2m 20s | The patch does not introduce 
any new Findbugs (version ) warnings. |
| {color:red}-1{color} | yarn tests |   0m 24s | Tests failed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   0m 23s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   0m 22s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  47m 37s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-yarn-api |
|   | hadoop-yarn-common |
|   | hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765029/0003-YARN-4170.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b925cf1 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9349/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9349/console |


This message was automatically generated.

> AM need to be notified with priority in AllocateResponse 
> -
>
> Key: YARN-4170
> URL: https://issues.apache.org/jira/browse/YARN-4170
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4170.patch, 0002-YARN-4170.patch, 
> 0003-YARN-4170.patch
>
>
> As discussed in MAPREDUCE-5870, the Application Master needs to be notified 
> of the priority in the Allocate heartbeat. This will help the AM know the 
> priority and update the JobStatus when the client asks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943863#comment-14943863
 ] 

Varun Saxena commented on YARN-4178:


Depending on order in which patches go in, YARN-4178 or YARN-3864 will require 
rebase

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is treated as a string. This will cause a problem with ordering when the 
> id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4162:

Attachment: YARN-4162.v2.003.patch

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944501#comment-14944501
 ] 

Rohith Sharma K S commented on YARN-4209:
-

The patch does not apply on branch-2.7.2. Can you provide a patch for branch-2.7?

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in {{stateMachine.doTransition}} called from public 
> API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED state, the external state transition changes the state 
> back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} 
> =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} 
> change state to FENCED => exit external {{stateMachine.doTransition}} change 
> state to ACTIVE.
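
A stripped-down toy illustration of the re-entrancy problem described above (this is not the real RMStateStore or the Hadoop state machine): the outer transition writes its post-state after the nested transition has finished, so the FENCED state the nested call just set gets overwritten.

{code}
enum StoreState { ACTIVE, FENCED }

/** Toy state machine that mimics the clobbering behaviour. */
class ToyStateMachine {
  private StoreState current = StoreState.ACTIVE;

  StoreState getCurrent() {
    return current;
  }

  void doTransition(String event, Runnable transitionBody, StoreState postState) {
    transitionBody.run();   // may itself call doTransition (the nested/internal call)
    current = postState;    // the outer call overwrites whatever the nested call set
  }
}

public class FencedClobberDemo {
  public static void main(String[] args) {
    ToyStateMachine sm = new ToyStateMachine();
    // The outer transition (think RemoveRMDTTransition) fails and tries to fence the
    // store by firing a nested transition, but then restores ACTIVE when it completes.
    sm.doTransition("REMOVE_DT",
        () -> sm.doTransition("FENCE", () -> { }, StoreState.FENCED),
        StoreState.ACTIVE);
    System.out.println(sm.getCurrent());  // prints ACTIVE, not FENCED
  }
}
{code}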



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-10-05 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944498#comment-14944498
 ] 

Rohith Sharma K S commented on YARN-4209:
-

committing shortly

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in {{stateMachine.doTransition}} called from public 
> API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED state, the external state transition changes the state 
> back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} 
> =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} 
> change state to FENCED => exit external {{stateMachine.doTransition}} change 
> state to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2015-10-05 Thread Wilfred Spiegelenburg (JIRA)
Wilfred Spiegelenburg created YARN-4227:
---

 Summary: FairScheduler: RM quits processing expired container from 
a removed node
 Key: YARN-4227
 URL: https://issues.apache.org/jira/browse/YARN-4227
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.1, 2.5.0, 2.3.0
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg
Priority: Critical


Under some circumstances the node is removed before an expired container event 
is processed causing the RM to exit:
{code}
2015-10-04 21:14:01,063 INFO 
org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
2015-10-04 21:14:01,063 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_1436927988321_1307950_01_12 Container Transitioned from ACQUIRED 
to EXPIRED
2015-10-04 21:14:01,063 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
Completed container: container_1436927988321_1307950_01_12 in state: 
EXPIRED event:EXPIRE
2015-10-04 21:14:01,063 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op 
   OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
APPID=application_1436927988321_1307950 
CONTAINERID=container_1436927988321_1307950_01_12
2015-10-04 21:14:01,063 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type CONTAINER_EXPIRED to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
at java.lang.Thread.run(Thread.java:745)
2015-10-04 21:14:01,063 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}
The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 and 
2.6.0 by different customers.
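
A hedged sketch of the kind of guard that would avoid the NPE; the method and field names are hypothetical and not the actual FairScheduler patch: if the node has already been removed, drop the expiry event instead of dereferencing null.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: guard the container-expiry path against an already-removed node. */
class ExpiryGuardSketch {

  private final Map<String, Object> nodes = new ConcurrentHashMap<>();

  void onContainerExpired(String nodeId, String containerId) {
    Object node = nodes.get(nodeId);
    if (node == null) {
      // The node was removed before the CONTAINER_EXPIRED event was processed.
      // Log and drop the event rather than hitting a NullPointerException that
      // takes the ResourceManager down.
      System.out.println("Container " + containerId
          + " completed on removed node " + nodeId + "; ignoring");
      return;
    }
    // ... normal completedContainer handling would follow here ...
  }
}
{code}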



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4215:

Attachment: YARN-4215.v1.002.patch

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch, YARN-4215.v1.002.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while 
> holding a lock, hence it is better to push events to the scheduler only when 
> there is actually a change in the label mapping for a given node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944544#comment-14944544
 ] 

Hudson commented on YARN-4176:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #458 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/458/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issues:
> # Distributed nodelabels: after the NM has registered with the RM, if cluster 
> nodelabels are removed and added, the NM doesn't resend its labels in the 
> heartbeat again until there is a change in its labels
> # When NM registration with nodelabels fails, the NM should resend the labels 
> to the RM again
> The above cases can be handled by resyncing nodelabels with the RM every x 
> interval.
> # Add the property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; 
> the NM will resend nodelabels to the RM based on this config regardless of 
> whether the registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4228) FileSystemRMStateStore use IOUtils on fs#close

2015-10-05 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4228:
--

 Summary: FileSystemRMStateStore use IOUtils on fs#close
 Key: YARN-4228
 URL: https://issues.apache.org/jira/browse/YARN-4228
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Minor


NPE on {{FileSystemRMStateStore#closeWithRetries}} when active service 
initialization fails on rm start up

{noformat}
2015-10-05 19:56:38,626 INFO org.apache.hadoop.service.AbstractService: Service 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore failed in 
state STOPPED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:721)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$13.run(FileSystemRMStateStore.java:718)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore$FSAction.runWithRetries(FileSystemRMStateStore.java:734)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeWithRetries(FileSystemRMStateStore.java:718)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.closeInternal(FileSystemRMStateStore.java:169)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceStop(RMStateStore.java:618)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at org.apache.hadoop.service.AbstractService.close(AbstractService.java:250)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:609)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:965)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:256)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1195)
{noformat}
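
A minimal sketch of the null-safe close the summary suggests (illustrative only): if active service initialization failed before the file system was opened, the close path should skip the null handle and log, rather than throw. This mirrors what a null-tolerant cleanup helper such as Hadoop's IOUtils cleanup methods does: skip nulls and log close failures instead of propagating them.

{code}
import java.io.Closeable;
import java.io.IOException;

/** Illustration only: a null-safe, non-throwing close for the shutdown path. */
class QuietClose {

  static void closeQuietly(Closeable c) {
    if (c == null) {
      return;   // e.g. the FileSystem was never opened because init failed early
    }
    try {
      c.close();
    } catch (IOException e) {
      // Log and continue; a failed close should not abort the stop sequence.
      System.err.println("Ignoring failure on close: " + e);
    }
  }
}
{code}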



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944484#comment-14944484
 ] 

Naganarasimha G R commented on YARN-4162:
-

Thanks [~wangda] for the comments; I have incorporated them in the latest patch. 
The test case failures seem to be unrelated to the modifications in this jira, 
and the valid checkstyle comments have been addressed.
Also, your thoughts on my earlier comment?
bq. Also, to ensure all the REST data related to the existing web UI is 
available, we also need to expose the Node Label Resource information. Do I 
need to add that in this jira as well?




> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4227) FairScheduler: RM quits processing expired container from a removed node

2015-10-05 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-4227:

Attachment: YARN-4227.patch

> FairScheduler: RM quits processing expired container from a removed node
> 
>
> Key: YARN-4227
> URL: https://issues.apache.org/jira/browse/YARN-4227
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.3.0, 2.5.0, 2.7.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: YARN-4227.patch
>
>
> Under some circumstances the node is removed before an expired container 
> event is processed causing the RM to exit:
> {code}
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: 
> Expired:container_1436927988321_1307950_01_12 Timed out after 600 secs
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1436927988321_1307950_01_12 Container Transitioned from 
> ACQUIRED to EXPIRED
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerApp: 
> Completed container: container_1436927988321_1307950_01_12 in state: 
> EXPIRED event:EXPIRE
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=system_op   
>OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS  
> APPID=application_1436927988321_1307950 
> CONTAINERID=container_1436927988321_1307950_01_12
> 2015-10-04 21:14:01,063 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type CONTAINER_EXPIRED to the scheduler
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.completedContainer(FairScheduler.java:849)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1273)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:585)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-10-04 21:14:01,063 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> The stack trace is from 2.3.0 but the same issue has been observed in 2.5.0 
> and 2.6.0 by different customers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944524#comment-14944524
 ] 

Bibin A Chundatt commented on YARN-4216:


{quote}
 That is intentional. Decommission + nm restart doesn't make sense to me. 
Either we are decommissioning a node and don't expect it to return, or we are 
going to restart it and expect it to return shortly.
{quote}
For *rolling upgrade* the same scenario can happen *( decommission (logs 
upload) --> upgrade --> start NM --> new container assignment --> on finish, log 
upload )* and the container logs are lost. Appending logs during aggregation 
could be one solution in this case, right?

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers  with NM recovery enabled
> # Submit pi job with 20 maps 
> # Once 5 maps gets completed in NM 1 stop NM (yarn daemon stop nodemanager)
> (Logs of all completed container gets aggregated to HDFS)
> # Now start  the NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during stop its with NAME (localhost_38153)
> # On log aggregation after starting NM the newly assigned container logs gets 
> uploaded with name  (localhost_38153.tmp) 
> History server the logs are now shown for new task attempts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943999#comment-14943999
 ] 

Sangjin Lee commented on YARN-4061:
---

Sorry it took me a while to get to this. Thanks for the proposal [~gtCarrera9]!

One potential area of concern is regarding flush and the associated contract 
with the client. If the client wanted to write critical data synchronously and 
specifically wanted to block until it receives a server (storage) response, how 
would that work in this scheme? The proposal has the following for the flush 
situation:
{quote}
When one log segment reaches a predefined size, or a time­ trigger, or an 
explicit flush call happens, it is published to a log queue.
{quote}

Since the actual storage writer (HBase) always acts on this queue 
asynchronously, it seems that the client cannot have a synchronous write 
semantics. Is that a correct reading? If so, how would we implement such a 
synchronous write?
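
To make the question concrete, here is a rough, hypothetical sketch (invented names, not the proposed writer API) of how a blocking flush() could sit on top of an otherwise asynchronous queue. Note that it simply inherits the tension raised above: if the storage call hangs, flush() hangs with it.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of an async write queue with an optional blocking flush().
public class AsyncQueueWriter implements AutoCloseable {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<Runnable>();
  private final ExecutorService drainer = Executors.newSingleThreadExecutor();

  public AsyncQueueWriter() {
    drainer.submit(new Runnable() {
      public void run() {
        while (!Thread.currentThread().isInterrupted()) {
          try {
            queue.take().run();            // the "publish to storage" step
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        }
      }
    });
  }

  // Asynchronous write: returns as soon as the entity is enqueued.
  public void put(final String entity) {
    queue.add(new Runnable() {
      public void run() {
        System.out.println("persisted " + entity);
      }
    });
  }

  // Synchronous write barrier: enqueue a marker and block until the drain thread
  // reaches it, i.e. everything enqueued before flush() has been handed to storage.
  public void flush() throws InterruptedException {
    final CountDownLatch latch = new CountDownLatch(1);
    queue.add(new Runnable() {
      public void run() {
        latch.countDown();
      }
    });
    latch.await();
  }

  public void close() {
    drainer.shutdownNow();
  }
}
{code}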

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944009#comment-14944009
 ] 

Sangjin Lee commented on YARN-4061:
---

Another thing to consider is the throughput of writing to filesystems (local or 
HDFS). This may or may not be a big problem for the app-level timeline 
collector, but it is certainly something we need to analyze rigorously for the 
RM timeline collector. If we go the route of persisting every write to disk, we 
should ensure that we can sustain the throughput of the RM collector on a very 
large cluster (> 10,000 nodes, with a large number of apps being created).

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944028#comment-14944028
 ] 

Sangjin Lee commented on YARN-3367:
---

bq. Yes i agree with this, but what should be the behavior of Sync calls ? IMO 
in the wake of YARN-4061 (Fault tolerant writer for timeline v2), we need not 
worry abt it either, Thoughts ?

I added a couple of comments to YARN-4061. I think it remains to be seen what 
we will choose as the behavior/implementation at the end. But at least I think 
it'd be fair to say that there will be a certain type of calls that will need 
to trigger flush (and a synchronous wait for the response). Whether we will do 
that on the sync side or not, I think we have some flexibility.

{quote}
Well i can understand if all the events are received and timestamp is filled at 
the client side we need not worry abt the order but in the case client goes 
down send some events out of order ? like containerFinished event gets 
Published but logAggregation does not succeed. And from Client App side not 
sure how important is the order to be maintained.
{quote}

I think we might be saying the same thing. What I'm saying is that it would not 
be practical to ensure the order of events for sync and async writes. As for 
the timestamps, I am also arguing that the timestamps should always be set 
explicitly for entities/metrics/events, and that the server should rely on the 
explicit timestamps, rather than on time of receipt.
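
A tiny self-contained illustration of that last point (stand-in types, not the timeline API): when events carry explicit timestamps, the server can order them correctly even if they arrive out of order.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Order by the explicit, client-set timestamp, not by arrival order at the collector.
public class ExplicitTimestampOrdering {
  static final class Event {
    final String id;
    final long timestamp;
    Event(String id, long timestamp) { this.id = id; this.timestamp = timestamp; }
  }

  public static void main(String[] args) {
    List<Event> arrivedOutOfOrder = new ArrayList<Event>();
    arrivedOutOfOrder.add(new Event("CONTAINER_FINISHED", 2000L)); // arrived first
    arrivedOutOfOrder.add(new Event("CONTAINER_STARTED", 1000L));  // arrived second
    Collections.sort(arrivedOutOfOrder, new Comparator<Event>() {
      public int compare(Event a, Event b) {
        return Long.compare(a.timestamp, b.timestamp);
      }
    });
    for (Event e : arrivedOutOfOrder) {
      System.out.println(e.id + " @ " + e.timestamp);
    }
  }
}
{code}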


> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
> Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we add loop in TimelineClient to wait for 
> collectorServiceAddress ready before posting any entity. In consumer of  
> TimelineClient (like AM), we are starting a new thread for each call to get 
> rid of potential deadlock in main thread. This way has at least 3 major 
> defects:
> 1. The consumer need some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It cost many thread resources which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> operation thread get out of waiting loop randomly.
> We should have something like event loop in TimelineClient side, 
> putEntities() only put related entities into a queue of entities and a 
> separated thread handle to deliver entities in queue to collector via REST 
> call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943918#comment-14943918
 ] 

Varun Saxena commented on YARN-2902:


The new patch does the following over the last patch:
# Removed the additional container-executor parameter for ignoring a missing 
directory; this is now the default behaviour.
# The patch no longer cancels the deletion task in the NM.
# The localizer no longer sends the extra heartbeat it used to send to the NM 
once the NM has told it to DIE.
# The localizer does not wait for cancelled tasks to complete on DIE.

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943957#comment-14943957
 ] 

Sangjin Lee commented on YARN-4178:
---

I just committed YARN-3864. Could you please rebase this patch? Thanks.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943973#comment-14943973
 ] 

Hadoop QA commented on YARN-4178:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 29s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   9m 14s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 18s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  5s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 44s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 45s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  2s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 53s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  46m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765047/YARN-4178-YARN-2928.03.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / a95b8f5 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9352/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9352/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9352/console |


This message was automatically generated.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-10-05 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944039#comment-14944039
 ] 

MENG DING commented on YARN-4175:
-

I am using the example application to test the container increase/decrease 
functionality against a 4-node cluster, and will collect and report any 
problems once the tests are complete.

A quick note in case someone else wants to run the test:
* The application master IPC server now listens on a fixed port, 8686. If 
multiple app masters are started on the same host with the *-enable_ipc* option, 
there will be port conflicts, but YARN should be able to start new app attempts 
and launch the app master on a different host.
* If a container resource change request is invalid (e.g., the target resource 
is smaller than the original resource for an increase), AMRMClient throws an 
InvalidResourceRequestException from the allocate call, and the current 
distributed shell appmaster implementation exits, causing the entire 
application to exit (a sketch of a client-side guard follows below).
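
On the second note, a hedged sketch of the kind of client-side guard an appmaster could apply before sending an increase request, so an invalid request never reaches allocate(); the check itself is illustrative, and the appmaster could alternatively catch InvalidResourceRequestException around the allocate call instead of exiting.
{code}
import org.apache.hadoop.yarn.api.records.Resource;

// Illustrative guard: an increase target must be no smaller in any dimension and
// strictly larger in at least one dimension than the current allocation.
public class ResizeRequestGuard {
  static boolean isValidIncrease(Resource current, Resource target) {
    boolean noSmaller = target.getMemory() >= current.getMemory()
        && target.getVirtualCores() >= current.getVirtualCores();
    boolean strictlyLarger = target.getMemory() > current.getMemory()
        || target.getVirtualCores() > current.getVirtualCores();
    return noSmaller && strictlyLarger;
  }

  public static void main(String[] args) {
    Resource current = Resource.newInstance(2048, 2);
    Resource target = Resource.newInstance(1024, 2);   // smaller memory: invalid
    System.out.println("valid increase? " + isValidIncrease(current, target));
  }
}
{code}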

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
> Attachments: YARN-4175.1.patch
>
>
> Like YARN-2609, we need a example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942973#comment-14942973
 ] 

Varun Saxena commented on YARN-4178:


bq. To that effect, if you’d like, we can rename TimelineWriterUtils to 
TimelineStorageUtils so that both reader and writer can use functions from 
this. 
Also, let’s have the invert(long) and invert(int) functions in the same util 
class, instead of adding them in a new util class.
I think we can have a single class TimelineStorageUtils and remove 
TimelineWriterUtils and TimelineReaderUtils.
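
For readers following along, the invert helpers mentioned here are presumably of roughly this shape (illustrative only, not the actual patch): subtracting from the type's maximum value so that larger ids/timestamps sort first under byte-wise ordering.
{code}
// Illustrative only: "inverted" values make larger (newer) numbers sort first
// under the lexicographic byte ordering used by the row keys.
public class InvertDemo {
  static long invert(long key) {
    return Long.MAX_VALUE - key;
  }

  static int invert(int key) {
    return Integer.MAX_VALUE - key;
  }

  public static void main(String[] args) {
    long newer = 1400000000000L;
    long older = 1300000000000L;
    // After inversion the newer timestamp becomes the smaller number, so it sorts first.
    System.out.println(invert(newer) < invert(older));   // prints: true
  }
}
{code}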

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942972#comment-14942972
 ] 

Varun Saxena commented on YARN-4178:


bq. Why do we need any util classes for this, can't an AppId class handle this 
by itself (convert from string to byte representation and back)?
Should we be adding code that is specific to ATS, and to the HBase 
implementation of ATS at that, into a class (ApplicationId) which is used all 
across YARN?
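
For illustration, roughly what such a conversion needs to achieve, wherever it ends up living (a hypothetical sketch, not the patch): encode the application id as fixed-width binary so that byte-wise row-key ordering matches numeric ordering. The actual schema may additionally invert values for descending order.
{code}
import java.nio.ByteBuffer;

// Hypothetical sketch: "application_<clusterTimestamp>_<sequenceNumber>" encoded as
// 8 + 4 fixed-width bytes so that lexicographic byte ordering matches numeric ordering.
public class AppIdRowKeyDemo {
  static byte[] encode(String appId) {
    String[] parts = appId.split("_");
    long clusterTimestamp = Long.parseLong(parts[1]);
    int sequenceNumber = Integer.parseInt(parts[2]);
    return ByteBuffer.allocate(12).putLong(clusterTimestamp).putInt(sequenceNumber).array();
  }

  public static void main(String[] args) {
    String a = "application_1234567890_9";
    String b = "application_1234567890_10";
    // As plain strings, "_9" sorts after "_10", which is the bug described above.
    System.out.println("string compare: " + Integer.signum(a.compareTo(b)));   // 1 (wrong)
    byte[] ba = encode(a);
    byte[] bb = encode(b);
    int cmp = 0;
    for (int i = 0; i < ba.length && cmp == 0; i++) {
      cmp = Integer.compare(ba[i] & 0xff, bb[i] & 0xff);
    }
    System.out.println("byte compare:   " + Integer.signum(cmp));              // -1 (correct)
  }
}
{code}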

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4009) CORS support for ResourceManager REST API

2015-10-05 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4009:

Attachment: YARN-4009.006.patch

Uploaded a new patch to address Jonathan's comments.

bq. Configuration usage set(config setting, "true") is better as 
setBoolean(config setting, true)

Fixed.

bq. timeline server now uses its own different way to enable. So if I turn the 
resource manager and the timeline server both on but nothing else, I get a CORS 
disabled message in the timeline server log even though it is enabled. Could 
you file a jira to address this spurious log message?

Fixed.
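
On the first point, the difference in a nutshell (the property name below is only illustrative):
{code}
import org.apache.hadoop.conf.Configuration;

public class CorsConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // conf.set("hadoop.http.cross-origin.enabled", "true");   // stringly-typed
    conf.setBoolean("hadoop.http.cross-origin.enabled", true); // preferred: typed setter
  }
}
{code}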

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, 
> YARN-4009.006.patch
>
>
> Currently the REST API's do not have CORS support. This means any UI (running 
> in browser) cannot consume the REST API's. For ex Tez UI would like to use 
> the REST API for getting application, application attempt information exposed 
> by the API's. 
> It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943113#comment-14943113
 ] 

Bibin A Chundatt commented on YARN-4216:


{quote}
That's why YARN-1362 was done, so we can explicitly tell the nodemanager 
whether or not the NM is under supervision and likely to restart.
{quote}
*yarn.nodemanager.recovery.supervised=false* in my current setup.
In that case, as I understand from the above comment, I am supposed to set 
*yarn.nodemanager.recovery.supervised* to true to indicate that the restart is 
under supervision (see the configuration sketch below).

[~jlowe], so should I close this JIRA?
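
A sketch of the relevant settings for a supervised (rolling-restart) NM, expressed via the Configuration API for brevity; in practice these would normally be set in yarn-site.xml, and the exact behaviour is as discussed in the comments above.
{code}
import org.apache.hadoop.conf.Configuration;

public class SupervisedRecoveryConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.nodemanager.recovery.enabled", true);
    // Tell the NM it runs under supervision and is expected to come back shortly,
    // so a shutdown is treated as a restart rather than a permanent decommission.
    conf.setBoolean("yarn.nodemanager.recovery.supervised", true);
  }
}
{code}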

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers  with NM recovery enabled
> # Submit pi job with 20 maps 
> # Once 5 maps gets completed in NM 1 stop NM (yarn daemon stop nodemanager)
> (Logs of all completed container gets aggregated to HDFS)
> # Now start  the NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during stop its with NAME (localhost_38153)
> # On log aggregation after starting NM the newly assigned container logs gets 
> uploaded with name  (localhost_38153.tmp) 
> History server the logs are now shown for new task attempts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943124#comment-14943124
 ] 

Bibin A Chundatt commented on YARN-4216:


The 
[documentation|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html]
 doesn't mention *yarn.nodemanager.recovery.supervised*. Should I update the doc?

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers  with NM recovery enabled
> # Submit pi job with 20 maps 
> # Once 5 maps gets completed in NM 1 stop NM (yarn daemon stop nodemanager)
> (Logs of all completed container gets aggregated to HDFS)
> # Now start  the NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during stop its with NAME (localhost_38153)
> # On log aggregation after starting NM the newly assigned container logs gets 
> uploaded with name  (localhost_38153.tmp) 
> History server the logs are now shown for new task attempts



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944363#comment-14944363
 ] 

Hadoop QA commented on YARN-3216:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   9m  6s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 44s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 18s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 10s | The applied patch generated  
18 new checkstyle issues (total was 271, now 268). |
| {color:red}-1{color} | whitespace |   0m  8s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 44s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  62m 27s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 108m 45s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
 |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765017/0003-YARN-3216.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 30ac69c |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9355/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9355/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9355/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9355/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9355/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9355/console |


This message was automatically generated.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944381#comment-14944381
 ] 

Hudson commented on YARN-4176:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2427 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2427/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issue
> # Distributed nodelabels after NM registered with RM if cluster nodelabels 
> are removed and added then NM doesnt resend labels in heartbeat again untils 
> any change in labels
> # NM registration failed with Nodelabels should resend labels again to RM 
> The above cases can be handled by  resync nodeLabels with RM every x interval
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}} 
> and  will resend nodelabels to RM based on config no matter what the 
> registration fails or success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-10-05 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944384#comment-14944384
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
---

[~adhoot], thanks for the clarification.
So the initial retries can use backoff times of 1, 2, 4 and 8 seconds, which are 
still less than 10 and therefore give us a chance to reconnect after a 
short-lived NM restart (under 10 seconds).
After that we can keep backing off at 10 seconds per retry to accommodate a 
longer failure.

In other words, the backoff sequence would be 1, 2, 4, 8, 10, 10, ... until the 
number of retries is exhausted (see the sketch below).
My only concern is that if the failure lasts longer than the total wait time 
allowed by the retry count, there won't be another chance to retry.

I'll write up a patch along these lines.
Thank you.
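
A minimal sketch of the retry schedule described above (standalone illustration, not the NMClient/RetryPolicy code): exponential backoff of 1, 2, 4, 8 seconds capped at the current fixed 10-second interval, until the retry budget is exhausted.
{code}
import java.util.concurrent.TimeUnit;

public class CappedExponentialBackoff {
  public static void main(String[] args) throws InterruptedException {
    long baseMs = 1000L;          // first backoff: 1 second
    long maxBackoffMs = 10000L;   // cap at the existing fixed 10-second interval
    int maxRetries = 8;

    for (int attempt = 0; attempt < maxRetries; attempt++) {
      long backoffMs = Math.min(maxBackoffMs, baseMs << attempt); // 1s,2s,4s,8s,10s,...
      System.out.println("attempt " + attempt + ": sleeping " + backoffMs + " ms");
      TimeUnit.MILLISECONDS.sleep(backoffMs);
      // ... try to reconnect to the NM here; break out of the loop on success ...
    }
  }
}
{code}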

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec that can be 
> unnecessarily high especially when NMs could rolling restart within a sec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944130#comment-14944130
 ] 

Varun Saxena commented on YARN-4178:


New patch updated.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4178:
---
Attachment: YARN-4178-YARN-2928.04.patch

[~sjlee0], rebased the patch

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4178:
---
Attachment: (was: YARN-4178-YARN-2928.04.patch)

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944052#comment-14944052
 ] 

Hadoop QA commented on YARN-4162:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   8m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 11s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m  8s | The applied patch generated  
28 new checkstyle issues (total was 222, now 249). |
| {color:green}+1{color} | whitespace |   0m  5s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 29s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  51m 36s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m  2s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 |
|   | org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart 
|
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765041/YARN-4162.v2.002.patch 
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3e1752f |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9351/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9351/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9351/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9351/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9351/console |


This message was automatically generated.

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2015-10-05 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944067#comment-14944067
 ] 

Li Lu commented on YARN-4061:
-

Thanks for the review [~sjlee0]! 

bq. Since the actual storage writer (HBase) always acts on this queue 
asynchronously, it seems that the client cannot have a synchronous write 
semantics. Is that a correct reading? If so, how would we implement such a 
synchronous write?

This is definitely a valid concern. Yes, providing pure synchronous semantics 
with this design is hard. To support synchronous semantics we generally have two 
options (the second is sketched below):
- Not only enforce a flush, but on synchronous calls also block until the data 
is actually persisted into HBase. The advantage of this design is simplicity, 
but if the HBase storage is unavailable we cannot perform any synchronous calls 
at all, which makes the "fault tolerant" feature less appealing.
- Since we know (and trust) that data spooled to HDFS will eventually be 
available in HBase, maybe we can have a fault-tolerant reader that checks HDFS 
before (or alongside) HBase. That way we can always pick the most up-to-date 
copy of the data, whether it currently lives in HDFS or in HBase. The 
shortcoming of this approach is that local file storage will not work here, 
because locally buffered data is not generally visible to other nodes (and I 
suspect this strong-consistency model may be too ambitious given the amount of 
data).

About throughput, I agree we need to be careful here. The traffic may have a 
similar scale and flow to the MapReduce JobHistory server; if that is the case, 
I think we can definitely start from some of the ideas in the JHS.
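
A rough, purely hypothetical sketch of the second option (all names invented): a reader that consults the HDFS spool as well as HBase and serves whichever copy was updated most recently.
{code}
// Hypothetical sketch only; not part of the proposal's API.
public class FallbackTimelineReader {
  interface Source {
    Long lastUpdated(String entityId);   // null if this source has no copy
  }

  private final Source hdfsSpool;
  private final Source hbase;

  public FallbackTimelineReader(Source hdfsSpool, Source hbase) {
    this.hdfsSpool = hdfsSpool;
    this.hbase = hbase;
  }

  /** Pick the source holding the most recently updated copy of the entity. */
  public Source freshest(String entityId) {
    Long spoolTime = hdfsSpool.lastUpdated(entityId);
    Long hbaseTime = hbase.lastUpdated(entityId);
    if (spoolTime == null) {
      return hbase;
    }
    if (hbaseTime == null || spoolTime > hbaseTime) {
      return hdfsSpool;
    }
    return hbase;
  }
}
{code}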

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4178:
---
Attachment: YARN-4178-YARN-2928.04.patch

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944123#comment-14944123
 ] 

Hadoop QA commented on YARN-4178:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 12s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:red}-1{color} | javac |   4m 31s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765067/YARN-4178-YARN-2928.04.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 09c3576 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9353/console |


This message was automatically generated.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944272#comment-14944272
 ] 

Hudson commented on YARN-4176:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #492 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/492/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issue
> # Distributed nodelabels after NM registered with RM if cluster nodelabels 
> are removed and added then NM doesnt resend labels in heartbeat again untils 
> any change in labels
> # NM registration failed with Nodelabels should resend labels again to RM 
> The above cases can be handled by  resync nodeLabels with RM every x interval
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}} 
> and  will resend nodelabels to RM based on config no matter what the 
> registration fails or success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944289#comment-14944289
 ] 

Wangda Tan commented on YARN-3216:
--

Thanks [~sunilg].

Went through the patch, some comments:

1. AbstractCSQueue: Instead of adding AM-used-resource to the parentQueue, I 
think we only need to calculate AM-used-resource on the LeafQueue and the user. 
Currently we don't limit AM-used-resource on the parentQueue, so the aggregated 
resource may not be very useful. We can add it along the hierarchy if we want to 
limit max-am-percent on the parentQueue in the future.

2. CapacitySchedulerConfiguration: Instead of introducing a new configuration, 
MAXIMUM_AM_RESOURCE_PARTITION_SUFFIX, I suggest using the existing one, 
maximum-am-resource-percent. If 
{{queue.accessible-node-labels.<label>.maximum-am-resource-percent}} is not set, 
it falls back to queue.maximum-am-resource-percent (see the lookup sketch 
below). Please let me know if there's any specific reason to add a new 
maximum-am-resource-partition setting.

3. LeafQueue: I'm wondering whether we need to maintain a map of 
{{PartitionInfo}}: PartitionInfo.getActiveApplications is only used to check 
whether there are any activated apps under a partition, which is equivalent to 
{{queueUsage.getAMUsed(partitionName) > 0}}.

4. SchedulerApplicationAttempt: I think the return value of getAMUsed should be:
- Before the AM container is allocated, it returns AM-Resource-Request.resource 
on partition=AM-Resource-Request.node-label-request.
- After the AM container is allocated, it returns AM-Container.resource on 
partition=AM-Node.partition.
- You don't have to update am-resource when the AM container has just been 
allocated, because AM-container.resource and 
am-resource-request.node-label-request won't change, but you do need to update 
it if the partition of the AM container's NM is updated. I'm not sure if this is 
clear; please let me know if you need me to elaborate on this comment.

I noticed you removed some code from FiCaSchedulerApp's constructor; I think 
getAMUsed should still return the correct value before the AM container is 
allocated, otherwise the computation might be wrong. Let me know if I 
misunderstood your code.
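
A sketch of the suggested lookup order; the key pattern follows the comment above, but the exact CapacitySchedulerConfiguration key names and the 0.1 default are assumptions here.
{code}
import org.apache.hadoop.conf.Configuration;

public class AmPercentLookup {
  static float maxAmResourcePercent(Configuration conf, String queuePath, String label) {
    String prefix = "yarn.scheduler.capacity." + queuePath + ".";
    // Queue-level value (assumed default of 0.1 if nothing is configured).
    float queueLevel = conf.getFloat(prefix + "maximum-am-resource-percent", 0.1f);
    // Per-partition override, falling back to the queue-level value when unset.
    return conf.getFloat(
        prefix + "accessible-node-labels." + label + ".maximum-am-resource-percent",
        queueLevel);
  }
}
{code}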

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944292#comment-14944292
 ] 

Wangda Tan commented on YARN-3216:
--

And I forgot to mention:
5. Regarding am-resource-percent per user per partition: currently you have only 
considered am-resource-percent per queue; I think you also need to calculate 
(not configure) a per-user-per-partition am-resource-limit. Since the patch is 
already quite complex, I'm fine with doing the math for 
am-resource-limit-per-user in a separate JIRA.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-10-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944313#comment-14944313
 ] 

Wangda Tan commented on YARN-4215:
--

Thanks [~Naganarasimha]! The patch generally looks good to me.
Only a few nits in the test:
- I suggest adding one more test to avoid regressions in the future: if a host 
has a label (say node1:0 has label x) and someone updates the label of node1:0 
to y, the scheduler should receive an event as well (a sketch follows after the 
snippet below).
- Could you check whether the labels carried by the events are the expected ones?
{code}
mgr.replaceLabelsOnNode(ImmutableMap.of(toNodeId("n1:1"), toSet("p1"),
    toNodeId("n2:1"), toSet("p2"), toNodeId("n3"), toSet("p3")));
assertTrue("Event should be sent when there is change in labels",
    schedEventsHandler.receivedEvent);
assertEquals("3 node label mapping modified", 3,
    schedEventsHandler.updatedNodeToLabels.size());
schedEventsHandler.receivedEvent = false;
{code}
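
Something along these lines for the first nit, purely illustrative and reusing the helpers from the snippet above (it assumes labels x and y are already in the cluster node-label set; not taken from the actual patch):
{code}
// Hypothetical regression test, in the same style as the existing snippet:
// node1:0 starts with label "x"; replacing it with "y" should notify the scheduler.
mgr.replaceLabelsOnNode(ImmutableMap.of(toNodeId("node1:0"), toSet("x")));
schedEventsHandler.receivedEvent = false;
mgr.replaceLabelsOnNode(ImmutableMap.of(toNodeId("node1:0"), toSet("y")));
assertTrue("Event should be sent when a host:port label changes",
    schedEventsHandler.receivedEvent);
assertEquals("1 node label mapping modified", 1,
    schedEventsHandler.updatedNodeToLabels.size());
{code}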

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch
>
>
> Modified node Labels needs to be updated by the capacity scheduler holding a 
> lock hence its better to push events to scheduler only when there is actually 
> a change in the label mapping for a given node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944336#comment-14944336
 ] 

Hudson commented on YARN-4176:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #1222 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1222/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issue
> # Distributed nodelabels after NM registered with RM if cluster nodelabels 
> are removed and added then NM doesnt resend labels in heartbeat again untils 
> any change in labels
> # NM registration failed with Nodelabels should resend labels again to RM 
> The above cases can be handled by  resync nodeLabels with RM every x interval
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}} 
> and  will resend nodelabels to RM based on config no matter what the 
> registration fails or success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4176:
-
Summary: Resync NM nodelabels with RM periodically for distributed 
nodelabels  (was: Resync NM nodelabels with RM every x interval for distributed 
nodelabels)

> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA is for handling the below set of issue
> # Distributed nodelabels after NM registered with RM if cluster nodelabels 
> are removed and added then NM doesnt resend labels in heartbeat again untils 
> any change in labels
> # NM registration failed with Nodelabels should resend labels again to RM 
> The above cases can be handled by  resync nodeLabels with RM every x interval
> # Add property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}} 
> and  will resend nodelabels to RM based on config no matter what the 
> registration fails or success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM every x interval for distributed nodelabels

2015-10-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944179#comment-14944179
 ] 

Wangda Tan commented on YARN-4176:
--

Latest patch LGTM, committing..

> Resync NM nodelabels with RM every x interval for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA handles the following set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if the cluster nodelabels are removed and re-added, the NM does not resend its labels in the heartbeat until the labels change again.
> # If NM registration with nodelabels fails, the labels should be resent to the RM.
> The above cases can be handled by resyncing the nodeLabels with the RM every x interval.
> # Add a property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; the NM will resend its nodelabels to the RM at this interval, regardless of whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944190#comment-14944190
 ] 

Hadoop QA commented on YARN-4178:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m  2s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 22s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 25s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 17s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  4s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 56s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 49s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  41m 24s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765073/YARN-4178-YARN-2928.04.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 09c3576 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9354/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9354/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9354/console |


This message was automatically generated.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch, YARN-4178-YARN-2928.03.patch, 
> YARN-4178-YARN-2928.04.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944195#comment-14944195
 ] 

Hudson commented on YARN-4176:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8573 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8573/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA handles the following set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if the cluster nodelabels are removed and re-added, the NM does not resend its labels in the heartbeat until the labels change again.
> # If NM registration with nodelabels fails, the labels should be resent to the RM.
> The above cases can be handled by resyncing the nodeLabels with the RM every x interval.
> # Add a property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; the NM will resend its nodelabels to the RM at this interval, regardless of whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944144#comment-14944144
 ] 

Wangda Tan commented on YARN-4162:
--

Thanks [~Naganarasimha] for working on this! The sample REST response generally looks good; a few comments:

1) I suggest renaming ResourceUsageInfo.partitionResourceUsages to resourceUsagesByPartition, and similarly QueueCapacitiesInfo.partitionQueueCapacity -> queueCapacitiesByPartition.
2) I think pendingResource/amResource is also very important for the user/leafQueue's resourceUsage.
3) I think it's better to move CapacitySchedulerInfo#getResourceUsageInfo/getQueueCapacitiesInfo to ResourceUsage#createResourceUsageInfo and QueueCapacities#createQueueCapacitiesInfo. With this, you can access the internal fields of ResourceUsage/QueueCapacities directly, and it feels more natural to me to create the -Info object from the class that owns the data (a minimal sketch of this pattern is below).

I will include a more detailed code review in the next iteration.
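
Purely as an illustration of the "create the -Info object from the owning class" idea in point 3, a minimal sketch; the class and field names here are assumptions and do not reflect the actual patch:

{code}
// Illustrative sketch only; names are assumptions, not the actual patch code.
class ResourceUsageInfo {
  final java.util.Map<String, Long> resourceUsagesByPartition =
      new java.util.HashMap<String, Long>();
}

class ResourceUsage {
  private final java.util.Map<String, Long> usedByPartition =
      new java.util.HashMap<String, Long>();

  // The factory lives on ResourceUsage itself, so it can read its internal
  // per-partition fields directly instead of exposing them to
  // CapacitySchedulerInfo.
  ResourceUsageInfo createResourceUsageInfo() {
    ResourceUsageInfo info = new ResourceUsageInfo();
    info.resourceUsagesByPartition.putAll(usedByPartition);
    return info;
  }
}
{code}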

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-10-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944169#comment-14944169
 ] 

Wangda Tan commented on YARN-3964:
--

Thanks [~dian.fu] for working on this patch, and thanks to [~Naganarasimha]/[~sunilg] for the reviews. The approach/patch generally looks good and safe to me. [~devaraj.k], could you take care of the remaining review work if you have bandwidth?

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.014.patch, YARN-3964.015.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run in the YARN admin user.
> - This makes it a little complicate to maintain as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will provide user more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4061) [Fault tolerance] Fault tolerant writer for timeline v2

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944200#comment-14944200
 ] 

Sangjin Lee commented on YARN-4061:
---

I don't think the MR JHS is an apt comparison. First, for the MR JHS we're dealing with a totally distributed writer situation (individual jobs), whereas the RM timeline collector would be a single significant writer (again, it's the RM collector that I'm most worried about). Also, the JHS writes only a few large files (job conf, job history files, etc.), whereas the timeline service will issue a huge number of tiny writes. The volume of writes will be much larger than in the JHS use case.

Regarding the synchronous semantics, we really need to think it through. On the one hand, we might consider handling the synchronous calls separately from the rest and outside the log queue, but it's not clear how to make that work alongside the asynchronous writes that are going on.

> [Fault tolerance] Fault tolerant writer for timeline v2
> ---
>
> Key: YARN-4061
> URL: https://issues.apache.org/jira/browse/YARN-4061
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: FaulttolerantwriterforTimelinev2.pdf
>
>
> We need to build a timeline writer that can be resistant to backend storage 
> down time and timeline collector failures. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944236#comment-14944236
 ] 

Hudson commented on YARN-4176:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #483 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/483/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA handles the following set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if the cluster nodelabels are removed and re-added, the NM does not resend its labels in the heartbeat until the labels change again.
> # If NM registration with nodelabels fails, the labels should be resent to the RM.
> The above cases can be handled by resyncing the nodeLabels with the RM every x interval.
> # Add a property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; the NM will resend its nodelabels to the RM at this interval, regardless of whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944416#comment-14944416
 ] 

Sunil G commented on YARN-3216:
---

Thank you [~leftnoteasy] for sharing the comments.

bq.Please let me know if there's any specific reason to add a new 
maximum-am-resource-partition.
I agree with you. We could use the same configuration name under each label.

bq.if there's any activated apps under a partition, it is equivalent to 
queueUsage.getAMUsed(partitionName)
Yes, this will be enough. I kept a new map with the idea of maintaining some more information along the lines of User, but as of now the suggested change is enough. I will remove the map.

bq.you don't have to update am-resource when AM container just allocated, 
because AM-container.resource and am-resource-request.node-label-request won't 
be changed, but you need to update this if partition of AM-container's NM 
updated
As I see it, we may need the changes below.
- In FiCaSchedulerApp's ctor, update AM-Resource-Request.resource on the partition (keep the existing code), but use {{rmApp.getAMResourceRequest().getNodeLabelExpression()}} for setAMResource instead of setting it to NO_LABEL, because this information won't change later.
- If the partition of the AM container's NM is updated, we need to change the AM resource, which I am handling in {{nodePartitionUpdated}} as below.
{code}
+if (rmContainer.isAMContainer()) {
+  setAppAMNodePartitionName(newPartition);
+  this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
+  this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
+  getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
+  getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
+}
{code}
Here, AM-Resource-Request.resource is updated in FiCaSchedulerApp's ctor based on {{rmApp.getAMResourceRequest}}. Once the container is allocated, this resource becomes part of the partition with no change in resource, so I feel I do not need to update the resource in the *allocate* call of FiCaSchedulerApp. Am I correct?
- am-resource-percent per user per partition: Yes, I will raise a new ticket to handle this and make the changes there instead of in this one.


> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4176) Resync NM nodelabels with RM periodically for distributed nodelabels

2015-10-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1490#comment-1490
 ] 

Hudson commented on YARN-4176:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2397 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2397/])
YARN-4176. Resync NM nodelabels with RM periodically for distributed (wangda: 
rev 30ac69c6bd3db363248d6c742561371576006dab)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java


> Resync NM nodelabels with RM periodically for distributed nodelabels
> 
>
> Key: YARN-4176
> URL: https://issues.apache.org/jira/browse/YARN-4176
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4176.patch, 0002-YARN-4176.patch, 
> 0003-YARN-4176.patch, 0004-YARN-4176.patch, 0005-YARN-4176.patch
>
>
> This JIRA handles the following set of issues:
> # For distributed nodelabels, after the NM has registered with the RM, if the cluster nodelabels are removed and re-added, the NM does not resend its labels in the heartbeat until the labels change again.
> # If NM registration with nodelabels fails, the labels should be resent to the RM.
> The above cases can be handled by resyncing the nodeLabels with the RM every x interval.
> # Add a property {{yarn.nodemanager.node-labels.provider.resync-interval-ms}}; the NM will resend its nodelabels to the RM at this interval, regardless of whether registration fails or succeeds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4162:

Attachment: YARN-4162.v2.003.patch

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, YARN-4162.v2.003.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-10-05 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4162:

Attachment: (was: YARN-4162.v2.003.patch)

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch, YARN-4162.v2.001.patch, 
> YARN-4162.v2.002.patch, restAndJsonOutput.zip
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the REST interface to conform to current REST APIs' in YARN

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943298#comment-14943298
 ] 

Varun Saxena commented on YARN-4224:


[~sjlee0] raised this point on YARN-3864.
The current REST API format does not conform to the REST APIs elsewhere in Hadoop.
As this is a user-facing API, everyone is welcome to share their thoughts on this.

> Change the REST interface to conform to current REST APIs' in YARN
> --
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-10-05 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943315#comment-14943315
 ] 

Varun Vasudev commented on YARN-4009:
-

The release audit and findbugs issues are unrelated to the patch. The tests 
pass on my local machine.

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>Assignee: Varun Vasudev
> Attachments: YARN-4009.001.patch, YARN-4009.002.patch, 
> YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch, 
> YARN-4009.006.patch
>
>
> Currently the REST APIs do not have CORS support. This means any UI (running in a browser) cannot consume the REST APIs. For example, the Tez UI would like to use the REST API for getting the application and application-attempt information exposed by the APIs.
> It would be very useful if CORS were enabled for the REST APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943157#comment-14943157
 ] 

Hadoop QA commented on YARN-4009:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  27m 19s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m  4s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 30s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 19s | The applied patch generated 
1 release audit warnings. |
| {color:green}+1{color} | site |   3m 35s | Site still builds. |
| {color:red}-1{color} | checkstyle |   3m 29s | The applied patch generated  6 
new checkstyle issues (total was 0, now 6). |
| {color:red}-1{color} | checkstyle |   4m  4s | The applied patch generated  2 
new checkstyle issues (total was 211, now 212). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 55s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |  11m 13s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   8m  1s | Tests passed in 
hadoop-common. |
| {color:green}+1{color} | yarn tests |   0m 27s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m  8s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   4m  5s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   9m  2s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  60m 23s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 154m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
| Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestRM |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764977/YARN-4009.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 30e2f83 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/diffcheckstylehadoop-common.txt
 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9345/console |


This message was automatically generated.

> CORS 

[jira] [Created] (YARN-4224) Change the REST interface to conform to current REST APIs' in YARN

2015-10-05 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4224:
--

 Summary: Change the REST interface to conform to current REST 
APIs' in YARN
 Key: YARN-4224
 URL: https://issues.apache.org/jira/browse/YARN-4224
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the REST interface to conform to current REST APIs' in YARN

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943320#comment-14943320
 ] 

Varun Saxena commented on YARN-4224:


My proposal is as follows (an illustrative sketch of a few of the corresponding resource paths follows the list):
* *Query flows*
  Current REST API for querying flows is 
{{/ws/v2/timeline/flows/\{clusterid}/}} . This can be changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/flows*
_Eg :_ /ws/v2/timeline/yarn_cluster/flows
{panel}

* *Query flowrun*
  Current REST API is 
{{/ws/v2/timeline/flowrun/\{clusterid}/\{flowid}/\{flowrunid}/}} . This can be 
changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/\{flowid}/run/\{flowrunid}*
_Eg :_ /ws/v2/timeline/yarn_cluster/hive_flow/run/123
{panel}

* *Query app*
  Current REST API in YARN-3864 is 
{{/ws/v2/timeline/app/\{clusterid}/\{appid}/}} . This can be changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/app/\{appid}*
_Eg :_ /ws/v2/timeline/yarn_cluster/app/application_11_1345
{panel}

* *Query apps for a flow*
  Current REST API in YARN-3864 is 
{{/ws/v2/timeline/flowapps/\{clusterid}/\{flowid}/}} . This can be changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/\{flowid}/apps*
_Eg :_ /ws/v2/timeline/yarn_cluster/hive_flow/apps
{panel}

* *Query apps for a flowrun*
  Current REST API in YARN-3864 is 
{{/ws/v2/timeline/flowrunapps/\{clusterid}/\{flowid}/\{flowrunid}/}} . This can 
be changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/\{flowid}/\{flowrunid}/apps*
_Eg :_ /ws/v2/timeline/yarn_cluster/hive_flow/123/apps
{panel}

* *Query entity*
  Current REST API is 
{{/ws/v2/timeline/entity/\{clusterid}/\{appid}/\{entitytype}/\{entityid}/}} . 
This can be changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/\{appid}/\{entitytype}/entity/\{entityid}*
_Eg :_ 
/ws/v2/timeline/yarn_cluster/application_1444034548255_0001/YARN_CONTAINER/entity/container_1444034548255_0001_01_01
{panel}

* *Query entities*
  Current REST API is 
{{/ws/v2/timeline/entities/\{clusterid}/\{appid}/\{entitytype}/}} . This can be 
changed to :
{panel}
*/ws/v2/timeline/\{clusterid}/\{appid}/\{entitytype}/entities*
_Eg :_ 
/ws/v2/timeline/yarn_cluster/application_1444034548255_0001/YARN_CONTAINER/entities
{panel}
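
To make the proposed hierarchy concrete, here is a minimal JAX-RS sketch of a few of the paths above. This is illustrative only: the class name, return types, and placeholder bodies are assumptions, not the actual TimelineReaderWebServices code.

{code}
import java.util.Collections;
import java.util.Set;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Illustrative sketch only; names and placeholder bodies are assumptions.
@Path("/ws/v2/timeline")
public class TimelineReaderPathsSketch {

  // GET /ws/v2/timeline/{clusterid}/flows
  @GET
  @Path("{clusterid}/flows")
  @Produces(MediaType.APPLICATION_JSON)
  public Set<String> getFlows(@PathParam("clusterid") String clusterId) {
    return Collections.emptySet();  // would query flows for the cluster
  }

  // GET /ws/v2/timeline/{clusterid}/{flowid}/run/{flowrunid}
  @GET
  @Path("{clusterid}/{flowid}/run/{flowrunid}")
  @Produces(MediaType.APPLICATION_JSON)
  public String getFlowRun(@PathParam("clusterid") String clusterId,
      @PathParam("flowid") String flowId,
      @PathParam("flowrunid") String flowRunId) {
    return "{}";  // would return the flow run entity as JSON
  }

  // GET /ws/v2/timeline/{clusterid}/{flowid}/{flowrunid}/apps
  @GET
  @Path("{clusterid}/{flowid}/{flowrunid}/apps")
  @Produces(MediaType.APPLICATION_JSON)
  public Set<String> getFlowRunApps(@PathParam("clusterid") String clusterId,
      @PathParam("flowid") String flowId,
      @PathParam("flowrunid") String flowRunId) {
    return Collections.emptySet();  // would return the apps for the flow run
  }
}
{code}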

> Change the REST interface to conform to current REST APIs' in YARN
> --
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4223) Findbugs warnings in hadoop-yarn-server-nodemanager project

2015-10-05 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943328#comment-14943328
 ] 

Varun Saxena commented on YARN-4223:


The release audit warning is unrelated; HDFS-9182 already tracks it.

> Findbugs warnings in hadoop-yarn-server-nodemanager project
> ---
>
> Key: YARN-4223
> URL: https://issues.apache.org/jira/browse/YARN-4223
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: FindBugs Report.html, YARN-4223.01.patch
>
>
> {noformat}
>  classname='org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher'>
>message='Unchecked/unconfirmed cast from 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent
>  to 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.SignalContainersLauncherEvent
>  in 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncherEvent)'
>  lineNumber='146'/>
>
>   
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-05 Thread MENG DING (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MENG DING updated YARN-1509:

Attachment: YARN-1509.4.patch

Submitted a new patch that fixes the whitespace issue.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943368#comment-14943368
 ] 

Jason Lowe commented on YARN-4216:
--

Yes, the document should be updated to cover that property.  Did you try 
setting that property to true, and does it solve your issue?

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers with NM recovery enabled
> # Submit a pi job with 20 maps
> # Once 5 maps have completed on NM 1, stop the NM (yarn daemon stop nodemanager)
> (logs of all completed containers get aggregated to HDFS)
> # Now start NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during the stop, they are uploaded with the name (localhost_38153)
> # On log aggregation after restarting the NM, the newly assigned container logs get uploaded with the name (localhost_38153.tmp)
> In the history server, the logs are not shown for the new task attempts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1510) Make NMClient support change container resources

2015-10-05 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943406#comment-14943406
 ] 

MENG DING commented on YARN-1510:
-

* The release audit is not related.
* The failed test passed in my own environment after applying the patch, so it 
is not related.

> Make NMClient support change container resources
> 
>
> Key: YARN-1510
> URL: https://issues.apache.org/jira/browse/YARN-1510
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1510-YARN-1197.1.patch, 
> YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch, YARN-1510.4.patch
>
>
> As described in YARN-1197, YARN-1449, we need add API in NMClient to support
> 1) sending request of increase/decrease container resource limits
> 2) get succeeded/failed changed containers response from NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-05 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943425#comment-14943425
 ] 

Sunil G commented on YARN-3216:
---

Hi [~eepayne]
Thank you for sharing the comments. As Naga mentioned, the per-partition (per-queue) am-resource-percent configuration information can also be fetched from REST. Also, as you mentioned, this information (such as "AM resource usage per queue per partition") can be retrieved from the GUI via the partition tab on the scheduler page.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4226) Make capacity scheduler queue's preemption status REST API consistent with GUI

2015-10-05 Thread Eric Payne (JIRA)
Eric Payne created YARN-4226:


 Summary: Make capacity scheduler queue's preemption status REST 
API consistent with GUI
 Key: YARN-4226
 URL: https://issues.apache.org/jira/browse/YARN-4226
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 2.7.1
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor


In the capacity scheduler GUI, the preemption status has the following form:
{code}
Preemption: disabled
{code}
However, the REST API shows the following for the same status:
{code}
preemptionDisabled":true
{code}
The latter is confusing and should be consistent with the format in the GUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-10-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4225:
-
Component/s: capacity scheduler

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2902) Killing a container that is localizing can orphan resources in the DOWNLOADING state

2015-10-05 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2902:
---
Target Version/s: 2.7.2  (was: 2.8.0)

> Killing a container that is localizing can orphan resources in the 
> DOWNLOADING state
> 
>
> Key: YARN-2902
> URL: https://issues.apache.org/jira/browse/YARN-2902
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-2902.002.patch, YARN-2902.03.patch, 
> YARN-2902.04.patch, YARN-2902.05.patch, YARN-2902.06.patch, 
> YARN-2902.07.patch, YARN-2902.patch
>
>
> If a container is in the process of localizing when it is stopped/killed then 
> resources are left in the DOWNLOADING state.  If no other container comes 
> along and requests these resources they linger around with no reference 
> counts but aren't cleaned up during normal cache cleanup scans since it will 
> never delete resources in the DOWNLOADING state even if their reference count 
> is zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-10-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-3769:
-
Attachment: (was: YARN-3769-branch-2.002.patch)

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.7.002.patch, 
> YARN-3769.001.branch-2.7.patch, YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status

2015-10-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4225:
-
Summary: Add preemption status to yarn queue -status  (was: Add preemption 
status to {{yarn queue -status}})

> Add preemption status to yarn queue -status
> ---
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943567#comment-14943567
 ] 

Hadoop QA commented on YARN-1509:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 17s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 18s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 30s | The applied patch generated  5 
new checkstyle issues (total was 79, now 78). |
| {color:green}+1{color} | whitespace |   0m  8s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 53s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   7m 31s | Tests passed in 
hadoop-yarn-client. |
| | |  46m  1s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765011/YARN-1509.4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b925cf1 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9346/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9346/artifact/patchprocess/diffcheckstylehadoop-yarn-client.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9346/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9346/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9346/console |


This message was automatically generated.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4216) Container logs not shown for newly assigned containers after NM recovery

2015-10-05 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943574#comment-14943574
 ] 

Bibin A Chundatt commented on YARN-4216:


When yarn.nodemanager.recovery.supervised=true and the nodemanager is stopped, abort aggregation is called:

2015-10-05 20:17:20,634 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 *Aborting log aggregation for application_1444056058955_0002*
{noformat}
2015-10-05 20:17:20,634 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2015-10-05 20:17:20,634 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
 waiting for pending aggregation during exit
2015-10-05 20:17:20,634 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Aborting log aggregation for application_1444056058955_0002
2015-10-05 20:17:20,634 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
 Aggregation did not complete for application application_1444056058955_0002
2015-10-05 20:17:20,639 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
 is interrupted. Exiting.
2015-10-05 20:17:20,664 INFO org.apache.hadoop.ipc.Server: Stopping server on 
8040
2015-10-05 20:17:20,665 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
listener on 8040
2015-10-05 20:17:20,665 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server 
Responder
2015-10-05 20:17:20,665 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Public cache exiting
2015-10-05 20:17:20,665 WARN 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl: 
org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl is 
interrupted. Exiting.
2015-10-05 20:17:20,671 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Stopping NodeManager metrics system...
2015-10-05 20:17:20,674 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
NodeManager metrics system stopped.
{noformat}

Container logs are not cleaned up and are not uploaded to HDFS on stop.

But decommission + NM restart while the application is running should cause the same missing-log scenario, as per {{LogAggregationService#stopAggregators}}:

{code}
boolean supervised = getConfig().getBoolean(
    YarnConfiguration.NM_RECOVERY_SUPERVISED,
    YarnConfiguration.DEFAULT_NM_RECOVERY_SUPERVISED);
// if recovery on restart is supported then leave outstanding aggregations
// to the next restart
boolean shouldAbort = context.getNMStateStore().canRecover()
    && !context.getDecommissioned() && supervised;
// politely ask to finish
for (AppLogAggregator aggregator : appLogAggregators.values()) {
  if (shouldAbort) {
    aggregator.abortLogAggregation();
  } else {
    aggregator.finishLogAggregation();
  }
}
{code}

> Container logs not shown for newly assigned containers  after NM  recovery
> --
>
> Key: YARN-4216
> URL: https://issues.apache.org/jira/browse/YARN-4216
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: NMLog, ScreenshotFolder.png, yarn-site.xml
>
>
> Steps to reproduce
> # Start 2 nodemanagers with NM recovery enabled
> # Submit a pi job with 20 maps
> # Once 5 maps have completed on NM 1, stop the NM (yarn daemon stop nodemanager)
> (logs of all completed containers get aggregated to HDFS)
> # Now start NM1 again and wait for job completion
> *The newly assigned container logs on NM1 are not shown*
> *hdfs log dir state*
> # When logs are aggregated to HDFS during the stop, they are uploaded with the name (localhost_38153)
> # On log aggregation after restarting the NM, the newly assigned container logs get uploaded with the name (localhost_38153.tmp)
> In the history server, the logs are not shown for the new task attempts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943645#comment-14943645
 ] 

Sangjin Lee commented on YARN-3864:
---

I kicked off another jenkins build.

I have reviewed the latest patch (v.3), and it looks good to me for the most 
part. I have only a few minor comments.

(TimelineReaderWebServices.java)
- l.540: nit: let's use a normal Java style: {{req.getQueryString() == null}}
- l.575: If we're calling this endpoint "flowrunapps", then shouldn't the method be called {{getFlowRunApps}}? The latter one already seems to be named that way.
- For both /flowrunapps and /flowapps, I understand they will return the most recent N apps if item is specified, correct? If so, it should be stated in the javadoc.

If you could address those, and with jenkins passing, I'd like to go ahead and 
commit the patch. Do let me know if you have other comments. Thanks!

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4221) Store user in app to flow table

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943668#comment-14943668
 ] 

Sangjin Lee commented on YARN-4221:
---

Thanks for turning around and creating this patch quickly, [~varun_saxena]!

I am in agreement with the approach taken in this patch. I'm going to take a 
closer look once we commit YARN-3864.

One high-level comment: I think we should document the behavior of the REST API in as much detail as possible. It should be very clear about which params are required and which are optional, what type of content will be returned, in what order the entities will be returned, etc. The javadoc here is as important as the code itself. So, for example, we should have plenty of documentation on where the user id is required and where it is optional.

> Store user in app to flow table
> ---
>
> Key: YARN-4221
> URL: https://issues.apache.org/jira/browse/YARN-4221
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4221-YARN-2928.01.patch
>
>
> We should store user as well in in app to flow table.
> For queries where user is not supplied and flow context can be retrieved from 
> app to flow table, we should take the user from app to flow table instead of 
> considering UGI as default user.
> This is as per discussion on YARN-3864



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-10-05 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943587#comment-14943587
 ] 

MENG DING commented on YARN-1509:
-

* The release audit warning is not related.
* I will request an exception for the checkstyle issues:
** the relaxed visibility is for testing purposes.
** the function length exceeding the limit is caused by long comments.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch, YARN-1509.3.patch, 
> YARN-1509.4.patch
>
>
> As described in YARN-1197, we need add API in AMRMClient to support
> 1) Add increase request
> 2) Can get successfully increased/decreased containers from RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943613#comment-14943613
 ] 

Hadoop QA commented on YARN-3864:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764952/YARN-3864-addendum-appaggregation.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b925cf1 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9348/console |


This message was automatically generated.

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4225) Add preemption status to {{yarn queue -status}}

2015-10-05 Thread Eric Payne (JIRA)
Eric Payne created YARN-4225:


 Summary: Add preemption status to {{yarn queue -status}}
 Key: YARN-4225
 URL: https://issues.apache.org/jira/browse/YARN-4225
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-10-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-3769:
-
Attachment: YARN-3769-branch-2.002.patch

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-10-05 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3216:
--
Attachment: 0003-YARN-3216.patch

Hi [~leftnoteasy],
Attaching an updated version of the patch addressing the major comments. Kindly 
help to review it.

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch, 
> 0003-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.
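
For illustration, a minimal sketch of the per-partition computation the
description implies, assuming each accessible partition's AM limit is derived
from that partition's resource, the queue's absolute capacity on it, and
yarn.scheduler.capacity.maximum-am-resource-percent; the actual computation in
the attached patches may differ.

{code:java}
// Sketch only, not from the attached patches: compute an AM resource limit for
// every partition the queue can access instead of the default partition alone.
import java.util.HashMap;
import java.util.Map;

public class PerPartitionAmLimit {

  /**
   * @param partitionResourceMb  total memory (MB) of each accessible partition
   * @param queueAbsCapacity     the queue's absolute capacity per partition, in [0, 1]
   * @param maxAmResourcePercent value of maximum-am-resource-percent, e.g. 0.1
   * @return per-partition AM memory limits in MB
   */
  static Map<String, Long> amLimitPerPartition(Map<String, Long> partitionResourceMb,
                                               Map<String, Double> queueAbsCapacity,
                                               double maxAmResourcePercent) {
    Map<String, Long> limits = new HashMap<>();
    for (Map.Entry<String, Long> e : partitionResourceMb.entrySet()) {
      double absCapacity = queueAbsCapacity.getOrDefault(e.getKey(), 0.0);
      limits.put(e.getKey(), (long) (e.getValue() * absCapacity * maxAmResourcePercent));
    }
    return limits;
  }
}
{code}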



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-10-05 Thread Anubhav Dhoot
I don't think option 2, where you restart from 1, makes sense. It's also not a
goal to minimize the total wait time. The goal should be to minimize the time to
recover from short intermittent failures while also waiting long enough for
long failures before giving up.
On Oct 3, 2015 6:43 PM, "Neelesh Srinivas Salian (JIRA)" 
wrote:

>
> [
> https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942528#comment-14942528
> ]
>
> Neelesh Srinivas Salian commented on YARN-4185:
> ---
>
> Thoughts:
> 1) Using the exponentialBackoffRetry policy gives a progression of wait
> times starting at 1 second for the first retry, assuming it takes about a
> second for the NM to come up. The backoff then doubles to 2, 4, 8, 16, ...
> up to 512 seconds as we approach 10 retries.
>
> 2) In the current strategy, the wait time is a fixed 10 seconds, which
> forces an NM that restarted within 1 second to still wait out a full
> retry interval.
>
> 3) Going forward with the retries, by the 3rd retry the cumulative wait
> is 7 seconds (1+2+4) under the exponential strategy versus 30 seconds
> (10+10+10) under the current static retry.
>
> 4) If you keep retrying, by the 6th attempt the static retry has waited
> 60 seconds in total, versus 63 seconds (1+2+4+8+16+32) under the
> exponential strategy.
>
> Logic for the design (with retries defaulting to 10):
> 1) a. I propose that after the 3rd attempt we cap the wait time at 4
> seconds and keep it there.
>    Thus the total wait comes to 1,2,4,4,4,4,4,4,4,4 = 35 seconds.
>    b. Versus collectively spending 100 seconds waiting in the static
> retry strategy. (A sketch of this capped schedule appears after this
> message.)
>
> 2) Alternatively, the logic could be:
>    a. Run the first 3 attempts as above and, if more retries are needed,
> fall back to the 1-second start of the same progression.
>    So it looks like this: (1,2,4) (1,2,4) (1,2,4) (1) for 10 retries.
>    b. Thus we get the 10 retries done in 22 seconds collectively, versus
> 100 seconds.
>
> Requesting feedback.
> Thank you.
>
> > Retry interval delay for NM client can be improved from the fixed static
> retry
> >
> ---
> >
> > Key: YARN-4185
> > URL: https://issues.apache.org/jira/browse/YARN-4185
> > Project: Hadoop YARN
> >  Issue Type: Bug
> >Reporter: Anubhav Dhoot
> >Assignee: Neelesh Srinivas Salian
> >
> > Instead of having a fixed retry interval that starts off very high and
> stays there, we are better off using an exponential backoff that has the
> same fixed max limit. Today the retry interval is fixed at 10 sec that can
> be unnecessarily high especially when NMs could rolling restart within a
> sec.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
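
Below is a minimal sketch of the capped exponential backoff from option 1a in
the comment above (1s, 2s, 4s, then hold at 4s), purely to compare cumulative
wait times against the current fixed 10-second interval; it is not code from
any patch, and the class and method names are made up for illustration.

{code:java}
// Sketch only: capped exponential backoff per option 1a above (1s, 2s, 4s,
// then hold at 4s) versus the current fixed 10-second retry interval.
public class RetryBackoffSketch {

  // attempt is 1-based; the delay doubles each attempt until it reaches capMs
  static long cappedExpDelayMs(int attempt, long baseMs, long capMs) {
    long delay = baseMs << Math.min(attempt - 1, 30);  // bound the shift
    return Math.min(delay, capMs);
  }

  public static void main(String[] args) {
    long cappedTotal = 0;
    long staticTotal = 0;
    for (int attempt = 1; attempt <= 10; attempt++) {
      cappedTotal += cappedExpDelayMs(attempt, 1000, 4000);
      staticTotal += 10_000;  // current policy: fixed 10s per retry
    }
    // Prints 35s vs 100s, matching the totals quoted in the comment above.
    System.out.println("capped exponential: " + cappedTotal / 1000
        + "s, static: " + staticTotal / 1000 + "s");
  }
}
{code}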


[jira] [Updated] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler

2015-10-05 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4225:
-
Summary: Add preemption status to yarn queue -status for capacity scheduler 
 (was: Add preemption status to yarn queue -status)

> Add preemption status to yarn queue -status for capacity scheduler
> --
>
> Key: YARN-4225
> URL: https://issues.apache.org/jira/browse/YARN-4225
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-10-05 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943648#comment-14943648
 ] 

Sangjin Lee commented on YARN-3864:
---

Should have refreshed the page first. :) Jenkins is failing because it is 
testing the addendum patch.

I built the patch locally and ran all the tests and findbugs; everything looks fine.

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: YARN-3864-YARN-2928.01.patch, 
> YARN-3864-YARN-2928.02.patch, YARN-3864-YARN-2928.03.patch, 
> YARN-3864-addendum-appaggregation.patch
>
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943680#comment-14943680
 ] 

Hadoop QA commented on YARN-3769:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 30s | Pre-patch branch-2 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   5m 56s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 58s | The applied patch generated  6 
new checkstyle issues (total was 145, now 150). |
| {color:red}-1{color} | whitespace |   0m  6s | The patch has 26  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 15s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  56m  6s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 23s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | 
hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyForNodePartitions
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12765015/YARN-3769-branch-2.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2 / d843c50 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9347/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/9347/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9347/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9347/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9347/console |


This message was automatically generated.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-3769-branch-2.002.patch, 
> YARN-3769-branch-2.7.002.patch, YARN-3769.001.branch-2.7.patch, 
> YARN-3769.001.branch-2.8.patch
>
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

