[jira] [Updated] (YARN-2330) Jobs are not displaying in timeline server after RM restart

2014-07-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2330:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-321

 Jobs are not displaying in timeline server after RM restart
 ---

 Key: YARN-2330
 URL: https://issues.apache.org/jira/browse/YARN-2330
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
 Environment: Nodemanagers 3 (3*8GB)
 Queues A = 70%
 Queues B = 30%
Reporter: Nishan Shetty

 Submit jobs to queue A.
 While a job is running, restart the RM.
 Observe that those jobs are not displayed in the timeline server.
 {code}
 2014-07-22 10:11:32,084 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  History information of application application_1406002968974_0003 is not 
 included into the result due to the exception
 java.io.IOException: Cannot seek to negative offset
   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381)
   at 
 org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
   at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:624)
   at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.init(FileSystemApplicationHistoryStore.java:683)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
   at 
 org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 

[jira] [Commented] (YARN-2330) Jobs are not displaying in timeline server after RM restart

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069943#comment-14069943
 ] 

Zhijie Shen commented on YARN-2330:
---

It is very likely that the history file of the given application has not yet been 
closed for writing (for example, after an RM restart, the RM reopens the history 
file to append history information). On the other side, the reader wants to scan 
the file while it is still being written.

The following logic is broken because the writer runs in the RM while the reader 
runs in the timeline server. Hence, from the reader's point of view, 
outstandingWriters is always empty, so it cannot be used to indicate whether a 
file is open for writing:
{code}
// The history file is still under writing
if (outstandingWriters.containsKey(appId)) {
  throw new IOException("History file for application " + appId
  + " is under writing");
}
{code}
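
Since {{outstandingWriters}} lives in the RM process, the reader running in the timeline server can never observe a non-empty map; any guard has to come from the history file itself. Below is a minimal, purely illustrative sketch (the class and method names are made up, not the eventual fix): a zero-length file cannot yet contain a complete TFile trailer, which is exactly what makes BCFile seek to a negative offset, so such a file can be skipped as still under writing instead of failing the whole listing.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical reader-side guard; illustrative only, not the actual patch. */
class HistoryFileGuard {
  /**
   * Returns true if the history file is likely still being written: a
   * zero-length file has no TFile trailer yet, so opening it triggers the
   * "Cannot seek to negative offset" error seen in this report.
   */
  static boolean isLikelyUnderWriting(FileSystem fs, Path historyFile) {
    try {
      FileStatus status = fs.getFileStatus(historyFile);
      return status.getLen() == 0;
    } catch (IOException e) {
      // If the file cannot even be stat'ed, treat it as not ready yet.
      return true;
    }
  }
}
{code}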

 Jobs are not displaying in timeline server after RM restart
 ---

 Key: YARN-2330
 URL: https://issues.apache.org/jira/browse/YARN-2330
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
 Environment: Nodemanagers 3 (3*8GB)
 Queues A = 70%
 Queues B = 30%
Reporter: Nishan Shetty

 Submit jobs to queue A.
 While a job is running, restart the RM.
 Observe that those jobs are not displayed in the timeline server.
 {code}
 2014-07-22 10:11:32,084 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  History information of application application_1406002968974_0003 is not 
 included into the result due to the exception
 java.io.IOException: Cannot seek to negative offset
   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381)
   at 
 org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
   at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:624)
   at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.init(FileSystemApplicationHistoryStore.java:683)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
   at 
 org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   

[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069950#comment-14069950
 ] 

Zhijie Shen commented on YARN-2262:
---

How's your setup for RM restarting? Does the application continue after RM 
restarting? If so, will the timeline server converge to show the missing fields 
correctly?

The wrong fields you saw here are because the finish information of the 
application is missing. However, if the application keeps running until it 
finishes after the RM restart, the RM will still write the finish information, 
and the wrong values should eventually be corrected.

One meta comment for this and YARN-2330: it would be great if you could help 
improve the generic history service. On the other hand, we're looking into 
migrating the generic history data to the timeline store, as we've seen the 
limitations of the fs history store. If that is finalized, we may not continue 
supporting this store. You can keep an eye on YARN-2033.

 Few fields displaying wrong values in Timeline server after RM restart
 --

 Key: YARN-2262
 URL: https://issues.apache.org/jira/browse/YARN-2262
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Naganarasimha G R

 Few fields displaying wrong values in Timeline server after RM restart
 State:null
 FinalStatus:  UNDEFINED
 Started:  8-Jul-2014 14:58:08
 Elapsed:  2562047397789hrs, 44mins, 47sec 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069994#comment-14069994
 ] 

Zhijie Shen commented on YARN-2229:
---

Some additional comments:

0. For those that have been marked \@Private, it should be okay to break 
backward compatibility. The problem I can think of is RM and NM versions being 
out of sync: for example, the RM is 2.6 and takes the id as a long, while the NM 
is 2.4 and takes it as an int.

1. I'm not sure it's good to mark a \@Stable method back to \@Unstable:
{code}
   @Public
-  @Stable
+  @Unstable
   public abstract int getId();
{code}

2. So anyway, we're going to break users who use protobuf to build clients in 
their own programming languages, aren't we?
{code}
   optional ApplicationAttemptIdProto app_attempt_id = 2;
-  optional int32 id = 3;
+  optional int64 id = 3;
 }
{code}
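
For context, the overflow concern in the description below comes from packing a 10-bit epoch and a 22-bit sequence number into a 32-bit id. A small, self-contained illustration (mirroring the layout described in this JIRA, not any particular patch) of why the epoch wraps after 1024 RM restarts:
{code}
/** Illustrative packing only; follows the 10-bit epoch / 22-bit sequence layout. */
public class ContainerIdPacking {
  static final int SEQUENCE_BITS = 22;                      // 2^22 container ids per epoch
  static final int SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

  static int pack(int epoch, int sequence) {
    // epoch occupies the upper 10 bits, the sequence number the lower 22 bits
    return (epoch << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK);
  }

  public static void main(String[] args) {
    System.out.println(pack(1023, 5) >>> SEQUENCE_BITS);    // 1023: the last epoch that fits
    System.out.println(pack(1024, 5) >>> SEQUENCE_BITS);    // 0: the epoch has wrapped around
  }
}
{code}
Widening the id to 64 bits removes this wrap-around, which is why the int32-to-int64 proto change and the compatibility questions above come up.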


 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070019#comment-14070019
 ] 

Zhijie Shen commented on YARN-2319:
---

Almost good to me, but would you mind fixing the indentation below? No tabs, please.
{code}
+   TestRMWebServicesDelegationTokens.class.getName() + "-root");
+   testMiniKDC = new MiniKdc(MiniKdc.createConf(), testRootDir);
+   testMiniKDC.start();
+   testMiniKDC.createPrincipal(httpSpnegoKeytabFile, "HTTP/localhost",
+  "client", "client2", "client3");
{code}
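
For reference, the shape of change this JIRA is after is simply pairing the existing start with a stop in the test teardown. A minimal sketch under assumed names (the real test class, fields, and setup differ):
{code}
import java.io.File;

import org.apache.hadoop.minikdc.MiniKdc;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class MiniKdcLifecycleSketch {
  private static MiniKdc testMiniKDC;

  @BeforeClass
  public static void setupKDC() throws Exception {
    File testRootDir = new File("target", "kdc-work-dir");   // illustrative path
    testMiniKDC = new MiniKdc(MiniKdc.createConf(), testRootDir);
    testMiniKDC.start();
  }

  @AfterClass
  public static void shutdownKDC() {
    if (testMiniKDC != null) {
      testMiniKDC.stop();   // the call the test was missing
    }
  }
}
{code}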

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-2319.0.patch, YARN-2319.1.patch


 MiniKdc only invoke start method not stop in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-22 Thread Yuliya Feldman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070039#comment-14070039
 ] 

Yuliya Feldman commented on YARN-796:
-

To everybody who has been so involved in providing input over the last couple of days:
I can provide support for App, Queue, and Queue Label Policy expressions.
Also, I did some performance measurements: with 1000 entries of nodes and their 
labels, it takes about an additional 700 ms to process 1 million requests (hot 
cache). If we need to re-evaluate on every ResourceRequest within an App, 
performance will go down.
This should cover:
{quote}
label-expressions support & (AND) only
app able to specify a label-expression when making a resource request - kind of 
(per application at the moment, not per resource request)
queues to AND-augment the label expression with the queue label-expression
add support for OR and NOT to label-expressions
{quote}

As far as:
{quote}
RM has list of valid labels. (hot reloadable)
NMs have list of labels. (hot reloadable)
{quote}
With a file in DFS you can get hot-reloadable valid labels on the RM (unless 
somebody makes a typo).

[~wangda] - How do you want to proceed here?
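
To make the expression support concrete, here is a tiny, purely illustrative sketch of evaluating an AND/OR/NOT label expression against a node's label set; it is not the proposed implementation and uses no YARN API:
{code}
import java.util.Set;

/** Illustrative only: a tiny label-expression tree evaluated against node labels. */
interface LabelExpression {
  boolean matches(Set<String> nodeLabels);

  static LabelExpression label(String name) {
    return labels -> labels.contains(name);
  }
  static LabelExpression and(LabelExpression a, LabelExpression b) {
    return labels -> a.matches(labels) && b.matches(labels);
  }
  static LabelExpression or(LabelExpression a, LabelExpression b) {
    return labels -> a.matches(labels) || b.matches(labels);
  }
  static LabelExpression not(LabelExpression a) {
    return labels -> !a.matches(labels);
  }
}
{code}
A queue's expression can then be AND-augmented onto an application's expression simply by wrapping both, e.g. {{and(appExpr, queueExpr).matches(nodeLabels)}}.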

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Wenwu Peng (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenwu Peng updated YARN-2319:
-

Attachment: YARN-2319.2.patch

Thanks for the comments, [~zjshen]. Sorry, my IDE formatter had issues. Fixed the 
formatting issue.

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch


 MiniKdc only invoke start method not stop in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart

2014-07-22 Thread Nishan Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070073#comment-14070073
 ] 

Nishan Shetty commented on YARN-2262:
-

[~zjshen]
{quote}How's your setup for RM restarting?{quote}
An RM HA setup where the active RM was restarted gracefully.

{quote}Does the application continue after RM restarting?{quote}
Yes, the application continues after the RM restart and finally ends up 
SUCCEEDED.

{quote}If so, will the timeline server converge to show the missing fields 
correctly?{quote}
No, the timeline server does not show the correct fields even after the 
application is SUCCEEDED.

Thanks

 Few fields displaying wrong values in Timeline server after RM restart
 --

 Key: YARN-2262
 URL: https://issues.apache.org/jira/browse/YARN-2262
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Naganarasimha G R

 Few fields displaying wrong values in Timeline server after RM restart
 State:null
 FinalStatus:  UNDEFINED
 Started:  8-Jul-2014 14:58:08
 Elapsed:  2562047397789hrs, 44mins, 47sec 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070080#comment-14070080
 ] 

Zhijie Shen commented on YARN-2262:
---

Then, it should be a bug. Would you mind sharing the RM and the timeline server 
log where the problem occurred?

 Few fields displaying wrong values in Timeline server after RM restart
 --

 Key: YARN-2262
 URL: https://issues.apache.org/jira/browse/YARN-2262
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Naganarasimha G R

 Few fields displaying wrong values in Timeline server after RM restart
 State:null
 FinalStatus:  UNDEFINED
 Started:  8-Jul-2014 14:58:08
 Elapsed:  2562047397789hrs, 44mins, 47sec 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2330) Jobs are not displaying in timeline server after RM restart

2014-07-22 Thread Nishan Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070081#comment-14070081
 ] 

Nishan Shetty commented on YARN-2330:
-

Here the RM went down abruptly.

 Jobs are not displaying in timeline server after RM restart
 ---

 Key: YARN-2330
 URL: https://issues.apache.org/jira/browse/YARN-2330
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
 Environment: Nodemanagers 3 (3*8GB)
 Queues A = 70%
 Queues B = 30%
Reporter: Nishan Shetty

 Submit jobs to queue A.
 While a job is running, restart the RM.
 Observe that those jobs are not displayed in the timeline server.
 {code}
 2014-07-22 10:11:32,084 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  History information of application application_1406002968974_0003 is not 
 included into the result due to the exception
 java.io.IOException: Cannot seek to negative offset
   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381)
   at 
 org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
   at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:624)
   at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.init(FileSystemApplicationHistoryStore.java:683)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
   at 
 org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 

[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070090#comment-14070090
 ] 

Zhijie Shen commented on YARN-2319:
---

+1. Will commit it once jenkins +1 as well.

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch


 MiniKdc only invoke start method not stop in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-07-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070095#comment-14070095
 ] 

Naganarasimha G R commented on YARN-2301:
-

bq. How about having -list only, and then parsing whether the given id is app 
id or app attempt id?
Coding-wise I don't see any trouble with this approach, but CLI design-wise I am 
not sure whether a command should take either one of the params. [~jianhe], can 
you please share your thoughts on this too?
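
For what it's worth, the two id forms differ by their string prefixes, so a single {{-list}} could dispatch on the argument; this is a rough illustration only, not the actual CLI code:
{code}
/** Illustrative only: telling the two id forms given to `yarn container -list` apart. */
class ContainerListArgument {
  static boolean isApplicationId(String id) {
    return id.startsWith("application_");     // e.g. application_1406002968974_0003
  }

  static boolean isApplicationAttemptId(String id) {
    return id.startsWith("appattempt_");      // e.g. appattempt_1406002968974_0003_000001
  }
}
{code}
An application id would then list containers across all of its attempts, while an attempt id would keep the current behaviour.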


 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list <Application Attempt ID> command, some 
 observations:
 1) the scheme (e.g. http/https) before the LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to 
 print it in a time format.
 3) finish-time is 0 if the container is not yet finished. Maybe N/A instead.
 4) May have an option to run as yarn container -list <appId> OR yarn 
 application -list-containers <appId> also.
 As the attempt Id is not shown on the console, this makes it easier for the 
 user to just copy the appId and run it; it may also be useful for 
 container-preserving AM restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2330) Jobs are not displaying in timeline server after RM restart

2014-07-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-2330:
---

Assignee: Naganarasimha G R

 Jobs are not displaying in timeline server after RM restart
 ---

 Key: YARN-2330
 URL: https://issues.apache.org/jira/browse/YARN-2330
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
 Environment: Nodemanagers 3 (3*8GB)
 Queues A = 70%
 Queues B = 30%
Reporter: Nishan Shetty
Assignee: Naganarasimha G R

 Submit jobs to queue A.
 While a job is running, restart the RM.
 Observe that those jobs are not displayed in the timeline server.
 {code}
 2014-07-22 10:11:32,084 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  History information of application application_1406002968974_0003 is not 
 included into the result due to the exception
 java.io.IOException: Cannot seek to negative offset
   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381)
   at 
 org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
   at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:624)
   at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.init(FileSystemApplicationHistoryStore.java:683)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
   at 
 org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 

[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070126#comment-14070126
 ] 

Hudson commented on YARN-2045:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #620 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/620/])
YARN-2045. Data persisted in NM should be versioned. Contributed by Junping Du 
(jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612285)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/NMDBSchemaVersion.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/impl/pb/NMDBSchemaVersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java


 Data persisted in NM should be versioned
 

 Key: YARN-2045
 URL: https://issues.apache.org/jira/browse/YARN-2045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, 
 YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045-v6.patch, 
 YARN-2045-v7.patch, YARN-2045.patch


 As a split task from YARN-667, we want to add version info to NM-related data 
 (a minimal sketch follows the list), including:
 - NodeManager local LevelDB state
 - NodeManager directory structure
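
A minimal sketch of the LevelDB part of the idea (the key name and version string below are assumptions for illustration, not the committed NMDBSchemaVersion code):
{code}
import static org.fusesource.leveldbjni.JniDBFactory.asString;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;

import java.io.IOException;

import org.iq80.leveldb.DB;

/** Illustrative version stamp for the NM LevelDB state store. */
class NMStateVersionSketch {
  private static final String VERSION_KEY = "nm-schema-version";  // assumed key
  private static final String CURRENT_VERSION = "1.0";            // assumed version

  static void checkVersion(DB db) throws IOException {
    byte[] stored = db.get(bytes(VERSION_KEY));
    if (stored == null) {
      // First start with a versioned store: stamp it.
      db.put(bytes(VERSION_KEY), bytes(CURRENT_VERSION));
    } else if (!asString(stored).equals(CURRENT_VERSION)) {
      // A real implementation would allow compatible (e.g. same-major) versions.
      throw new IOException("Incompatible NM state store version: " + asString(stored));
    }
  }
}
{code}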



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2321) NodeManager web UI can incorrectly report Pmem enforcement

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070117#comment-14070117
 ] 

Hudson commented on YARN-2321:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #620 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/620/])
YARN-2321. NodeManager web UI can incorrectly report Pmem enforcement. 
Contributed by Leitao Guo (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612411)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java


 NodeManager web UI can incorrectly report Pmem enforcement
 --

 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2321.patch


 The NodeManager web UI gets the wrong configuration for whether Pmem enforcement is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070119#comment-14070119
 ] 

Hudson commented on YARN-2013:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #620 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/620/])
YARN-2013. The diagnostics is always the ExitCodeException stack when the 
container crashes. (Contributed by Tsuyoshi OZAWA) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612449)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java


 The diagnostics is always the ExitCodeException stack when the container 
 crashes
 

 Key: YARN-2013
 URL: https://issues.apache.org/jira/browse/YARN-2013
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2013.1.patch, YARN-2013.2.patch, 
 YARN-2013.3-2.patch, YARN-2013.3.patch, YARN-2013.4.patch, YARN-2013.5.patch


 When a container crashes, an ExitCodeException will be thrown from Shell. 
 Default/LinuxContainerExecutor captures the exception and puts the exception 
 stack into the diagnostics. Therefore, the exception stack is always the same. 
 {code}
 String diagnostics = "Exception from container-launch: \n"
 + StringUtils.stringifyException(e) + "\n" + shExec.getOutput();
 container.handle(new ContainerDiagnosticsUpdateEvent(containerId,
 diagnostics));
 {code}
 In addition, it seems that the exception always has an empty message, as 
 there is no message from stderr. Hence the diagnostics are not of much use for 
 users to analyze the reason for the container crash.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070118#comment-14070118
 ] 

Hudson commented on YARN-2131:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #620 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/620/])
YARN-2131. Addendum. Add a way to format the RMStateStore. (Robert Kanter via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612443)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java


 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070161#comment-14070161
 ] 

Hadoop QA commented on YARN-2319:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657099/YARN-2319.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
  
org.apache.hadoop.yarn.server.resourcemanager.monitor.TestSchedulingMonitor
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.TestRM
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections
  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4393//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4393//console

This message is automatically generated.

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch


 MiniKdc only invoke start method not stop in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070165#comment-14070165
 ] 

Zhijie Shen commented on YARN-2247:
---

[~vvasudev], thanks for your patience with my comments. The new patch looks 
almost good to me. Just some nits:

1. This should not be necessary. Always load TimelineAuthenticationFilter; with 
the simple type, the pseudo handler is still used.
{code}
+if (authType.equals("simple") && 
!UserGroupInformation.isSecurityEnabled()) {
+  container.addFilter("authentication",
+AuthenticationFilter.class.getName(), filterConfig);
+  return;
+}
{code}

2. Check for null first for testMiniKDC and rm before stopping them? Same for 
TestRMWebappAuthentication.
{code}
+testMiniKDC.stop();
+rm.stop();
{code}

3. I didn't find the logic to forbid it. Anyway, is it good to mention it in 
the document as well?
{code}
+  // Test to make sure that we can't do delegation token
+  // functions using just delegation token auth
{code}

 Allow RM web services users to authenticate using delegation tokens
 ---

 Key: YARN-2247
 URL: https://issues.apache.org/jira/browse/YARN-2247
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-2247.0.patch, apache-yarn-2247.1.patch, 
 apache-yarn-2247.2.patch, apache-yarn-2247.3.patch


 The RM webapp should allow users to authenticate using delegation tokens to 
 maintain parity with RPC.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2319:
--

Attachment: YARN-2319.2.patch

Triggering Jenkins again.

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch, 
 YARN-2319.2.patch


 MiniKdc only invoke start method not stop in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070223#comment-14070223
 ] 

Hadoop QA commented on YARN-2319:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657115/YARN-2319.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4394//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4394//console

This message is automatically generated.

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch, 
 YARN-2319.2.patch


 MiniKdc only invoke start method not stop in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070239#comment-14070239
 ] 

Hudson commented on YARN-2242:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5934 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5934/])
YARN-2242. Addendum patch. Improve exception information on AM launch crashes. 
(Contributed by Li Lu) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612565)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


 Improve exception information on AM launch crashes
 --

 Key: YARN-2242
 URL: https://issues.apache.org/jira/browse/YARN-2242
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2242-070115-2.patch, YARN-2242-070814-1.patch, 
 YARN-2242-070814.patch, YARN-2242-071114.patch, YARN-2242-071214.patch, 
 YARN-2242-071414.patch


 Currently, each time the AM container crashes during launch, both the console 
 and the web UI only report a ShellExitCodeException. This is not only 
 unhelpful, but sometimes confusing. With the help of the log aggregator, 
 container logs are actually aggregated and can be very helpful for debugging. 
 One possible way to improve the whole process is to send a pointer to the 
 aggregated logs to the programmer when reporting exception information. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070279#comment-14070279
 ] 

Hudson commented on YARN-2131:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1812 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1812/])
YARN-2131. Addendum. Add a way to format the RMStateStore. (Robert Kanter via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612443)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java


 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2270) TestFSDownload#testDownloadPublicWithStatCache fails in trunk

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070281#comment-14070281
 ] 

Hudson commented on YARN-2270:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1812 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1812/])
YARN-2270. Made TestFSDownload#testDownloadPublicWithStatCache be skipped when 
there’s no ancestor permissions. Contributed by Akira Ajisaka. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612460)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java


 TestFSDownload#testDownloadPublicWithStatCache fails in trunk
 -

 Key: YARN-2270
 URL: https://issues.apache.org/jira/browse/YARN-2270
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 2.4.1
Reporter: Ted Yu
Assignee: Akira AJISAKA
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2270.2.patch, YARN-2270.patch


 From https://builds.apache.org/job/Hadoop-yarn-trunk/608/console :
 {code}
 Running org.apache.hadoop.yarn.util.TestFSDownload
 Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.955 sec <<< 
 FAILURE! - in org.apache.hadoop.yarn.util.TestFSDownload
 testDownloadPublicWithStatCache(org.apache.hadoop.yarn.util.TestFSDownload)  
 Time elapsed: 0.137 sec  <<< FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.util.TestFSDownload.testDownloadPublicWithStatCache(TestFSDownload.java:363)
 {code}
 Similar error can be seen here: 
 https://builds.apache.org/job/PreCommit-YARN-Build/4243//testReport/org.apache.hadoop.yarn.util/TestFSDownload/testDownloadPublicWithStatCache/
 Looks like future.get() returned null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2321) NodeManager web UI can incorrectly report Pmem enforcement

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070278#comment-14070278
 ] 

Hudson commented on YARN-2321:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1812 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1812/])
YARN-2321. NodeManager web UI can incorrectly report Pmem enforcement. 
Contributed by Leitao Guo (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612411)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java


 NodeManager web UI can incorrectly report Pmem enforcement
 --

 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2321.patch


 The NodeManager web UI gets the wrong configuration for whether Pmem enforcement is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-07-22 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-2331:


 Summary: Distinguish shutdown during supervision vs. shutdown for 
rolling upgrade
 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe


When the NM is shutting down with restart support enabled, there are scenarios 
we'd like to distinguish and behave accordingly (a sketch of the resulting 
decision follows the list):

# The NM is running under supervision.  In that case containers should be 
preserved so the automatic restart can recover them.
# The NM is not running under supervision and a rolling upgrade is not being 
performed.  In that case the shutdown should kill all containers since it is 
unlikely the NM will be restarted in a timely manner to recover them.
# The NM is not running under supervision and a rolling upgrade is being 
performed.  In that case the shutdown should not kill all containers since a 
restart is imminent due to the rolling upgrade and the containers will be 
recovered.
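
For illustration, a minimal sketch (not the actual NM code) of how the three cases 
above might be reduced to a single decision; the supervised and rollingUpgrade flags 
are hypothetical inputs, and wiring them up is exactly what this JIRA is about:
{code}
public class ShutdownPolicySketch {
  static boolean shouldKillContainersOnShutdown(boolean supervised, boolean rollingUpgrade) {
    // Case 1: supervised       -> keep containers; the supervisor restarts the NM promptly.
    // Case 3: rolling upgrade  -> keep containers; a restart is imminent.
    // Case 2: neither          -> kill containers; no timely restart is expected.
    return !supervised && !rollingUpgrade;
  }

  public static void main(String[] args) {
    System.out.println(shouldKillContainersOnShutdown(false, false)); // true  (case 2)
    System.out.println(shouldKillContainersOnShutdown(true, false));  // false (case 1)
    System.out.println(shouldKillContainersOnShutdown(false, true));  // false (case 3)
  }
}
{code}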



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2321) NodeManager web UI can incorrectly report Pmem enforcement

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070350#comment-14070350
 ] 

Hudson commented on YARN-2321:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1839 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1839/])
YARN-2321. NodeManager web UI can incorrectly report Pmem enforcement. 
Contributed by Leitao Guo (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612411)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NodePage.java


 NodeManager web UI can incorrectly report Pmem enforcement
 --

 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2321.patch


 The NodeManager web UI gets the wrong value of the Pmem enforcement enabled configuration.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070352#comment-14070352
 ] 

Hudson commented on YARN-2013:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1839 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1839/])
YARN-2013. The diagnostics is always the ExitCodeException stack when the 
container crashes. (Contributed by Tsuyoshi OZAWA) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612449)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutorWithMocks.java


 The diagnostics is always the ExitCodeException stack when the container 
 crashes
 

 Key: YARN-2013
 URL: https://issues.apache.org/jira/browse/YARN-2013
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Fix For: 2.6.0

 Attachments: YARN-2013.1.patch, YARN-2013.2.patch, 
 YARN-2013.3-2.patch, YARN-2013.3.patch, YARN-2013.4.patch, YARN-2013.5.patch


 When a container crashes, ExitCodeException will be thrown from Shell. 
 Default/LinuxContainerExecutor captures the exception, put the exception 
 stack into the diagnostic. Therefore, the exception stack is always the same. 
 {code}
 String diagnostics = "Exception from container-launch: \n"
 + StringUtils.stringifyException(e) + "\n" + shExec.getOutput();
 container.handle(new ContainerDiagnosticsUpdateEvent(containerId,
 diagnostics));
 {code}
 In addition, it seems that the exception always has an empty message, as 
 there's no message from stderr. Hence the diagnostics are not of much use for 
 users to analyze the reason for the container crash.
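 For illustration, a hedged sketch of the general idea only (not necessarily what the 
 attached patches do): surface the exit code and the captured shell output in the 
 diagnostics, since the ExitCodeException stack itself is identical for every crash.
 {code}
 public class DiagnosticsSketch {
   static String buildDiagnostics(int exitCode, String shellOutput) {
     StringBuilder sb = new StringBuilder("Exception from container-launch.\n");
     sb.append("Container exited with a non-zero exit code ").append(exitCode).append('\n');
     if (shellOutput != null && !shellOutput.isEmpty()) {
       sb.append("Shell output: ").append(shellOutput).append('\n');
     }
     return sb.toString();
   }

   public static void main(String[] args) {
     System.out.print(buildDiagnostics(143, "task received SIGTERM"));
   }
 }
 {code}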



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070359#comment-14070359
 ] 

Hudson commented on YARN-2045:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1839 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1839/])
YARN-2045. Data persisted in NM should be versioned. Contributed by Junping Du 
(jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612285)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/NMDBSchemaVersion.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/impl
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/impl/pb
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/records/impl/pb/NMDBSchemaVersionPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/proto/yarn_server_nodemanager_recovery.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java


 Data persisted in NM should be versioned
 

 Key: YARN-2045
 URL: https://issues.apache.org/jira/browse/YARN-2045
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Junping Du
Assignee: Junping Du
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2045-v2.patch, YARN-2045-v3.patch, 
 YARN-2045-v4.patch, YARN-2045-v5.patch, YARN-2045-v6.patch, 
 YARN-2045-v7.patch, YARN-2045.patch


 As a split task from YARN-667, we want to add version info to NM-related 
 data, including:
 - NodeManager local LevelDB state
 - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070351#comment-14070351
 ] 

Hudson commented on YARN-2131:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1839 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1839/])
YARN-2131. Addendum. Add a way to format the RMStateStore. (Robert Kanter via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612443)
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java


 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070383#comment-14070383
 ] 

Zhijie Shen commented on YARN-2319:
---

Committed the patch to trunk, branch-2, and branch-2.5. Thanks, [~gujilangzi]!

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 3.0.0, 2.5.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Fix For: 2.5.0

 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch, 
 YARN-2319.2.patch


 MiniKdc only invokes the start method, not stop, in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2319) Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java

2014-07-22 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2319:
--

 Target Version/s: 2.5.0  (was: 2.6.0)
Affects Version/s: (was: 2.6.0)
   2.5.0
   3.0.0

 Fix MiniKdc not close in TestRMWebServicesDelegationTokens.java
 ---

 Key: YARN-2319
 URL: https://issues.apache.org/jira/browse/YARN-2319
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Affects Versions: 3.0.0, 2.5.0
Reporter: Wenwu Peng
Assignee: Wenwu Peng
 Fix For: 2.5.0

 Attachments: YARN-2319.0.patch, YARN-2319.1.patch, YARN-2319.2.patch, 
 YARN-2319.2.patch


 MiniKdc only invokes the start method, not stop, in 
 TestRMWebServicesDelegationTokens.java
 {code}
 testMiniKDC.start();
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070501#comment-14070501
 ] 

Jian He commented on YARN-2229:
---

Copied from the protobuf guide, changing int32 to int64 seems to be compatible:
{code}
int32, uint32, int64, uint64, and bool are all compatible – this means you can 
change a field from one of these types to another without breaking forwards- or 
backwards-compatibility. If a number is parsed from the wire which doesn't fit 
in the corresponding type, you will get the same effect as if you had cast the 
number to that type in C++ (e.g. if a 64-bit number is read as an int32, it 
will be truncated to 32 bits).
{code}
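
A small illustration of the caveat quoted above: if an old reader still treats the 
field as 32 bits, a wider value is silently truncated, exactly like a Java (int) cast 
(the sample value below is invented):
{code}
public class TruncationSketch {
  public static void main(String[] args) {
    long containerIdWithEpoch = (1L << 40) | 7;        // hypothetical id that uses the upper bits
    int seenByOldReader = (int) containerIdWithEpoch;  // what a 32-bit reader would observe
    System.out.println(containerIdWithEpoch + " -> " + seenByOldReader);  // 1099511627783 -> 7
  }
}
{code}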

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new format of container Id while preserving backward compatibility on this 
 JIRA.
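 For reference, a minimal sketch of the 10/22-bit split described above (an assumed 
 layout for illustration only), showing why the epoch wraps after 1024 RM restarts:
 {code}
 public class ContainerIdPackingSketch {
   // Upper 10 bits: epoch; lower 22 bits: sequence number.
   static int pack(int epoch, int sequence) {
     return (epoch << 22) | (sequence & 0x3FFFFF);
   }

   public static void main(String[] args) {
     System.out.println(pack(1, 5));                   // 4194309
     System.out.println(pack(1024, 5) == pack(0, 5));  // true: epoch 1024 wraps back to 0
   }
 }
 {code}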



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-07-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070512#comment-14070512
 ] 

Jian He commented on YARN-2301:
---

bq. How about having -list only, and then parsing whether the given id is app 
id or app attempt id?
I think it's fine to just have -list only.
bq.  is it able to show the containers of previous app attempt, or the finished 
containers of the current app attempt?
Finished containers are removed from the schedulers. [~Naganarasimha], let's leave 
4) as a separate item as it involves more changes and discussion. Could you post your 
patch which fixes the first 3? Thanks!
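
A tiny sketch of the "parse which kind of id was given" idea from the quote above, 
relying only on the standard id string prefixes; this is illustrative, not the CLI code:
{code}
public class ListArgumentSketch {
  static String describe(String arg) {
    if (arg.startsWith("application_")) {
      return "list containers for application " + arg;
    } else if (arg.startsWith("appattempt_")) {
      return "list containers for app attempt " + arg;
    }
    return "unrecognized id: " + arg;
  }

  public static void main(String[] args) {
    System.out.println(describe("application_1406002968974_0003"));
    System.out.println(describe("appattempt_1406002968974_0003_000001"));
  }
}
{code}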

 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list <Application Attempt ID> command, some 
 observations:
 1) the scheme (e.g. http/https) before LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to 
 print it in a time format.
 3) finish-time is 0 if the container is not yet finished. Maybe print N/A instead.
 4) May have an option to run as yarn container -list <appId> OR yarn 
 application -list-containers <appId> also.  
 As the attempt Id is not shown on the console, this makes it easier for the user 
 to just copy the appId and run it; it may also be useful for container-preserving 
 AM restart. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070541#comment-14070541
 ] 

Zhijie Shen commented on YARN-2229:
---

bq. Copied from protobuf guide, changing int32 to int64 seems to be compatible

sounds good.

bq. The problem that I can think of is RM and NM versions are out of sync.

For example, ContainerTokenIdentifier serializes a long (getContainerId()) at 
the RM side, but deserializes an int (getId()) at the NM side. In this case, I'm afraid 
it's going to be wrong.
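
A quick illustration of the skew described above, using a plain DataOutput/DataInput 
pair (a hypothetical stream layout, not the real ContainerTokenIdentifier wire format): 
once the writer emits a long and the reader still consumes an int, every later field 
is misaligned.
{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TokenSkewSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeLong(42L);   // "new" writer: container id as a long
    out.writeInt(99);     // some subsequent field
    out.close();

    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    int id = in.readInt();    // "old" reader still expects an int ...
    int next = in.readInt();  // ... so the next field is read from the wrong offset
    System.out.println(id + " " + next);  // prints "0 42" instead of "42 99"
  }
}
{code}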

 ContainerId can overflow with RM restart
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229.1.patch, YARN-2229.10.patch, 
 YARN-2229.10.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, 
 YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, 
 YARN-2229.8.patch, YARN-2229.9.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId a long. We need to define 
 the new format of container Id while preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2131:


Attachment: YARN-2131_addendum2.patch

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch, YARN-2131_addendum2.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2295) Refactor YARN distributed shell with existing public stable API

2014-07-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070559#comment-14070559
 ] 

Jian He commented on YARN-2295:
---

patch looks good, +1

 Refactor YARN distributed shell with existing public stable API
 ---

 Key: YARN-2295
 URL: https://issues.apache.org/jira/browse/YARN-2295
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Attachments: TEST-YARN-2295-071514.patch, YARN-2295-071514-1.patch, 
 YARN-2295-071514.patch, YARN-2295-072114.patch


 Some API calls in YARN distributed shell have been marked as unstable and 
 private. Use existing public stable API to replace them, if possible. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (YARN-2317) Update documentation about how to write YARN applications

2014-07-22 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reopened YARN-2317:
---


Commented on the wrong JIRA; reopening this.

 Update documentation about how to write YARN applications
 -

 Key: YARN-2317
 URL: https://issues.apache.org/jira/browse/YARN-2317
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: documentation
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: YARN-2317-071714.patch


 Some information in the WritingYarnApplications webpage is outdated. It needs some 
 refresh work to reflect the most recent changes in YARN 
 APIs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2295) Refactor YARN distributed shell with existing public stable API

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070587#comment-14070587
 ] 

Hudson commented on YARN-2295:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5939 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5939/])
YARN-2295. Refactored DistributedShell to use public APIs of protocol records. 
Contributed by Li Lu (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612626)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java


 Refactor YARN distributed shell with existing public stable API
 ---

 Key: YARN-2295
 URL: https://issues.apache.org/jira/browse/YARN-2295
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu
 Fix For: 2.6.0

 Attachments: TEST-YARN-2295-071514.patch, YARN-2295-071514-1.patch, 
 YARN-2295-071514.patch, YARN-2295-072114.patch


 Some API calls in YARN distributed shell have been marked as unstable and 
 private. Use existing public stable API to replace them, if possible. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070611#comment-14070611
 ] 

Karthik Kambatla commented on YARN-2131:


My bad. Should have caught that in my earlier review.

+1 for the second addendum. Committing it. 

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch, YARN-2131_addendum2.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2131.


Resolution: Fixed

Committed addendum-2 to trunk and branch-2. 

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch, YARN-2131_addendum2.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-07-22 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070632#comment-14070632
 ] 

Devaraj K commented on YARN-2301:
-

Here it is trying to allocate this memory for the heap size, and we need to leave the 
remaining memory for launching the child container Java process, native memory, 
etc.

As [~jianhe] mentioned, the RM removes the completed containers or containers for 
completed attempts. I think there would not be much use in providing a 
completed appAttemptId for the -list param and displaying some message or an empty 
result.

I would think of giving an appId option for -list (i.e. -list <appId>) and printing 
the containers running for the current application attempt.

 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list <Application Attempt ID> command, some 
 observations:
 1) the scheme (e.g. http/https) before LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to 
 print it in a time format.
 3) finish-time is 0 if the container is not yet finished. Maybe print N/A instead.
 4) May have an option to run as yarn container -list <appId> OR yarn 
 application -list-containers <appId> also.  
 As the attempt Id is not shown on the console, this makes it easier for the user 
 to just copy the appId and run it; it may also be useful for container-preserving 
 AM restart. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070638#comment-14070638
 ] 

Hudson commented on YARN-2131:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5940 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5940/])
YARN-2131. Addendum2: Document -format-state-store. Add a way to format the 
RMStateStore. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1612634)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch, YARN-2131_addendum2.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2332) Create REST interface for app submission

2014-07-22 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2332:
---

Summary: Create REST interface for app submission  (was: Create service 
interface for app submission)

 Create REST interface for app submission
 

 Key: YARN-2332
 URL: https://issues.apache.org/jira/browse/YARN-2332
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager, webapp
Reporter: Jeff Hammerbacher

 Porting a discussion from the LinkedIn Hadoop group to the Hadoop JIRA: 
 http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&gid=988957&discussionID=2156671&sik=1239077959330



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-2332) Create service interface for app submission

2014-07-22 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer moved MAPREDUCE-454 to YARN-2332:
--

  Component/s: (was: mrv2)
   webapp
   resourcemanager
Affects Version/s: (was: 0.23.0)
  Key: YARN-2332  (was: MAPREDUCE-454)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 Create service interface for app submission
 ---

 Key: YARN-2332
 URL: https://issues.apache.org/jira/browse/YARN-2332
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager, webapp
Reporter: Jeff Hammerbacher

 Porting a discussion from the LinkedIn Hadoop group to the Hadoop JIRA: 
 http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&gid=988957&discussionID=2156671&sik=1239077959330



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1342:
-

Attachment: YARN-1342v5.patch

Thanks for the review, Devaraj!

bq. I think we can get the state from nmStore inside recover() instead of 
getting as an argument.

I fixed this for NMContainerTokenSecretManager and NMTokenSecretManagerInNM.

bq. Here e.getMessage() may not be required to pass as message since we are 
wrapping the same exception.

I originally used the (message, throwable) form because the resulting exception 
message is subtly different than just passing the throwable. 
org.fusesource.leveldbjni.internal.JniDB converts exceptions into DBException 
using the (message, throwable) form, and I was trying to be consistent.  
However I don't think it really matters that much what the message is, so I 
went ahead and changed all the conversions from DBException to IOException to 
just use the throwable form.

bq. Can we move the CONTAINER_TOKENS_KEY_PREFIX.length() to outside of the 
while loop? 

I'm skeptical of this change assuming any decent JVM environment.  The 
String.length() method is just returning a member, and the JIT eats this kind 
of stuff up all the time.  I went ahead and made the change anyway, but let me 
know if I'm missing the motivations for it.

bq. Can we make the string container_ as a constant? 

Replaced it with ConverterUtils.CONTAINER_PREFIX as it's close enough in this 
context.

bq. What do you think of having the names like RecoveredContainerTokensState, 
loadContainerTokensState

Sounds good.  For consistency I also changed the corresponding class and 
methods for NM tokens.
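
For reference, a hedged sketch of the key-prefix scan pattern being discussed, using 
the iq80/leveldbjni API the NM state store is built on; the key prefix and method name 
here are hypothetical, not the actual store layout:
{code}
import static org.fusesource.leveldbjni.JniDBFactory.asString;
import static org.fusesource.leveldbjni.JniDBFactory.bytes;

import java.io.IOException;
import java.util.Map;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBIterator;

public class PrefixScanSketch {
  private static final String CONTAINER_TOKENS_KEY_PREFIX = "ContainerTokens/";

  static void loadContainerTokens(DB db) throws IOException {
    DBIterator iter = db.iterator();
    try {
      iter.seek(bytes(CONTAINER_TOKENS_KEY_PREFIX));
      // Length lookup hoisted out of the loop, as suggested in the review;
      // functionally identical either way.
      final int prefixLength = CONTAINER_TOKENS_KEY_PREFIX.length();
      while (iter.hasNext()) {
        Map.Entry<byte[], byte[]> entry = iter.next();
        String key = asString(entry.getKey());
        if (!key.startsWith(CONTAINER_TOKENS_KEY_PREFIX)) {
          break;  // walked past the prefix range
        }
        String containerIdString = key.substring(prefixLength);
        // ... parse containerIdString and the value bytes here ...
      }
    } finally {
      iter.close();
    }
  }
}
{code}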

 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2301) Improve yarn container command

2014-07-22 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070730#comment-14070730
 ] 

Zhijie Shen commented on YARN-2301:
---

bq. I think there would not be much useful by providing completed appAttemptId 
for -list param and displaying some message or empty result.

As I mentioned before, the command is going to be applied to both the RM and the 
timeline server. The latter is going to record the completed containers. -list 
<appId> is able to list all the containers from the side of the timeline 
server, and I hope it could work. So how about doing this for -list 
<appId|appAttemptId> <additional opts>?

* appAttemptId: containers of a specific app attempt in RM/Timeline server (for 
the case of RM, it is likely to show an empty container list, but that's fine and it 
is actually the current situation).
* appId with no additional opt: containers of the last (current) app attempt in 
RM/Timeline server
* appId with -last: containers of the last (current) app attempt in RM/Timeline 
server
* appId with -all: containers of all app attempts in RM/Timeline server

Does this make sense?


 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list <Application Attempt ID> command, some 
 observations:
 1) the scheme (e.g. http/https) before LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to 
 print it in a time format.
 3) finish-time is 0 if the container is not yet finished. Maybe print N/A instead.
 4) May have an option to run as yarn container -list <appId> OR yarn 
 application -list-containers <appId> also.  
 As the attempt Id is not shown on the console, this makes it easier for the user 
 to just copy the appId and run it; it may also be useful for container-preserving 
 AM restart. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2333) RM UI : show pending containers at cluster level in the application/scheduler page

2014-07-22 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created YARN-2333:


 Summary: RM UI : show pending containers at cluster level in the 
application/scheduler page
 Key: YARN-2333
 URL: https://issues.apache.org/jira/browse/YARN-2333
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Ashwin Shankar


It would be helpful if we could display pending containers at a cluster level 
to get an idea of how far behind we are with, say, our ETL processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070770#comment-14070770
 ] 

Eric Payne commented on YARN-415:
-

{quote}
5. I think it's better to add a new method in SchedulerApplicationAttempt like 
getMemoryUtilization, which will only return memory/cpu seconds. We do this to 
prevent locking scheduling thread when showing application metrics on web UI.
 getMemoryUtilization will be used by 
RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running 
resource utilization. And used by 
SchedulerApplicationAttempt#getResourceUsageReport as well.

The MemoryUtilization class may contain two fields: 
runningContainerMemory(VCore)Seconds
{quote}

[~leftnoteasy], Thank you for your thorough analysis of this patch and for your 
detailed suggestions.

I am working through them, and I think they are pretty clear, but this one is a 
little confusing to me. If I understand correctly, suggestion number 5 is to 
create SchedulerApplicationAttempt#getMemoryUtilization  to be called from both 
SchedulerApplicationAttempt#getResourceUsageReport as well as 
RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds. 

Is that correct? If so, I have a couple of questions:

- RMAppAttempt can access the scheduler via the 'scheduler' variable, but that 
is of type YarnScheduler, which does not have all of the interfaces available 
that AbstractYarnScheduler has. Are you suggesting that I add the 
getMemoryUtilization method to the YarnScheduler interface? Or, are you 
suggesting that the RMAppAttempt#scheduler variable be cast to 
AbstractYarnScheduler? Or, am I missing the point?
- When you say that a new class should be added called MemoryUtilization to be 
passed back to SchedulerApplicationAttempt#getResourceUsageReport, are you 
suggesting that that same structure should be added to 
ApplicationResourceUsageReport as a class variable in place of the current 
'long memorySeconds' and 'long vcoreSeconds'? If so, I am a little reluctant to 
do that, since that structure would have to be passed across the protobuf 
interface to the client. It's possible, but seems riskier than just adding 2 
longs to the API.

Thank you very much.
Eric Payne

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.
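 A worked example of the formula above (numbers invented): two containers that reserve 
 2048 MB for 300 s and 120 s respectively account for 2048*300 + 2048*120 = 860160 
 MB-seconds, regardless of how much memory they actually used.
 {code}
 public class MemorySecondsSketch {
   public static void main(String[] args) {
     long[][] containers = { {2048, 300}, {2048, 120} };  // {reserved MB, lifetime seconds}
     long mbSeconds = 0;
     for (long[] c : containers) {
       mbSeconds += c[0] * c[1];
     }
     System.out.println(mbSeconds + " MB-seconds");  // 860160 MB-seconds
   }
 }
 {code}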



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070772#comment-14070772
 ] 

Hadoop QA commented on YARN-1342:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657163/YARN-1342v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4395//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4395//console

This message is automatically generated.

 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-2334) Document exit codes and their meanings used by linux task controller

2014-07-22 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer moved MAPREDUCE-1318 to YARN-2334:
---

Component/s: (was: documentation)
 documentation
   Assignee: (was: Anatoli Fomenko)
Key: YARN-2334  (was: MAPREDUCE-1318)
Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 Document exit codes and their meanings used by linux task controller
 

 Key: YARN-2334
 URL: https://issues.apache.org/jira/browse/YARN-2334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Sreekanth Ramakrishnan
 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch, 
 MAPREDUCE-1318.2.patch, MAPREDUCE-1318.patch


 Currently, the linux task controller binary uses a set of exit codes, which are not 
 documented. These should be documented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2334) Document exit codes and their meanings used by linux task controller

2014-07-22 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070801#comment-14070801
 ] 

Allen Wittenauer commented on YARN-2334:


Moving this to YARN.

The same problem appears to exist for container executor.

 Document exit codes and their meanings used by linux task controller
 

 Key: YARN-2334
 URL: https://issues.apache.org/jira/browse/YARN-2334
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Sreekanth Ramakrishnan
 Attachments: HADOOP-5912.1.patch, MAPREDUCE-1318.1.patch, 
 MAPREDUCE-1318.2.patch, MAPREDUCE-1318.patch


 Currently, the linux task controller binary uses a set of exit codes, which are not 
 documented. These should be documented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070818#comment-14070818
 ] 

Craig Welch commented on YARN-1994:
---

FYI, took a look at the changes between patch 3 and 5, it includes support for 
the ApplicationHistory service when an address is specified for it (when one is 
not specified it binds on all ports, but if it is, it won't bind without this 
change).  This impacts ApplicationHistoryClientService.java, 
ApplicationHistoryServer.java, and WebAppUtils.java.  The mapreduce 
configurations were not consolidated when the yarn one was; those are also 
consolidated in the .5 patch.  This impacts JHAdminConfig.java, 
MRWebAppUtil.java, HistoryClientService.java, and HSAdminServer.java.  Some 
redundant logic in AdminService.java was also removed.

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, 
 YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070838#comment-14070838
 ] 

Craig Welch commented on YARN-1994:
---

Re: [~arpitagarwal]'s comments - [~mipoto] made the changes along the lines of 
those requested to AdminService.java in p5. The application master, client rm, 
history client, resource localization and resource tracker services all have 
changes to support the bind properties - I think in some cases the file names 
are different, but the services look to be covered. I changed WebAppUtils to 
use RPCUtil as suggested; TIMELINE_SERVICE_BIND_HOST was used by Milan in p5 
when he added support for that service.  I am attaching a .6 patch, which 
consists of Milan's .5 patch with the change to WebAppUtils to use RPCUtil, and 
a change to add a separate getRMWebAppBindURLWithoutScheme() method, to make 
sure there is no confusion about its purpose. I returned 
getRMWebAppURLWithoutScheme() to its earlier functionality, in case it is used 
by external code, so it will work properly (it should not have the bind logic). 
All relatively small changes, but they look to be worthwhile finishing.  In my 
interactive testing everything looks to be working properly.  [~arpitagarwal] 
(and [~mipoto], if you like), can you take one more quick look?

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, 
 YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-22 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1994:
--

Attachment: YARN-1994.6.patch

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, 
 YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-22 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch reassigned YARN-2008:
-

Assignee: Craig Welch  (was: Chen He)

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch


 Suppose there are two queues, both allowed to use 100% of the actual resources in 
 the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and 
 there is no actual space available. If we use the current method to get 
 headroom, the CapacityScheduler thinks there are still available resources for 
 users in Q1, but they have been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
  rootQueue
  |-- L1ParentQueue1 (allowed to use up to 80% of its parent)
  |    |-- L2LeafQueue1 (50% of its parent)
  |    `-- L2LeafQueue2 (50% of its parent in minimum)
  `-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method will 
 think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, 
 L2LeafQueue2 can only use 30% (60%*50%). 
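 A worked version of the arithmetic in the example (using whole percents of the root 
 queue; the sibling usage figure is the one assumed in the description):
 {code}
 public class QueueMaxCapSketch {
   public static void main(String[] args) {
     int parentMaxPct = 80;    // L1ParentQueue1 max, as % of rootQueue
     int leafSharePct = 50;    // L2LeafQueue2 share of its parent
     int siblingUsedPct = 40;  // % of rootQueue already used by L1ParentQueue2

     int naive = parentMaxPct * leafSharePct / 100;                       // 40
     int effectiveParent = Math.min(parentMaxPct, 100 - siblingUsedPct);  // 60
     int effective = effectiveParent * leafSharePct / 100;                // 30

     System.out.println(naive + "% vs " + effective + "%");
   }
 }
 {code}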



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070859#comment-14070859
 ] 

Craig Welch commented on YARN-2008:
---

I believe we've resolved on [YARN-1198] to move forward with this change.  
[~wangda],  [~airbots], can you take a look at my patch then, and provide 
feedback?  Thanks...

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch


 Suppose there are two queues, both allowed to use 100% of the actual resources in 
 the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources and 
 there is no actual space available. If we use the current method to get 
 headroom, the CapacityScheduler thinks there are still available resources for 
 users in Q1, but they have been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
  rootQueue
  |-- L1ParentQueue1 (allowed to use up to 80% of its parent)
  |    |-- L2LeafQueue1 (50% of its parent)
  |    `-- L2LeafQueue2 (50% of its parent in minimum)
  `-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method will 
 think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, 
 L2LeafQueue2 can only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap

2014-07-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2273:
--

Attachment: YARN-2273-5.patch

The new patch adds a return in the continuous scheduling thread.
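
For context, a minimal illustration of why the sort NPEs once a node disappears, and 
one way to make the comparison null-safe; this is a standalone sketch, not the actual 
FairScheduler comparator or the attached patch (which, per the comment above, handles 
the condition in the scheduling thread itself):
{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullSafeSortSketch {
  public static void main(String[] args) {
    Map<String, Integer> availableByNode = new HashMap<>();
    availableByNode.put("node-1", 8);
    availableByNode.put("node-2", 4);

    List<String> nodeIds = new ArrayList<>(availableByNode.keySet());
    availableByNode.remove("node-2");  // node flaps away after the id list was snapshotted

    // Treat a removed node as having no available resources instead of dereferencing
    // null, which is what killed the ContinuousScheduling thread.
    nodeIds.sort(Comparator.comparingInt((String id) -> {
      Integer available = availableByNode.get(id);
      return available == null ? 0 : -available;  // descending by available resources
    }));
    System.out.println(nodeIds);  // [node-1, node-2]
  }
}
{code}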

 NPE in ContinuousScheduling Thread crippled RM after DN flap
 

 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0, 2.4.1
 Environment: cdh5.0.2 wheezy
Reporter: Andy Skelton
 Attachments: YARN-2273-5.patch, YARN-2273-replayException.patch, 
 YARN-2273.patch, YARN-2273.patch, YARN-2273.patch, YARN-2273.patch


 One DN experienced memory errors and entered a cycle of rebooting and 
 rejoining the cluster. After the second time the node went away, the RM 
 produced this:
 {code}
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Application attempt appattempt_1404858438119_4352_01 released container 
 container_1404858438119_4352_01_04 on node: host: 
 node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
 memory:335872, vCores:328
 2014-07-09 21:47:36,571 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[ContinuousScheduling,5,main] threw an Exception.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
   at java.util.TimSort.sort(TimSort.java:203)
   at java.util.TimSort.sort(TimSort.java:173)
   at java.util.Arrays.sort(Arrays.java:659)
   at java.util.Collections.sort(Collections.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
   at java.lang.Thread.run(Thread.java:744)
 {code}
 A few cycles later YARN was crippled. The RM was running and jobs could be 
 submitted but containers were not assigned and no progress was made. 
 Restarting the RM resolved it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2304) Test*WebServices* fails intermittently

2014-07-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070874#comment-14070874
 ] 

Jason Lowe commented on YARN-2304:
--

I noticed many (all?) of the failures occurred on the Jenkins H5 build node.  
Checking that node I saw a hung RM test that originally started on July 13th, 
and it was holding onto the port that the NM web service tests wanted to use.  
I've killed that hung test, so hopefully things can continue to progress in the 
short-term.

 Test*WebServices* fails intermittently
 --

 Key: YARN-2304
 URL: https://issues.apache.org/jira/browse/YARN-2304
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
 Attachments: test-failure-log-RMWeb.txt


 TestNMWebService, TestRMWebService, and TestAMWebService fail with the 
 address already being bound.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-22 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070875#comment-14070875
 ] 

Craig Welch commented on YARN-2008:
---

[~airbots], -re

{quote}
and if the application gets the low baseline headroom it will not be able to 
effectively use that greater capacity. 
{quote}

Right, that's what the patch is intended to do, but the approach here is to 
only drop that when needed, based on utilization - when utilization is not an 
issue, allow the maxcapacity logic to continue as today and let the AM use the 
additional available headroom.

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch


 If there are two queues that are both allowed to use 100% of the actual resources 
 in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster 
 resources, then there is no actual space left. With the current method of getting 
 headroom, the CapacityScheduler thinks there are still resources available for 
 users in Q1, even though they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
   +-- L1ParentQueue1 (allowed to use up to 80% of its parent)
   |     +-- L2LeafQueue1 (50% of its parent)
   |     +-- L2LeafQueue2 (50% of its parent, in minimum)
   +-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking how much of rootQueue is actually available to 
 L1ParentQueue1, we cannot be sure: it is possible that L1ParentQueue2 has already 
 used 40% of the rootQueue resources, in which case L2LeafQueue2 can actually only 
 use 30% (60% * 50%).
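To make the arithmetic in the example concrete, here is a minimal plain-Java sketch (the percentages are taken from the example above; the variable names are invented and this is not CapacityScheduler code):
{code}
// Worked example of the headroom arithmetic described above.
double l1Parent1Max = 0.80;   // L1ParentQueue1 may use up to 80% of rootQueue
double l2Leaf2Share = 0.50;   // L2LeafQueue2 may use 50% of its parent
double l1Parent2Used = 0.40;  // L1ParentQueue2 currently uses 40% of rootQueue

// Current method: ignores what L1ParentQueue2 is actually using.
double naive = l1Parent1Max * l2Leaf2Share;                                 // 0.40

// Utilization-aware: L1ParentQueue1 can only reach what its sibling left over.
double parentReallyAvailable = Math.min(l1Parent1Max, 1.0 - l1Parent2Used); // 0.60
double aware = parentReallyAvailable * l2Leaf2Share;                        // 0.30
{code}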





[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070880#comment-14070880
 ] 

Hadoop QA commented on YARN-1994:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657185/YARN-1994.6.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4396//console

This message is automatically generated.

 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, 
 YARN-1994.3.patch, YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP-address for clients while requesting 
 wildcard bind for the servers.
 (List of endpoints is in a comment below)
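One way to express the requested split between the client-facing address and the server bind address, as a sketch only (the yarn.resourcemanager.bind-host key and the surrounding code are assumptions for illustration, not necessarily what this JIRA will adopt):
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class BindHostSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // What clients are told to connect to (a real hostname or IP).
    String advertised = conf.get(YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS);
    // Hypothetical key: if set (e.g. to 0.0.0.0), the server binds the wildcard.
    String bindHost = conf.get("yarn.resourcemanager.bind-host");

    InetSocketAddress clientAddr = NetUtils.createSocketAddr(advertised);
    InetSocketAddress bindAddr = (bindHost == null)
        ? clientAddr
        : new InetSocketAddress(bindHost, clientAddr.getPort());

    System.out.println("advertise " + clientAddr + ", bind " + bindAddr);
  }
}
{code}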





[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-07-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070925#comment-14070925
 ] 

Jason Lowe commented on YARN-2331:
--

Another possible approach is to have the NM always try to clean up containers on 
shutdown when it is unsupervised.  If a rolling upgrade needs to be performed, and 
containers therefore need to be preserved, the NM would be killed without the 
chance to clean up (e.g. kill -9 to deliver a SIGKILL).  Upon restart the NM would 
recover its state from the state store and reacquire the containers.
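A minimal sketch of that decision, with invented names (supervised, recoveryEnabled and killAllContainers are placeholders, not actual NodeManager code):
{code}
// Sketch only: the shutdown policy discussed above, with placeholder names.
public class ShutdownPolicySketch {
  boolean supervised;       // NM runs under a supervisor that restarts it automatically
  boolean recoveryEnabled;  // NM restart/recovery support is turned on

  void onShutdown() {
    if (supervised && recoveryEnabled) {
      // A restart is expected shortly: leave containers alive so the new NM
      // instance can recover them from the state store and reacquire them.
      return;
    }
    // Unsupervised shutdown: kill containers, since a timely restart is unlikely.
    // For a rolling upgrade the operator bypasses this path entirely with
    // SIGKILL (kill -9), so no cleanup runs and the containers survive.
    killAllContainers();
  }

  void killAllContainers() { /* placeholder for container cleanup */ }
}
{code}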

 Distinguish shutdown during supervision vs. shutdown for rolling upgrade
 

 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe

 When the NM is shutting down with restart support enabled there are scenarios 
 we'd like to distinguish and behave accordingly:
 # The NM is running under supervision.  In that case containers should be 
 preserved so the automatic restart can recover them.
 # The NM is not running under supervision and a rolling upgrade is not being 
 performed.  In that case the shutdown should kill all containers since it is 
 unlikely the NM will be restarted in a timely manner to recover them.
 # The NM is not running under supervision and a rolling upgrade is being 
 performed.  In that case the shutdown should not kill all containers since a 
 restart is imminent due to the rolling upgrade and the containers will be 
 recovered.





[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-07-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070967#comment-14070967
 ] 

Li Lu commented on YARN-2314:
-

Hi [~jlowe], I'm interested in looking into the cache overflow side of this issue 
(sorry about the last comment; I mistyped and sent it out prematurely).  After 
checking your comments and the code, I think a quick fix would be: when adding a 
new proxy to a full cache, instead of relying only on (and trying to evict) the 
least recently used item, the cache should scan the whole list for an item that is 
not currently being used by an RPC and replace that one.  There is one scenario 
where this may not help: when every cached item is in use by an RPC.  I'd like to 
check with you whether that is a frequent case in your cluster, and if not, 
whether this quick fix would address the cache overflow problem.  Thanks! 
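A rough sketch of that eviction scan, using an ordinary LinkedHashMap and an invented inUse flag rather than the real ContainerManagementProtocolProxy types:
{code}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: CachedProxy and its inUse flag are stand-ins for the real proxy entries.
class ProxyCacheSketch {
  static class CachedProxy { volatile boolean inUse; }

  private final int maxSize;
  private final LinkedHashMap<String, CachedProxy> cache =
      new LinkedHashMap<String, CachedProxy>(16, 0.75f, true);  // access order ~= LRU first

  ProxyCacheSketch(int maxSize) { this.maxSize = maxSize; }

  synchronized void put(String nodeAddr, CachedProxy proxy) {
    if (cache.size() >= maxSize) {
      // Instead of only trying the single least-recently-used entry, walk the whole
      // list and evict the first entry that no RPC is currently using.
      Iterator<Map.Entry<String, CachedProxy>> it = cache.entrySet().iterator();
      while (it.hasNext()) {
        if (!it.next().getValue().inUse) { it.remove(); break; }
      }
      // If every entry is in use, nothing is evicted here -- that corner case
      // still has to be handled separately (see the discussion below).
    }
    cache.put(nodeAddr, proxy);
  }
}
{code}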

 ContainerManagementProtocolProxy can create thousands of threads for a large 
 cluster
 

 Key: YARN-2314
 URL: https://issues.apache.org/jira/browse/YARN-2314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Priority: Critical

 ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
 this cache is configurable.  However the cache can grow far beyond the 
 configured size when running on a large cluster and blow AM address/container 
 limits.  More details in the first comment.





[jira] [Commented] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14070965#comment-14070965
 ] 

Hadoop QA commented on YARN-2273:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657187/YARN-2273-5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4397//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4397//console

This message is automatically generated.

 NPE in ContinuousScheduling Thread crippled RM after DN flap
 

 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0, 2.4.1
 Environment: cdh5.0.2 wheezy
Reporter: Andy Skelton
 Attachments: YARN-2273-5.patch, YARN-2273-replayException.patch, 
 YARN-2273.patch, YARN-2273.patch, YARN-2273.patch, YARN-2273.patch


 One DN experienced memory errors and entered a cycle of rebooting and 
 rejoining the cluster. After the second time the node went away, the RM 
 produced this:
 {code}
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Application attempt appattempt_1404858438119_4352_01 released container 
 container_1404858438119_4352_01_04 on node: host: 
 node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
 memory:335872, vCores:328
 2014-07-09 21:47:36,571 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[ContinuousScheduling,5,main] threw an Exception.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
   at java.util.TimSort.sort(TimSort.java:203)
   at java.util.TimSort.sort(TimSort.java:173)
   at java.util.Arrays.sort(Arrays.java:659)
   at java.util.Collections.sort(Collections.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
   at java.lang.Thread.run(Thread.java:744)
 {code}
 A few cycles later YARN was crippled. The RM was running and jobs could be 
 submitted but containers were not assigned and no progress was made. 
 Restarting the RM resolved it.
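Given the stack trace above, one defensive pattern is to make the node-availability comparator tolerate nodes that were removed between building the node list and sorting it. This is a sketch of the general idea, assuming the surrounding scheduler fields (nodes, clusterResource, RESOURCE_CALCULATOR); it is not necessarily the committed YARN-2273 fix:
{code}
// Sketch only: a node can disappear from 'nodes' (e.g. a flapping DN) while the
// node list is being sorted; treat missing nodes as "least available".
private class NodeAvailableResourceComparator implements Comparator<NodeId> {
  @Override
  public int compare(NodeId n1, NodeId n2) {
    FSSchedulerNode node1 = nodes.get(n1);
    FSSchedulerNode node2 = nodes.get(n2);
    if (node1 == null && node2 == null) { return 0; }
    if (node1 == null) { return 1; }   // sort removed nodes to the end
    if (node2 == null) { return -1; }
    return RESOURCE_CALCULATOR.compare(clusterResource,
        node2.getAvailableResource(), node1.getAvailableResource());
  }
}
{code}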





[jira] [Updated] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-07-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2314:
-

Attachment: nmproxycachefix.prototype.patch

I was thinking along similar lines, but I am worried about the corner case where 
every cached proxy is in use.  I think we need to handle this case even if it's 
rare.  An AM running on a node where it can see the RM but has a network cut to 
the rest of the cluster could otherwise go bad very quickly: if we don't handle 
the corner case we'll continue to grow the proxy cache beyond its bounds as we do 
today, and that AM will explode with thousands of threads for what may be a 
temporary network outage.

While debugging this I wrote a quick prototype patch that keeps the cache under 
the configured limit.  Attaching it for reference.  However, as I mentioned above, 
simply keeping the NM proxy cache under its configured limit means nothing if we 
don't also address the problem of connections remaining open in the IPC Client 
layer.

 ContainerManagementProtocolProxy can create thousands of threads for a large 
 cluster
 

 Key: YARN-2314
 URL: https://issues.apache.org/jira/browse/YARN-2314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Priority: Critical
 Attachments: nmproxycachefix.prototype.patch


 ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
 this cache is configurable.  However the cache can grow far beyond the 
 configured size when running on a large cluster and blow AM address/container 
 limits.  More details in the first comment.





[jira] [Created] (YARN-2335) Annotate all hadoop-sls APIs as @Private

2014-07-22 Thread Wei Yan (JIRA)
Wei Yan created YARN-2335:
-

 Summary: Annotate all hadoop-sls APIs as @Private
 Key: YARN-2335
 URL: https://issues.apache.org/jira/browse/YARN-2335
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor








[jira] [Commented] (YARN-2314) ContainerManagementProtocolProxy can create thousands of threads for a large cluster

2014-07-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071018#comment-14071018
 ] 

Li Lu commented on YARN-2314:
-

Thanks [~jlowe]! About the corner case, I'm wondering whether waiting with a 
bounded timeout would be slightly better than waiting indefinitely. That way, if 
the timeout fires it means every cached proxy has been occupied by an RPC for a 
long time, and the system could report that abnormal situation. 
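A minimal sketch of that bounded wait, with invented names (CachedProxy, findUnusedEntry and the enclosing cache object are placeholders, not the real ContainerManagementProtocolProxy code):
{code}
// Sketch only: wait a bounded time for some cached proxy to be released.
synchronized CachedProxy acquireSlot(long timeoutMs) throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  CachedProxy free = findUnusedEntry();      // placeholder for the eviction scan
  while (free == null) {
    long remaining = deadline - System.currentTimeMillis();
    if (remaining <= 0) {
      // Every cached proxy stayed busy for the whole timeout: report it rather
      // than silently growing the cache past its configured limit.
      throw new IllegalStateException("all cached NM proxies busy for " + timeoutMs + " ms");
    }
    wait(remaining);                         // a release would call notifyAll()
    free = findUnusedEntry();
  }
  return free;
}
{code}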

 ContainerManagementProtocolProxy can create thousands of threads for a large 
 cluster
 

 Key: YARN-2314
 URL: https://issues.apache.org/jira/browse/YARN-2314
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.1.0-beta
Reporter: Jason Lowe
Priority: Critical
 Attachments: nmproxycachefix.prototype.patch


 ContainerManagementProtocolProxy has a cache of NM proxies, and the size of 
 this cache is configurable.  However the cache can grow far beyond the 
 configured size when running on a large cluster and blow AM address/container 
 limits.  More details in the first comment.





[jira] [Updated] (YARN-2273) NPE in ContinuousScheduling thread when we lose a node

2014-07-22 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2273:
---

Summary: NPE in ContinuousScheduling thread when we lose a node  (was: NPE 
in ContinuousScheduling Thread crippled RM after DN flap)

 NPE in ContinuousScheduling thread when we lose a node
 --

 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0, 2.4.1
 Environment: cdh5.0.2 wheezy
Reporter: Andy Skelton
 Attachments: YARN-2273-5.patch, YARN-2273-replayException.patch, 
 YARN-2273.patch, YARN-2273.patch, YARN-2273.patch, YARN-2273.patch


 One DN experienced memory errors and entered a cycle of rebooting and 
 rejoining the cluster. After the second time the node went away, the RM 
 produced this:
 {code}
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Application attempt appattempt_1404858438119_4352_01 released container 
 container_1404858438119_4352_01_04 on node: host: 
 node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
 memory:335872, vCores:328
 2014-07-09 21:47:36,571 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[ContinuousScheduling,5,main] threw an Exception.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
   at java.util.TimSort.sort(TimSort.java:203)
   at java.util.TimSort.sort(TimSort.java:173)
   at java.util.Arrays.sort(Arrays.java:659)
   at java.util.Collections.sort(Collections.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
   at java.lang.Thread.run(Thread.java:744)
 {code}
 A few cycles later YARN was crippled. The RM was running and jobs could be 
 submitted but containers were not assigned and no progress was made. 
 Restarting the RM resolved it.





[jira] [Updated] (YARN-2335) Annotate all hadoop-sls APIs as @Private

2014-07-22 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2335:
--

Attachment: YARN-2335-1.patch

 Annotate all hadoop-sls APIs as @Private
 

 Key: YARN-2335
 URL: https://issues.apache.org/jira/browse/YARN-2335
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Wei Yan
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2335-1.patch








[jira] [Commented] (YARN-2273) NPE in ContinuousScheduling thread when we lose a node

2014-07-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071077#comment-14071077
 ] 

Hudson commented on YARN-2273:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5945 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5945/])
YARN-2273. NPE in ContinuousScheduling thread when we lose a node. (Wei Yan via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612720)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


 NPE in ContinuousScheduling thread when we lose a node
 --

 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0, 2.4.1
 Environment: cdh5.0.2 wheezy
Reporter: Andy Skelton
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2273-5.patch, YARN-2273-replayException.patch, 
 YARN-2273.patch, YARN-2273.patch, YARN-2273.patch, YARN-2273.patch


 One DN experienced memory errors and entered a cycle of rebooting and 
 rejoining the cluster. After the second time the node went away, the RM 
 produced this:
 {code}
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Application attempt appattempt_1404858438119_4352_01 released container 
 container_1404858438119_4352_01_04 on node: host: 
 node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
 memory:335872, vCores:328
 2014-07-09 21:47:36,571 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[ContinuousScheduling,5,main] threw an Exception.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
   at java.util.TimSort.sort(TimSort.java:203)
   at java.util.TimSort.sort(TimSort.java:173)
   at java.util.Arrays.sort(Arrays.java:659)
   at java.util.Collections.sort(Collections.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
   at java.lang.Thread.run(Thread.java:744)
 {code}
 A few cycles later YARN was crippled. The RM was running and jobs could be 
 submitted but containers were not assigned and no progress was made. 
 Restarting the RM resolved it.





[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots entry in queue

2014-07-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071118#comment-14071118
 ] 

Tsuyoshi OZAWA commented on YARN-2313:
--

The test failure is not related.

 Livelock can occur on FairScheduler when there are lots entry in queue
 --

 Key: YARN-2313
 URL: https://issues.apache.org/jira/browse/YARN-2313
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
 YARN-2313.4.patch, rm-stack-trace.txt


 Observed a livelock in the FairScheduler when there are lots of entries in the 
 queue. After investigating the code, the following case can occur:
 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) 
 when there are many queues.
 2. UpdateThread goes into a busy loop.
 3. Other threads (AllocationFileReloader, 
 ResourceManager$SchedulerEventDispatcher) can wait forever.
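One possible reading of that failure mode, as a sketch with invented names (not the actual FairScheduler UpdateThread): if the loop only sleeps whatever time is left over after {{update()}}, a slow {{update()}} leaves nothing to sleep and the thread spins while competing for the scheduler lock, starving the other threads. Always sleeping at least a minimum between iterations avoids the spin:
{code}
// Sketch only: why a slow update() can turn the loop into a busy loop.
static final long UPDATE_INTERVAL_MS = 500;

void updateLoop() throws InterruptedException {
  while (!Thread.currentThread().isInterrupted()) {
    long start = System.currentTimeMillis();
    update();                                      // placeholder for the scheduler update
    long elapsed = System.currentTimeMillis() - start;
    long leftover = UPDATE_INTERVAL_MS - elapsed;  // <= 0 when update() is slow
    // Sleeping only 'leftover' would busy-loop once update() exceeds the interval;
    // sleeping at least a floor keeps other threads from starving on the lock.
    Thread.sleep(Math.max(leftover, 50L));
  }
}

void update() { /* placeholder */ }
{code}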





[jira] [Commented] (YARN-2304) Test*WebServices* fails intermittently

2014-07-22 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071217#comment-14071217
 ] 

Tsuyoshi OZAWA commented on YARN-2304:
--

[~jlowe], thanks for investigating! After confirming that things go well, I'll 
close this JIRA as fixed.

 Test*WebServices* fails intermittently
 --

 Key: YARN-2304
 URL: https://issues.apache.org/jira/browse/YARN-2304
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Tsuyoshi OZAWA
 Attachments: test-failure-log-RMWeb.txt


 TestNMWebService, TestRMWebService, and TestAMWebService fail intermittently 
 because the address they try to bind to is already in use.





[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart

2014-07-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071272#comment-14071272
 ] 

Junping Du commented on YARN-1342:
--

Thanks [~jlowe] for updating the patch. 
I'd like to continue our practice from YARN-1341 and start with a question: what 
would happen if we failed to update currentMasterKey and previousMasterKey for 
NMContainerTokenSecretManager?
My tentative answer is: a stale currentMasterKey won't break anything, as it will 
be updated when the NM registers with the RM during restart. A stale 
previousMasterKey will cause an AM that holds the original previousMasterKey to 
fail to start containers. Could you confirm that my understanding is correct? If 
so, the following code may not be necessary:
{code}
+// if there was no master key, try the previous key
+if (super.currentMasterKey == null) {
+  super.currentMasterKey = previousMasterKey;
+}
{code}

 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch, YARN-1342v4.patch, YARN-1342v5.patch








[jira] [Created] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree

2014-07-22 Thread Kenji Kikushima (JIRA)
Kenji Kikushima created YARN-2336:
-

 Summary: Fair scheduler REST api returns a missing '[' bracket 
JSON for deep queue tree
 Key: YARN-2336
 URL: https://issues.apache.org/jira/browse/YARN-2336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima


When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a 
missing '[' bracket for childQueues.
This issue was found by [~ajisakaa] in YARN-1050.





[jira] [Updated] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree

2014-07-22 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-2336:
--

Attachment: YARN-2336.patch

Attached a patch.
To make JSONJAXBContext treat childQueues as a collection, this patch introduces 
FairSchedulerQueueInfoList.
I referred to MAPREDUCE-4020.
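For reference, the wrapper-class pattern being described usually looks roughly like this (a sketch with assumed field names; not the attached patch):
{code}
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Sketch only: a JAXB wrapper type so childQueues marshals as a JSON array ("[...]")
// instead of being flattened when there is a deep queue tree.
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class FairSchedulerQueueInfoList {
  private List<FairSchedulerQueueInfo> queue = new ArrayList<FairSchedulerQueueInfo>();

  public List<FairSchedulerQueueInfo> getQueueInfoList() {
    return queue;
  }
}
{code}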

 Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
 --

 Key: YARN-2336
 URL: https://issues.apache.org/jira/browse/YARN-2336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
 Attachments: YARN-2336.patch


 When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a 
 missing '[' bracket for childQueues.
 This issue was found by [~ajisakaa] in YARN-1050.





[jira] [Commented] (YARN-1050) Document the Fair Scheduler REST API

2014-07-22 Thread Kenji Kikushima (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071289#comment-14071289
 ] 

Kenji Kikushima commented on YARN-1050:
---

Hi [~ajisakaa], I created YARN-2336 for the missing '[' bracket issue. Please 
comment if you are interested. Thanks!

 Document the Fair Scheduler REST API
 

 Key: YARN-1050
 URL: https://issues.apache.org/jira/browse/YARN-1050
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Sandy Ryza
Assignee: Kenji Kikushima
 Attachments: YARN-1050-2.patch, YARN-1050-3.patch, YARN-1050.patch


 The documentation should be placed here along with the Capacity Scheduler 
 documentation: 
 http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Scheduler_API





[jira] [Updated] (YARN-2215) Add preemption info to REST/CLI

2014-07-22 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-2215:
--

Attachment: YARN-2215.patch

Hi [~leftnoteasy], I found that "Resource Preempted from Current Attempt" and 
"Number of Non-AM Containers Preempted from Current Attempt" shown in the Web UI 
are not exposed in the apps REST API.
For API consistency, I think the REST API should have the same elements. Please 
comment if you are interested. Thanks.
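A sketch of what exposing those two values in the apps REST response might look like (the field names are invented to mirror the Web UI labels; this is not the attached patch):
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Sketch only: extra fields that would let the apps REST API report the same
// preemption information the Web UI shows for the current attempt.
@XmlRootElement(name = "app")
@XmlAccessorType(XmlAccessType.FIELD)
public class AppInfoPreemptionSketch {
  protected long preemptedResourceMB;        // "Resource Preempted from Current Attempt"
  protected long preemptedResourceVCores;
  protected int numNonAMContainerPreempted;  // "Number of Non-AM Containers Preempted ..."
}
{code}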

 Add preemption info to REST/CLI
 ---

 Key: YARN-2215
 URL: https://issues.apache.org/jira/browse/YARN-2215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Wangda Tan
 Attachments: YARN-2215.patch


 As discussed in YARN-2181, we'd better to add preemption info to RM RESTful 
 API/CLI to make administrator/user get more understanding about preemption 
 happened on app/queue, etc.





[jira] [Commented] (YARN-2336) Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree

2014-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071324#comment-14071324
 ] 

Hadoop QA commented on YARN-2336:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12657259/YARN-2336.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4398//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4398//console

This message is automatically generated.

 Fair scheduler REST api returns a missing '[' bracket JSON for deep queue tree
 --

 Key: YARN-2336
 URL: https://issues.apache.org/jira/browse/YARN-2336
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
 Attachments: YARN-2336.patch


 When we have sub-queues in the Fair Scheduler, the REST API returns JSON with a 
 missing '[' bracket for childQueues.
 This issue was found by [~ajisakaa] in YARN-1050.





[jira] [Commented] (YARN-2328) FairScheduler: Verify update and continuous scheduling threads are stopped when the scheduler is stopped

2014-07-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071361#comment-14071361
 ] 

Sandy Ryza commented on YARN-2328:
--

{code}
-if (node != null  Resources.fitsIn(minimumAllocation,
-node.getAvailableResource())) {
+if (node != null 
+Resources.fitsIn(minimumAllocation, node.getAvailableResource())) {
{code}
This looks unrelated.

+1 otherwise.

 FairScheduler: Verify update and continuous scheduling threads are stopped 
 when the scheduler is stopped
 

 Key: YARN-2328
 URL: https://issues.apache.org/jira/browse/YARN-2328
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Minor
 Attachments: yarn-2328-1.patch


 FairScheduler threads can use a little cleanup and tests. To begin with, the 
 update and continuous-scheduling threads should extend Thread and handle 
 being interrupted. We should have tests for starting and stopping them as 
 well. 





[jira] [Updated] (YARN-2313) Livelock can occur in FairScheduler when there are lots of running apps

2014-07-22 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2313:
-

Summary: Livelock can occur in FairScheduler when there are lots of running 
apps  (was: Livelock can occur on FairScheduler when there are lots of running 
apps)

 Livelock can occur in FairScheduler when there are lots of running apps
 ---

 Key: YARN-2313
 URL: https://issues.apache.org/jira/browse/YARN-2313
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, 
 YARN-2313.4.patch, rm-stack-trace.txt


 Observed a livelock in the FairScheduler when there are lots of entries in the 
 queue. After investigating the code, the following case can occur:
 1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500ms) 
 when there are many queues.
 2. UpdateThread goes into a busy loop.
 3. Other threads (AllocationFileReloader, 
 ResourceManager$SchedulerEventDispatcher) can wait forever.





[jira] [Commented] (YARN-2301) Improve yarn container command

2014-07-22 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071395#comment-14071395
 ] 

Devaraj K commented on YARN-2301:
-

Thanks [~zjshen] for the clarification. 

+1 for the above approach.


 Improve yarn container command
 --

 Key: YARN-2301
 URL: https://issues.apache.org/jira/browse/YARN-2301
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Naganarasimha G R
  Labels: usability

 While running the yarn container -list Application Attempt ID command, some 
 observations:
 1) the scheme (e.g. http/https) before LOG-URL is missing
 2) the start-time is printed as milliseconds (e.g. 1405540544844); better to 
 print it in a time format
 3) finish-time is 0 if the container has not finished yet; maybe print N/A 
 instead
 4) maybe add an option to run it as yarn container -list appId OR yarn 
 application -list-containers appId as well. Since the attempt Id is not shown 
 on the console, this makes it easier for the user to just copy the appId and 
 run it, and it may also be useful for container-preserving AM restart. 
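For points 2 and 3, a plain-Java sketch of the suggested formatting (not the actual client code; the class name is invented):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch only: print start-time in a readable format and finish-time as N/A when 0.
public class ContainerReportFormatSketch {
  public static void main(String[] args) {
    long startTime = 1405540544844L;  // the millisecond value from the example above
    long finishTime = 0L;             // 0 means the container has not finished yet

    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
    System.out.println("Start-Time  : " + fmt.format(new Date(startTime)));
    System.out.println("Finish-Time : "
        + (finishTime == 0 ? "N/A" : fmt.format(new Date(finishTime))));
  }
}
{code}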


