[jira] [Assigned] (YARN-7443) Add native FPGA module support to do isolation with cgroups
[ https://issues.apache.org/jira/browse/YARN-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa reassigned YARN-7443:
------------------------------------

    Assignee: Zhankun Tang

> Add native FPGA module support to do isolation with cgroups
> ------------------------------------------------------------
>
> Key: YARN-7443
> URL: https://issues.apache.org/jira/browse/YARN-7443
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Zhankun Tang
> Assignee: Zhankun Tang
> Attachments: YARN-7443-trunk.001.patch
>
> Only devices with one configured major number in c-e.cfg are supported for
> now, so this is almost the same as the GPU native module.
[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp
[ https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813915#comment-15813915 ]

Tsuyoshi Ozawa commented on YARN-3774:
--------------------------------------

Thanks Jordan for the notification! I think we should use 3.3.0, 2.12.0 or later.

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -----------------------------------------------------------
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Critical
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are
> somewhat involved, and could be improved using CuratorOp introduced in
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version
> and make this change.
> Curator is considering shading guava through CURATOR-200. In Hadoop 3, we
> should upgrade to the next Curator version.
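As background, a minimal sketch of the CuratorOp style mentioned above, written against the Curator 3.x transaction API. The znode paths and payloads are invented for illustration; this is not code from a patch on this issue.

{code}
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.api.transaction.CuratorOp;
import org.apache.curator.framework.api.transaction.CuratorTransactionResult;

public class CuratorOpSketch {
  // Commits two ZK operations atomically using the Curator 3.x API.
  static List<CuratorTransactionResult> storeApp(CuratorFramework client,
      byte[] appState) throws Exception {
    // Hypothetical znode paths, for illustration only.
    CuratorOp create = client.transactionOp().create()
        .forPath("/rmstore/apps/app_0001", appState);
    CuratorOp bumpVersion = client.transactionOp().setData()
        .forPath("/rmstore/version", new byte[] {1});
    // Either both operations are applied or neither is.
    return client.transaction().forOperations(create, bumpVersion);
  }
}
{code}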
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803169#comment-15803169 ]

Tsuyoshi Ozawa commented on YARN-4348:
--------------------------------------

Yes, Jian is correct.

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding
> blocking ZK's event thread
> -----------------------------------------------------------------------------
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.2, 2.6.2
> Reporter: Tsuyoshi Ozawa
> Assignee: Tsuyoshi Ozawa
> Priority: Blocker
> Fix For: 2.7.2, 2.6.3
>
> Attachments: YARN-4348-branch-2.7.002.patch, YARN-4348-branch-2.7.003.patch,
> YARN-4348-branch-2.7.004.patch, YARN-4348.001.patch, YARN-4348.001.patch,
> log.txt
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.
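To make the idea concrete, a minimal sketch of a bounded wait around ZooKeeper's asynchronous sync, assuming a zkResyncWaitTime value in milliseconds. This illustrates the approach; it is not the committed patch.

{code}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.ZooKeeper;

// Issues an async sync and waits at most zkResyncWaitTimeMs instead of
// blocking indefinitely; returns false if the sync did not complete in time.
static boolean syncWithTimeout(ZooKeeper zk, String path,
    long zkResyncWaitTimeMs) throws InterruptedException {
  final CountDownLatch latch = new CountDownLatch(1);
  zk.sync(path, new AsyncCallback.VoidCallback() {
    @Override
    public void processResult(int rc, String p, Object ctx) {
      latch.countDown(); // sync finished, successfully or not
    }
  }, null);
  return latch.await(zkResyncWaitTimeMs, TimeUnit.MILLISECONDS);
}
{code}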
[jira] [Created] (YARN-5801) Adding isRoot method to CSQueue
Tsuyoshi Ozawa created YARN-5801:
------------------------------------

Summary: Adding isRoot method to CSQueue
Key: YARN-5801
URL: https://issues.apache.org/jira/browse/YARN-5801
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Tsuyoshi Ozawa

Currently, we check whether CSQueue is root or not by using null check against getParent. It's more straightforward to introduce isRoot to a method in CSQueue instead of going to current way.
[jira] [Updated] (YARN-5801) Adding isRoot method to CSQueue
[ https://issues.apache.org/jira/browse/YARN-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-5801:
---------------------------------

    Description:
Currently, we check whether CSQueue is root or not by using a null check against the return value of getParent. It's more straightforward to introduce an isRoot method to CSQueue instead of the current way.

  was:
Currently, we check whether CSQueue is root or not by using null check against getParent. It's more straightforward to introduce isRoot to a method in CSQueue instead of going to current way.

> Adding isRoot method to CSQueue
> ---------------------------------
>
> Key: YARN-5801
> URL: https://issues.apache.org/jira/browse/YARN-5801
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Tsuyoshi Ozawa
>
> Currently, we check whether CSQueue is root or not by using a null check
> against the return value of getParent. It's more straightforward to introduce
> an isRoot method to CSQueue instead of the current way.
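The proposal amounts to naming the existing null check; a sketch of how the accessor could look, assuming it lives in the common CSQueue implementation (illustrative only):

{code}
// Gives the inline check "queue.getParent() == null" a descriptive name
// on the queue itself.
public boolean isRoot() {
  return getParent() == null;
}
{code}

Call sites would then read {{if (queue.isRoot())}} instead of {{if (queue.getParent() == null)}}.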
[jira] [Updated] (YARN-3538) TimelineServer doesn't catch/translate all exceptions raised
[ https://issues.apache.org/jira/browse/YARN-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-3538:
---------------------------------

    Attachment: YARN-3538.002.patch

Updating the patch based on the discussion. [~djp] could you check the patch?

> TimelineServer doesn't catch/translate all exceptions raised
> --------------------------------------------------------------
>
> Key: YARN-3538
> URL: https://issues.apache.org/jira/browse/YARN-3538
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver
> Affects Versions: 2.6.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Labels: oct16-easy
> Attachments: YARN-3538-001.patch, YARN-3538.002.patch
>
> Not all exceptions in TimelineServer are uprated to web exceptions; only IOEs
[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens
[ https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617307#comment-15617307 ]

Tsuyoshi Ozawa commented on YARN-2674:
--------------------------------------

[~chenchun] The patch seems to be stale now. Could you update it?

> Distributed shell AM may re-launch containers if RM work preserving restart
> happens
> -----------------------------------------------------------------------------
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: applications, resourcemanager
> Reporter: Chun Chen
> Assignee: Chun Chen
> Labels: oct16-easy
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch,
> YARN-2674.4.patch, YARN-2674.5.patch
>
> Currently, if an RM work preserving restart happens while distributed shell is
> running, the distributed shell AM may re-launch all the containers, including
> new/running/complete ones. We must make sure it won't re-launch the
> running/complete containers.
> We need to remove allocated containers from
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM.
[jira] [Commented] (YARN-2467) Add SpanReceiverHost to ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617301#comment-15617301 ]

Tsuyoshi Ozawa commented on YARN-2467:
--------------------------------------

[~iwasakims] could you rebase it on trunk code? It cannot be applied to trunk.

> Add SpanReceiverHost to ResourceManager
> -----------------------------------------
>
> Key: YARN-2467
> URL: https://issues.apache.org/jira/browse/YARN-2467
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api, resourcemanager
> Reporter: Masatake Iwasaki
> Assignee: Masatake Iwasaki
> Labels: oct16-easy
> Attachments: YARN-2467.001.patch, YARN-2467.002.patch
>
> Per process SpanReceiverHost should be initialized in ResourceManager in the
> same way as NameNode and DataNode do in order to support tracing.
[jira] [Commented] (YARN-5746) The state of the parentQueue and its childQueues should be synchronized.
[ https://issues.apache.org/jira/browse/YARN-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617279#comment-15617279 ]

Tsuyoshi Ozawa commented on YARN-5746:
--------------------------------------

[~xgong] thanks for taking this issue.

{code}
public QueueState getConfiguredState(String queue) {
  String state = get(getQueuePrefix(queue) + STATE);
  if (state == null) {
    return null;
  } else {
    return QueueState.valueOf(StringUtils.toUpperCase(state));
  }
{code}

It's a bit difficult to understand what a state of "null" means. I would like to suggest that we create a new state, QueueState.NOT_FOUND, and return it instead of returning null. What do you think?

{quote}
Let's collapse these nested conditionals into an else if:
{quote}

+1

In addition to Daniel's comments, how about adding a new private method to wrap up the following routine?

{code}
if (parent != null) {
  QueueState configuredState = csContext.getConfiguration()
      .getConfiguredState(getQueuePath());
  QueueState parentState = parent.getState();
  if (configuredState == null) {
    this.state = parentState;
  } else {
    if (configuredState == QueueState.RUNNING
        && parentState == QueueState.STOPPED) {
      throw new IllegalArgumentException(
          "Illegal" + " State of " + configuredState
          + " for children of queue: " + queueName
          + ". The state of its parent queue: "
          + parent.getQueueName() + " is " + parentState);
    } else {
      this.state = configuredState;
    }
  }
} else {
  // if this is the root queue, get the state from the configuration.
  // if the state is not set, use RUNNING as default state.
  this.state = csContext.getConfiguration().getState(getQueuePath());
}
{code}

> The state of the parentQueue and its childQueues should be synchronized.
> ---------------------------------------------------------------------------
>
> Key: YARN-5746
> URL: https://issues.apache.org/jira/browse/YARN-5746
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, resourcemanager
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Labels: oct16-easy
> Attachments: YARN-5746.1.patch, YARN-5746.2.patch
>
> The state of the parentQueue and its childQueues need to be synchronized.
> * If the state of the parentQueue becomes STOPPED, the state of its
> childQueues needs to become STOPPED as well.
> * If we change the state of a queue to RUNNING, we should make sure the
> state of all its ancestors is RUNNING. Otherwise, we need to fail this
> operation.
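Combining Daniel's else-if suggestion with the private-method suggestion above, the quoted routine could be factored roughly like this. The method name is invented and this is a sketch, not the committed patch:

{code}
// Derives this queue's state from its configured state and its parent's
// state; uses only names that appear in the routine quoted above.
private void initializeQueueState(CSQueue parent) {
  QueueState configuredState = csContext.getConfiguration()
      .getConfiguredState(getQueuePath());
  if (parent == null) {
    // Root queue: take the state from configuration (RUNNING by default).
    this.state = csContext.getConfiguration().getState(getQueuePath());
  } else if (configuredState == null) {
    // No explicit state configured: inherit the parent's state.
    this.state = parent.getState();
  } else if (configuredState == QueueState.RUNNING
      && parent.getState() == QueueState.STOPPED) {
    throw new IllegalArgumentException("Illegal state of " + configuredState
        + " for children of queue: " + queueName
        + ". The state of its parent queue: " + parent.getQueueName()
        + " is " + parent.getState());
  } else {
    this.state = configuredState;
  }
}
{code}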
[jira] [Updated] (YARN-5259) Add two metrics at FSOpDurations for doing container assign and completed Performance statistical analysis
[ https://issues.apache.org/jira/browse/YARN-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-5259:
---------------------------------

    Assignee: Inigo Goiri

> Add two metrics at FSOpDurations for doing container assign and completed
> Performance statistical analysis
> ----------------------------------------------------------------------------
>
> Key: YARN-5259
> URL: https://issues.apache.org/jira/browse/YARN-5259
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Reporter: ChenFolin
> Assignee: Inigo Goiri
> Labels: oct16-easy
> Attachments: YARN-5259-001.patch, YARN-5259-002.patch, YARN-5259-003.patch,
> YARN-5259-004.patch
>
> If the cluster is slow, we cannot tell whether the cause is container
> assignment or container completion performance.
[jira] [Commented] (YARN-3139) Improve locks in AbstractYarnScheduler/CapacityScheduler/FairScheduler
[ https://issues.apache.org/jira/browse/YARN-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513703#comment-15513703 ]

Tsuyoshi Ozawa commented on YARN-3139:
--------------------------------------

[~leftnoteasy] [~jianhe] thanks for taking this issue.

{quote}
Summary: No regression in performance, didn't see deadlock happens. No significant performance improvement either, because existing scheduler allocation is still in single thread.
{quote}

If the performance doesn't change, could you clarify the reason for this change? Do you plan to make the scheduler allocation multi-threaded?

> Improve locks in AbstractYarnScheduler/CapacityScheduler/FairScheduler
> -------------------------------------------------------------------------
>
> Key: YARN-3139
> URL: https://issues.apache.org/jira/browse/YARN-3139
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager, scheduler
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-3139.0.patch, YARN-3139.1.patch, YARN-3139.2.patch
>
> Enhance locks in AbstractYarnScheduler/CapacityScheduler/FairScheduler; as
> mentioned in YARN-3091, a possible solution is using a read/write lock. Other
> fine-grained locks for specific purposes / bugs should be addressed in
> separate tickets.
[jira] [Commented] (YARN-4714) [Java 8] Over usage of virtual memory
[ https://issues.apache.org/jira/browse/YARN-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393276#comment-15393276 ]

Tsuyoshi Ozawa commented on YARN-4714:
--------------------------------------

Hi Krishna, have you changed the configurations on all NodeManagers and restarted all of them?

> [Java 8] Over usage of virtual memory
> ---------------------------------------
>
> Key: YARN-4714
> URL: https://issues.apache.org/jira/browse/YARN-4714
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Mohammad Kamrul Islam
> Assignee: Mohammad Kamrul Islam
> Priority: Blocker
> Attachments: HADOOP-11364.01.patch
>
> In our Hadoop 2 + Java 8 effort, we found a few jobs being killed by Hadoop
> due to excessive virtual memory allocation, although the physical memory
> usage is low.
> The most common error message is "Container [pid=??,containerID=container_??]
> is running beyond virtual memory limits. Current usage: 365.1 MB of 1 GB
> physical memory used; 3.2 GB of 2.1 GB virtual memory used. Killing
> container."
> We see this problem for MR jobs as well as in Spark drivers/executors.
[jira] [Updated] (YARN-4048) Linux kernel panic under strict CPU limits(on CentOS/RHEL 6.x)
[ https://issues.apache.org/jira/browse/YARN-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-4048:
---------------------------------

    Summary: Linux kernel panic under strict CPU limits(on CentOS/RHEL 6.x)  (was: Linux kernel panic under strict CPU limits)

> Linux kernel panic under strict CPU limits(on CentOS/RHEL 6.x)
> ----------------------------------------------------------------
>
> Key: YARN-4048
> URL: https://issues.apache.org/jira/browse/YARN-4048
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.1
> Reporter: Chengbing Liu
> Priority: Critical
> Attachments: panic.png
>
> With YARN-2440 and YARN-2531, we have seen some kernel panics happening under
> heavy pressure. Even with YARN-2809, it still panics.
> We are using CentOS 6.5, hadoop 2.5.0-cdh5.2.0 with the above patches. I
> guess the latest version also has the same issue.
[jira] [Commented] (YARN-5332) Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
[ https://issues.apache.org/jira/browse/YARN-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366191#comment-15366191 ]

Tsuyoshi Ozawa commented on YARN-5332:
--------------------------------------

[~sunilg] How about executing {{mvn clean install test -Dtest=TestRMWebServices}}? It works on my local machine without cleaning jersey-client-1.9.jar. If it doesn't work, it might be useful to clean M2_REPO as an ad-hoc workaround.

> Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
> ---------------------------------------------------------------------------
>
> Key: YARN-5332
> URL: https://issues.apache.org/jira/browse/YARN-5332
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Sunil G
> Assignee: Sunil G
>
> Few test classes like TestRMWebServices, were using
> ClientResponse#getStatusInfo and this api is not available as part of jersey
> 1.9.
> Pls refer:
> https://jersey.java.net/apidocs/1.9/jersey/com/sun/jersey/api/client/ClientResponse.html
> {{getStatusInfo}} is not present here.
> We may need to change such invocations from these test classes.
> In HADOOP-9613, [~ozawa] mentioned in this
> [comment|https://issues.apache.org/jira/browse/HADOOP-9613?focusedCommentId=14980024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14980024]
> that we can use {{getStatusInfo}}.
> [~ozawa], could you please help to confirm this point Or am I missing some
> thing here.
[jira] [Comment Edited] (YARN-5332) Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
[ https://issues.apache.org/jira/browse/YARN-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366130#comment-15366130 ]

Tsuyoshi Ozawa edited comment on YARN-5332 at 7/7/16 1:51 PM:
---------------------------------------------------------------

[~sunilg] Thanks for reporting the issue. Please see [the doc of Jersey 1.19|https://jersey.java.net/apidocs/1.19/jersey/com/sun/jersey/api/client/ClientResponse.html#getClientResponseStatus()], not one of 1.9 because we upgraded Jersey to 1.19. Feel free to ask me about the update of dependency.

was (Author: ozawa):
[~sunilg] Thanks for reporting the issue. Please see [the doc of Jersey 1.19|https://jersey.java.net/apidocs/1.19/jersey/com/sun/jersey/api/client/ClientResponse.html#getClientResponseStatus()], not 1.9 because we upgraded Jersey to 1.19. Feel free to ask me about the update of dependency.

> Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
> ---------------------------------------------------------------------------
>
> Key: YARN-5332
> URL: https://issues.apache.org/jira/browse/YARN-5332
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Sunil G
> Assignee: Sunil G
>
> Few test classes like TestRMWebServices, were using
> ClientResponse#getStatusInfo and this api is not available as part of jersey
> 1.9.
> Pls refer:
> https://jersey.java.net/apidocs/1.9/jersey/com/sun/jersey/api/client/ClientResponse.html
> {{getStatusInfo}} is not present here.
> We may need to change such invocations from these test classes.
> In HADOOP-9613, [~ozawa] mentioned in this
> [comment|https://issues.apache.org/jira/browse/HADOOP-9613?focusedCommentId=14980024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14980024]
> that we can use {{getStatusInfo}}.
> [~ozawa], could you please help to confirm this point Or am I missing some
> thing here.
[jira] [Comment Edited] (YARN-5332) Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
[ https://issues.apache.org/jira/browse/YARN-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366130#comment-15366130 ]

Tsuyoshi Ozawa edited comment on YARN-5332 at 7/7/16 1:50 PM:
---------------------------------------------------------------

[~sunilg] Thanks for reporting the issue. Please see [the doc of Jersey 1.19|https://jersey.java.net/apidocs/1.19/jersey/com/sun/jersey/api/client/ClientResponse.html#getClientResponseStatus()], not 1.9 because we upgraded Jersey to 1.19. Feel free to ask me about the update of dependency.

was (Author: ozawa):
[~sunilg] Thanks for reporting the issue. Please see the doc of Jersey 1.19, not 1.9 because we upgraded Jersey to 1.19. Feel free to ask me about the update of dependency.
https://jersey.java.net/apidocs/1.19/jersey/com/sun/jersey/api/client/ClientResponse.html#getClientResponseStatus()

> Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
> ---------------------------------------------------------------------------
>
> Key: YARN-5332
> URL: https://issues.apache.org/jira/browse/YARN-5332
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Sunil G
> Assignee: Sunil G
>
> Few test classes like TestRMWebServices, were using
> ClientResponse#getStatusInfo and this api is not available as part of jersey
> 1.9.
> Pls refer:
> https://jersey.java.net/apidocs/1.9/jersey/com/sun/jersey/api/client/ClientResponse.html
> {{getStatusInfo}} is not present here.
> We may need to change such invocations from these test classes.
> In HADOOP-9613, [~ozawa] mentioned in this
> [comment|https://issues.apache.org/jira/browse/HADOOP-9613?focusedCommentId=14980024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14980024]
> that we can use {{getStatusInfo}}.
> [~ozawa], could you please help to confirm this point Or am I missing some
> thing here.
[jira] [Commented] (YARN-5332) Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
[ https://issues.apache.org/jira/browse/YARN-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366130#comment-15366130 ]

Tsuyoshi Ozawa commented on YARN-5332:
--------------------------------------

[~sunilg] Thanks for reporting the issue. Please see the doc of Jersey 1.19, not 1.9 because we upgraded Jersey to 1.19. Feel free to ask me about the update of dependency.
https://jersey.java.net/apidocs/1.19/jersey/com/sun/jersey/api/client/ClientResponse.html#getClientResponseStatus()

> Jersey ClientResponse#getStatusInfo api is not available with jersey 1.9
> ---------------------------------------------------------------------------
>
> Key: YARN-5332
> URL: https://issues.apache.org/jira/browse/YARN-5332
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Sunil G
> Assignee: Sunil G
>
> Few test classes like TestRMWebServices, were using
> ClientResponse#getStatusInfo and this api is not available as part of jersey
> 1.9.
> Pls refer:
> https://jersey.java.net/apidocs/1.9/jersey/com/sun/jersey/api/client/ClientResponse.html
> {{getStatusInfo}} is not present here.
> We may need to change such invocations from these test classes.
> In HADOOP-9613, [~ozawa] mentioned in this
> [comment|https://issues.apache.org/jira/browse/HADOOP-9613?focusedCommentId=14980024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14980024]
> that we can use {{getStatusInfo}}.
> [~ozawa], could you please help to confirm this point Or am I missing some
> thing here.
[jira] [Commented] (YARN-5224) Logs for a completed container are not available in the yarn logs output for a live application
[ https://issues.apache.org/jira/browse/YARN-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340841#comment-15340841 ]

Tsuyoshi Ozawa commented on YARN-5224:
--------------------------------------

Marking this as incompatible since the patch includes an endpoint change to the RESTful API.

> Logs for a completed container are not available in the yarn logs output for
> a live application
> ------------------------------------------------------------------------------
>
> Key: YARN-5224
> URL: https://issues.apache.org/jira/browse/YARN-5224
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.9.0
> Reporter: Siddharth Seth
> Assignee: Xuan Gong
> Labels: incompatible
> Attachments: YARN-5224.1.patch, YARN-5224.2.patch, YARN-5224.3.patch,
> YARN-5224.4.patch, YARN-5224.5.patch
>
> This affects 'short' jobs like MapReduce and Tez more than long running apps.
> Related: YARN-5193 (but that only covers long running apps)
[jira] [Updated] (YARN-5224) Logs for a completed container are not available in the yarn logs output for a live application
[ https://issues.apache.org/jira/browse/YARN-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi Ozawa updated YARN-5224:
---------------------------------

    Labels: incompatible  (was: )

> Logs for a completed container are not available in the yarn logs output for
> a live application
> ------------------------------------------------------------------------------
>
> Key: YARN-5224
> URL: https://issues.apache.org/jira/browse/YARN-5224
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.9.0
> Reporter: Siddharth Seth
> Assignee: Xuan Gong
> Labels: incompatible
> Attachments: YARN-5224.1.patch, YARN-5224.2.patch, YARN-5224.3.patch,
> YARN-5224.4.patch, YARN-5224.5.patch
>
> This affects 'short' jobs like MapReduce and Tez more than long running apps.
> Related: YARN-5193 (but that only covers long running apps)
[jira] [Created] (YARN-5275) Timeline application page cannot be loaded when no application submitted/running on the cluster after HADOOP-9613
Tsuyoshi Ozawa created YARN-5275:
------------------------------------

Summary: Timeline application page cannot be loaded when no application submitted/running on the cluster after HADOOP-9613
Key: YARN-5275
URL: https://issues.apache.org/jira/browse/YARN-5275
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0-alpha1
Reporter: Tsuyoshi Ozawa
Priority: Critical

After HADOOP-9613, the Timeline Web UI has a problem reported by [~leftnoteasy] and [~sunilg]:

{quote}
when no application submitted/running on the cluster, applications page cannot be loaded.
{quote}

We should investigate the reason and fix it.
[jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk
[ https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15283261#comment-15283261 ]

Tsuyoshi Ozawa commented on YARN-5006:
--------------------------------------

{quote}
Our opinion about fixing this bug is that we want to add a limit on the ApplicationStateData size when RMStateStore does StoreAppTransition.
{quote}

{quote}
You should also see if YARN-4958 would help resolve the issue. We're misusing ZK a bit as a data store, and YARN-4958 attempts to reduce the level of abuse.
{quote}

Both suggestions can be pursued in parallel and are worth fixing. Another workaround is to use compression.

> ResourceManager quit due to ApplicationStateData exceed the limit size of
> znode in zk
> ----------------------------------------------------------------------------
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0, 2.7.2
> Reporter: dongtingting
> Priority: Critical
>
> A client submits a job that adds one file into the DistributedCache. When the
> job is submitted, the ResourceManager stores ApplicationStateData into ZK.
> The ApplicationStateData exceeds the znode size limit, and the RM exits with
> code 1.
> The related code in RMStateStore.java:
> {code}
> private static class StoreAppTransition
>     implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
>   @Override
>   public void transition(RMStateStore store, RMStateStoreEvent event) {
>     if (!(event instanceof RMStateStoreAppEvent)) {
>       // should never happen
>       LOG.error("Illegal event type: " + event.getClass());
>       return;
>     }
>     ApplicationState appState = ((RMStateStoreAppEvent) event).getAppState();
>     ApplicationId appId = appState.getAppId();
>     ApplicationStateData appStateData = ApplicationStateData
>         .newInstance(appState);
>     LOG.info("Storing info for app: " + appId);
>     try {
>       store.storeApplicationStateInternal(appId, appStateData); // store the appStateData
>       store.notifyApplication(new RMAppEvent(appId,
>           RMAppEventType.APP_NEW_SAVED));
>     } catch (Exception e) {
>       LOG.error("Error storing app: " + appId, e);
>       store.notifyStoreOperationFailed(e); // handle fail event, system exit
>     }
>   };
> }
> {code}
> The Exception log:
> {code}
> ...
> 2016-04-20 11:26:35,732 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore AsyncDispatcher event handler: Error storing app: application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
>     at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at ...
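On the compression workaround mentioned in the comment above, a minimal sketch: deflate the serialized ApplicationStateData before writing it to the znode. The helper is hypothetical and illustrative, not from a patch on this issue.

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper: shrinks the serialized state so large entries are
// less likely to hit ZooKeeper's default ~1 MB znode limit (jute.maxbuffer).
static byte[] compress(byte[] serializedState) throws IOException {
  ByteArrayOutputStream bos = new ByteArrayOutputStream();
  try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
    dos.write(serializedState);
  }
  return bos.toByteArray();
}
{code}

A matching InflaterInputStream pass would be needed on the recovery path.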
[jira] [Commented] (YARN-4994) Use MiniYARNCluster with try-with-resources in tests
[ https://issues.apache.org/jira/browse/YARN-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278960#comment-15278960 ]

Tsuyoshi Ozawa commented on YARN-4994:
--------------------------------------

[~boky01] I'll check the patch.

> Use MiniYARNCluster with try-with-resources in tests
> ------------------------------------------------------
>
> Key: YARN-4994
> URL: https://issues.apache.org/jira/browse/YARN-4994
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: test
> Affects Versions: 2.7.0
> Reporter: Andras Bokor
> Assignee: Andras Bokor
> Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: HDFS-10287.01.patch, HDFS-10287.02.patch, HDFS-10287.03.patch,
> YARN-4994.04.patch, YARN-4994.05.patch, YARN-4994.06.patch,
> YARN-4994.07.patch
>
> In tests, MiniYARNCluster is used with the following pattern:
> in a try-catch block, create a MiniYARNCluster instance and close it in the
> finally block.
> [Try-with-resources|https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html]
> is preferred since Java 7 instead of the pattern above.
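The pattern change in miniature, assuming MiniYARNCluster is usable as an AutoCloseable through Hadoop's Service interface (which extends Closeable); a sketch, not code from the patch:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.MiniYARNCluster;

// try-with-resources: close() stops the cluster even if the test body throws.
try (MiniYARNCluster cluster = new MiniYARNCluster("testCluster", 1, 1, 1)) {
  cluster.init(new YarnConfiguration());
  cluster.start();
  // ... exercise the cluster ...
}
{code}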
[jira] [Commented] (YARN-5071) address HBase compatibility issues with trunk
[ https://issues.apache.org/jira/browse/YARN-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278663#comment-15278663 ]

Tsuyoshi Ozawa commented on YARN-5071:
--------------------------------------

{quote}
In principle, I don't think this is really a HBase problem at the moment as 3.0.0 has not been released yet.
{quote}

I agree with you. I meant that we should focus on how the Hadoop ecosystem, including HBase, can migrate branch-2 based code to trunk easily and smoothly. I think we should get feedback from users of Hadoop, including HBase developers, to avoid critical problems. In other words, this is a good time to recheck the Hadoop Compatibility guide.
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Compatibility.html

> address HBase compatibility issues with trunk
> -----------------------------------------------
>
> Key: YARN-5071
> URL: https://issues.apache.org/jira/browse/YARN-5071
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Critical
>
> The trunk is now adding or planning to add more and more
> backward-incompatible changes. Some examples include
> - remove v.1 metrics classes (HADOOP-12504)
> - update jersey version (HADOOP-9613)
> - target java 8 by default (HADOOP-11858)
> This poses big challenges for the timeline service v.2 as we have a
> dependency on hbase which depends on an older version of hadoop.
> We need to find a way to solve/contain/manage these risks before it is too
> late.
[jira] [Commented] (YARN-5071) address HBase compatibility issues with trunk
[ https://issues.apache.org/jira/browse/YARN-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278626#comment-15278626 ]

Tsuyoshi Ozawa commented on YARN-5071:
--------------------------------------

I think we should have a discussion with the HBase folks. [~stack] [~iwasakims] what do you think about HBase supporting the trunk code? Is there any help or work we can provide on the Hadoop side? We'd like to know the barriers and problems of running the HBase client in a trunk environment.

> address HBase compatibility issues with trunk
> -----------------------------------------------
>
> Key: YARN-5071
> URL: https://issues.apache.org/jira/browse/YARN-5071
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Affects Versions: YARN-2928
> Reporter: Sangjin Lee
> Assignee: Sangjin Lee
> Priority: Critical
>
> The trunk is now adding or planning to add more and more
> backward-incompatible changes. Some examples include
> - remove v.1 metrics classes (HADOOP-12504)
> - update jersey version (HADOOP-9613)
> - target java 8 by default (HADOOP-11858)
> This poses big challenges for the timeline service v.2 as we have a
> dependency on hbase which depends on an older version of hadoop.
> We need to find a way to solve/contain/manage these risks before it is too
> late.
[jira] [Commented] (YARN-4844) Add getMemoryLong/getVirtualCoreLong to o.a.h.y.api.records.Resource
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278514#comment-15278514 ]

Tsuyoshi Ozawa commented on YARN-4844:
--------------------------------------

[~wangda], it's reasonable to make these values long. Should we deprecate getVirtualCores, which returns an int value? What do you think?

> Add getMemoryLong/getVirtualCoreLong to o.a.h.y.api.records.Resource
> -----------------------------------------------------------------------
>
> Key: YARN-4844
> URL: https://issues.apache.org/jira/browse/YARN-4844
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: api
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Blocker
> Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch,
> YARN-4844.4.patch, YARN-4844.5.patch, YARN-4844.6.patch, YARN-4844.7.patch
>
> We use int32 for memory now; if a cluster has 10k nodes, each with 210G
> memory, we will get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending
> resources of running apps to the cluster's total pending resources. If a
> problematic app requires too many resources (let's say 1M+ containers, each
> of them 3G), int32 will not be enough.
> Even if we can cap each app's pending request, we cannot handle the case
> where there are many running apps, each with capped but still significant
> pending resources.
> So we may possibly need to add getMemoryLong/getVirtualCoreLong to
> o.a.h.y.api.records.Resource.
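The overflow in the description is easy to verify with the numbers given there (10k nodes, 210G each, tracked in megabytes):

{code}
public class MemoryOverflowDemo {
  public static void main(String[] args) {
    int nodes = 10000;
    int memoryPerNodeMB = 210 * 1024;                // 215,040 MB per node
    int totalInt = nodes * memoryPerNodeMB;          // 2,150,400,000 > Integer.MAX_VALUE
    long totalLong = (long) nodes * memoryPerNodeMB; // widened before multiplying
    System.out.println(totalInt);                    // negative: the int wrapped around
    System.out.println(totalLong);                   // 2150400000
  }
}
{code}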
[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171471#comment-15171471 ]

Tsuyoshi Ozawa commented on YARN-4743:
--------------------------------------

[~gzh1992n] thank you for the report. IIUC, the comparator must ensure that the relation is transitive:
https://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html
My intuition is that the DRF comparator is not transitive. [~kasha], what do you think? Can we design the comparator to be transitive?

> ResourceManager crash because TimSort
> ---------------------------------------
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.6.4
> Reporter: Zephyr Guo
>
> {code}
> 2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>     at java.util.TimSort.mergeHi(TimSort.java:868)
>     at java.util.TimSort.mergeAt(TimSort.java:485)
>     at java.util.TimSort.mergeCollapse(TimSort.java:410)
>     at java.util.TimSort.sort(TimSort.java:214)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>     at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue was found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting
> {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator<Schedulable> comparator = policy.getComparator();
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
>   ..
>       s1.getResourceUsage(), minShare1);
>   boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>       s2.getResourceUsage(), minShare2);
>   minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>       / Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE).getMemory();
>   minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>       / Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE).getMemory();
>   ..
> {code}
> {{getResourceUsage}} will return the current Resource. The current Resource
> is unstable.
> {code:title=FSAppAttempt.java}
> @Override
> public Resource getResourceUsage() {
>   // Here the getPreemptedResources() always return zero, except in
>   // a preemption round
>   return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> {code:title=SchedulerApplicationAttempt}
> public Resource getCurrentConsumption() {
>   return currentConsumption;
> }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
>   ..
>   Resources.addTo(currentConsumption, rmContainer.getContainer().getResource());
>   ..
> }
> {code}
> I suggest that we use a stable Resource in the comparator.
> Is there something I think wrong?
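One possible shape of the "stable Resource" suggestion, sketched inside FSLeafQueue with java.util and the YARN Resources/Schedulable types in scope: freeze each schedulable's usage once before sorting and compare the frozen copies, so concurrent updates to currentConsumption cannot reorder elements mid-sort. Hypothetical code, not the committed fix, and it compares only memory for brevity:

{code}
final Map<Schedulable, Resource> frozen =
    new IdentityHashMap<Schedulable, Resource>();
for (Schedulable s : runnableApps) {
  frozen.put(s, Resources.clone(s.getResourceUsage())); // immutable snapshot
}
Collections.sort(runnableApps, new Comparator<Schedulable>() {
  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    // The real FairShareComparator reads several fields; each would need
    // the same frozen treatment to keep the ordering consistent.
    return Integer.compare(frozen.get(s1).getMemory(),
        frozen.get(s2).getMemory());
  }
});
{code}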
[jira] [Commented] (YARN-4673) race condition in ResourceTrackerService#nodeHeartBeat while processing deduplicated msg
[ https://issues.apache.org/jira/browse/YARN-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166867#comment-15166867 ]

Tsuyoshi Ozawa commented on YARN-4673:
--------------------------------------

Hi [~sandflee], thank you for the contribution. Could you explain the cause of the deadlock? It would help us review your patch faster and more accurately.

> race condition in ResourceTrackerService#nodeHeartBeat while processing
> deduplicated msg
> --------------------------------------------------------------------------
>
> Key: YARN-4673
> URL: https://issues.apache.org/jira/browse/YARN-4673
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: sandflee
> Assignee: sandflee
> Attachments: YARN-4673.01.patch
>
> we could add a lock like ApplicationMasterService#allocate
[jira] [Commented] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)
[ https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163046#comment-15163046 ]

Tsuyoshi Ozawa commented on YARN-4630:
--------------------------------------

Hey Akira, can I check this since it seems to include the changes againstContainerId? It has an impact against RM-HA.

> Remove useless boxing/unboxing code (Hadoop YARN)
> ---------------------------------------------------
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.0.0
> Reporter: Kousuke Saruta
> Priority: Minor
> Attachments: YARN-4630.0.patch
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.
[jira] [Comment Edited] (YARN-4630) Remove useless boxing/unboxing code (Hadoop YARN)
[ https://issues.apache.org/jira/browse/YARN-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163046#comment-15163046 ]

Tsuyoshi Ozawa edited comment on YARN-4630 at 2/24/16 2:15 PM:
----------------------------------------------------------------

Hey Akira, can I check this since it seems to include the changes against ContainerId? It has an impact against RM-HA.

was (Author: ozawa):
Hey Akira, can I check this since it seems to include the changes againstContainerId? It has an impact against RM-HA.

> Remove useless boxing/unboxing code (Hadoop YARN)
> ---------------------------------------------------
>
> Key: YARN-4630
> URL: https://issues.apache.org/jira/browse/YARN-4630
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.0.0
> Reporter: Kousuke Saruta
> Priority: Minor
> Attachments: YARN-4630.0.patch
>
> There are lots of places where useless boxing/unboxing occur.
> To avoid performance issue, let's remove them.
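For readers unfamiliar with the pattern the patch removes, a representative example (a generic illustration, not taken from the patch):

{code}
String s = "8080";
int boxed = Integer.valueOf(s).intValue(); // allocates an Integer, then unboxes it
int direct = Integer.parseInt(s);          // same result, no boxing at all
{code}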
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158717#comment-15158717 ]

Tsuyoshi Ozawa commented on YARN-4648:
--------------------------------------

Note: The failures of TestClientRMTokens and TestAMAuthorization are tracked on HADOOP-12687. They're not related to the patch uploaded here.

> Move preemption related tests from TestFairScheduler to
> TestFairSchedulerPreemption
> ----------------------------------------------------------
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Kai Sasaki
> Labels: newbie++
> Attachments: YARN-4648.01.patch, YARN-4648.02.patch, YARN-4648.03.patch
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158713#comment-15158713 ]

Tsuyoshi Ozawa commented on YARN-4648:
--------------------------------------

+1, checking this in.

> Move preemption related tests from TestFairScheduler to
> TestFairSchedulerPreemption
> ----------------------------------------------------------
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Kai Sasaki
> Labels: newbie++
> Attachments: YARN-4648.01.patch, YARN-4648.02.patch, YARN-4648.03.patch
[jira] [Created] (YARN-4713) Warning by unchecked conversion in TestTimelineWebServices
Tsuyoshi Ozawa created YARN-4713:
------------------------------------

Summary: Warning by unchecked conversion in TestTimelineWebServices
Key: YARN-4713
URL: https://issues.apache.org/jira/browse/YARN-4713
Project: Hadoop YARN
Issue Type: Test
Components: test
Reporter: Tsuyoshi Ozawa

[WARNING] /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java:[123,38] [unchecked] unchecked conversion

{code}
Enumeration<String> names = mock(Enumeration.class);
{code}
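A typical way to silence this category of warning, for reference; whether the eventual patch takes this route is not decided here:

{code}
// Acknowledges the unavoidable raw-to-parameterized conversion:
// Mockito's mock(Enumeration.class) can only return a raw Enumeration.
@SuppressWarnings("unchecked")
Enumeration<String> names = mock(Enumeration.class);
{code}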
[jira] [Commented] (YARN-4708) Missing default mapper type in TimelineServer performance test tool usage
[ https://issues.apache.org/jira/browse/YARN-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156267#comment-15156267 ]

Tsuyoshi Ozawa commented on YARN-4708:
--------------------------------------

+1, checking this in.

> Missing default mapper type in TimelineServer performance test tool usage
> ----------------------------------------------------------------------------
>
> Key: YARN-4708
> URL: https://issues.apache.org/jira/browse/YARN-4708
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: timelineserver
> Reporter: Kai Sasaki
> Assignee: Kai Sasaki
> Priority: Minor
> Attachments: YARN-4708.01.patch
>
> The TimelineServer performance test tool uses SimpleEntityWriter as the
> default mapper. It can be indicated explicitly in the usage of the tool.
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155844#comment-15155844 ]

Tsuyoshi Ozawa commented on YARN-4648:
--------------------------------------

[~lewuathe] Thank you for updating. Unfortunately, startResourceManagerForPreemptionTest is still confusing because the name of the class is TestFairSchedulerPreemption. My suggestion is to rename startResourceManager to startResourceManagerWithStubbedFairScheduler, and startResourceManagerForPreemptionTest to startResourceManagerWithRealFairScheduler. Do you have any better idea?

> Move preemption related tests from TestFairScheduler to
> TestFairSchedulerPreemption
> ----------------------------------------------------------
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Kai Sasaki
> Labels: newbie++
> Attachments: YARN-4648.01.patch, YARN-4648.02.patch
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155388#comment-15155388 ]

Tsuyoshi Ozawa commented on YARN-2225:
--------------------------------------

It's a bit too aggressive to disable the vmem check by default, as discussed on this issue, since some users enable it. IMHO, I prefer making the default value of the vmem ratio larger. How about closing this issue and doing that in another JIRA (or moving HADOOP-11364 to a YARN issue), since the problem being addressed there is different from this issue?

> Turn the virtual memory check to be off by default
> ----------------------------------------------------
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
> The virtual memory check may not be the best way to isolate applications.
> Virtual memory is not the constrained resource. It would be better if we
> limit the swapping of the task using swapiness instead. This patch will turn
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if
> they need to.
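The two knobs being debated, set programmatically for illustration (the same keys can be set in yarn-site.xml); the constants are from YarnConfiguration, and 4.0f is only an example of "larger":

{code}
YarnConfiguration conf = new YarnConfiguration();
// What this JIRA proposes: turn the virtual memory check off by default.
conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, false);
// The alternative preferred above: keep the check but raise the ratio
// (DEFAULT_NM_VMEM_PMEM_RATIO is 2.1).
conf.setFloat(YarnConfiguration.NM_VMEM_PMEM_RATIO, 4.0f);
{code}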
[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148329#comment-15148329 ]

Tsuyoshi Ozawa commented on YARN-2225:
--------------------------------------

Why not make the vmem-pmem ratio larger to address the problem?

> Turn the virtual memory check to be off by default
> ----------------------------------------------------
>
> Key: YARN-2225
> URL: https://issues.apache.org/jira/browse/YARN-2225
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Anubhav Dhoot
> Assignee: Anubhav Dhoot
> Attachments: YARN-2225.patch
>
> The virtual memory check may not be the best way to isolate applications.
> Virtual memory is not the constrained resource. It would be better if we
> limit the swapping of the task using swapiness instead. This patch will turn
> this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if
> they need to.
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147930#comment-15147930 ]

Tsuyoshi Ozawa commented on YARN-4648:
--------------------------------------

[~kaisasak] Instead of changing the sequence of initialization, how about changing the name of {{startResourceManagerWithoutThreshold}}? I think the name of {{startResourceManagerWithoutThreshold}} looks confusing since the behaviour of the method appears to be equal to startResourceManager(1.1f). What do you think?

> Move preemption related tests from TestFairScheduler to
> TestFairSchedulerPreemption
> ----------------------------------------------------------
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Kai Sasaki
> Labels: newbie++
> Attachments: YARN-4648.01.patch
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146932#comment-15146932 ]

Tsuyoshi Ozawa commented on YARN-4648:
--------------------------------------

[~lewuathe] thank you for your contribution. I looked over your patch. I have some comments; could you address them?

{code}
private void startResourceManagerWithoutThreshold() {
{code}

Why not reuse startResourceManager(threshold) with a threshold larger than 1.0f?

{code}
+import org.apache.hadoop.yarn.api.records.*;
{code}

Please don't use * import.

> Move preemption related tests from TestFairScheduler to
> TestFairSchedulerPreemption
> ----------------------------------------------------------
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Kai Sasaki
> Labels: newbie++
> Attachments: YARN-4648.01.patch
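Concretely, the reuse suggested above could look like this (illustrative; see also the naming discussion elsewhere in this thread):

{code}
// A threshold above 1.0f can never be exceeded, so preemption stays
// disabled; this reuses startResourceManager(float) instead of keeping a
// near-duplicate body.
private void startResourceManagerWithoutThreshold() {
  startResourceManager(1.1f);
}
{code}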
[jira] [Commented] (YARN-4648) Move preemption related tests from TestFairScheduler to TestFairSchedulerPreemption
[ https://issues.apache.org/jira/browse/YARN-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143676#comment-15143676 ]

Tsuyoshi Ozawa commented on YARN-4648:
--------------------------------------

[~lewuathe], sure, I'll check this this weekend.

> Move preemption related tests from TestFairScheduler to
> TestFairSchedulerPreemption
> ----------------------------------------------------------
>
> Key: YARN-4648
> URL: https://issues.apache.org/jira/browse/YARN-4648
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.8.0
> Reporter: Karthik Kambatla
> Assignee: Kai Sasaki
> Labels: newbie++
> Attachments: YARN-4648.01.patch
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070805#comment-15070805 ]

Tsuyoshi Ozawa commented on YARN-4234:
--------------------------------------

[~djp] [~iwasakims] committed the addendum patch by Masatake to trunk and branch-2 (just removing a file "q" in the root directory). Thanks for your contribution!

> New put APIs in TimelineClient for ats v1.5
> ---------------------------------------------
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4234-2015-11-13.1.patch, YARN-4234-2015-11-16.1.patch,
> YARN-4234-2015-11-16.2.patch, YARN-4234-2015.2.patch, YARN-4234.1.patch,
> YARN-4234.2.patch, YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch,
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch,
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch,
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch,
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch,
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch,
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch,
> YARN-4234.addendum.patch
>
> In this ticket, we will add new put APIs in timelineClient to let
> clients/applications have the option to use ATS v1.5
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070799#comment-15070799 ]

Tsuyoshi Ozawa commented on YARN-4234:
--------------------------------------

[~iwasakims] thanks for following up. +1, checking this in.

> New put APIs in TimelineClient for ats v1.5
> ---------------------------------------------
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-4234-2015-11-13.1.patch, YARN-4234-2015-11-16.1.patch,
> YARN-4234-2015-11-16.2.patch, YARN-4234-2015.2.patch, YARN-4234.1.patch,
> YARN-4234.2.patch, YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch,
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch,
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch,
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch,
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch,
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch,
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch,
> YARN-4234.addendum.patch
>
> In this ticket, we will add new put APIs in timelineClient to let
> clients/applications have the option to use ATS v1.5
[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5
[ https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070915#comment-15070915 ] Tsuyoshi Ozawa commented on YARN-4234: -- Sorry, I just forgot to push. Thanks for following up! > New put APIs in TimelineClient for ats v1.5 > --- > > Key: YARN-4234 > URL: https://issues.apache.org/jira/browse/YARN-4234 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Xuan Gong >Assignee: Xuan Gong > Fix For: 2.8.0 > > Attachments: YARN-4234-2015-11-13.1.patch, > YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, > YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, > YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, > YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, > YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, > YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, > YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, > YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, > YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch, > YARN-4234.addendum.patch > > > In this ticket, we will add new put APIs in timelineClient to let > clients/applications have the option to use ATS v1.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4505) TestJobHistoryEventHandler.testTimelineEventHandling fails on trunk because of NPE
Tsuyoshi Ozawa created YARN-4505: Summary: TestJobHistoryEventHandler.testTimelineEventHandling fails on trunk because of NPE Key: YARN-4505 URL: https://issues.apache.org/jira/browse/YARN-4505 Project: Hadoop YARN Issue Type: Bug Reporter: Tsuyoshi Ozawa https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/824/ {code} Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.163 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) Time elapsed: 5.115 sec <<< ERROR! java.lang.NullPointerException: null at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:331) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1015) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:586) at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.handleEvent(TestJobHistoryEventHandler.java:722) at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:510) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout
[ https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048245#comment-15048245 ] Tsuyoshi Ozawa commented on YARN-4301: -- {quote} it maybe change the behaviour of NM_MIN_HEALTHY_DISKS_FRACTION, could we add a timeout to mkdir? if mkdir timeout, the disk is treated as a failed disk. {quote} +1 for the suggestion by [~sandflee]. > NM disk health checker should have a timeout > > > Key: YARN-4301 > URL: https://issues.apache.org/jira/browse/YARN-4301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akihiro Suda >Assignee: Akihiro Suda > Attachments: YARN-4301-1.patch, YARN-4301-2.patch, > concept-async-diskchecker.txt > > > The disk health checker [verifies a > disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385] > by executing {{mkdir}} and {{rmdir}} periodically. > If these operations does not return in a moderate timeout, the disk should be > marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}. > I confirmed that current YARN does not have an implicit timeout (on JDK7, > Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our > fault injector for distributed systems. > (I'll introduce the reproduction script in a while) > I consider we can fix this issue by making > [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73] > return {{false}} if the value of {{this.getLastHealthReportTime()}} is too > old. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
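A minimal sketch of the timeout-on-mkdir idea endorsed above: run each probe on a worker thread so the caller can bound the wait with {{Future.get}}, and treat a timeout as a failed disk. The class and method names here are made up for illustration; this is not the actual YARN patch.
{code:title=TimedDiskProbe.java}
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedDiskProbe {
  // A cached pool so a probe thread stuck on a bad disk does not
  // block later probes of other disks.
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  /** Returns false if the mkdir/rmdir probe fails or does not finish in time. */
  static boolean probeDir(File dir, long timeoutMillis) {
    Future<Boolean> result = POOL.submit(() -> {
      File probe = new File(dir, "diskcheck.probe");
      // mkdir() can hang indefinitely on a faulty disk or filesystem
      return probe.mkdir() && probe.delete();
    });
    try {
      return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      result.cancel(true); // abandon the stuck probe; treat the disk as failed
      return false;
    } catch (Exception e) {
      return false;
    }
  }
}
{code}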
[jira] [Updated] (YARN-4438) Implement RM leader election with curator
[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4438: - Issue Type: Improvement (was: Bug) > Implement RM leader election with curator > - > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4438.1.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4439) Clarify NMContainerStatus#toString method.
[ https://issues.apache.org/jira/browse/YARN-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049809#comment-15049809 ] Tsuyoshi Ozawa commented on YARN-4439: -- [~jianhe] should we also add Priority to the printed string? > Clarify NMContainerStatus#toString method. > -- > > Key: YARN-4439 > URL: https://issues.apache.org/jira/browse/YARN-4439 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-4439.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046951#comment-15046951 ] Tsuyoshi Ozawa commented on YARN-4348: -- Now I committed this to branch-2.6.3 too. Thanks! > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046950#comment-15046950 ] Tsuyoshi Ozawa commented on YARN-4348: -- [~djp] I committed this to branch-2.6, which is targeting 2.6.3. Can I push this to branch-2.6.3? > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046944#comment-15046944 ] Tsuyoshi Ozawa commented on YARN-4348: -- Ran the tests locally and they pass on branch-2.6. Committing this to branch-2.6. > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Fix For: 2.6.3, 2.7.3 > > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4301) NM disk health checker should have a timeout
[ https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4301: - Assignee: Akihiro Suda > NM disk health checker should have a timeout > > > Key: YARN-4301 > URL: https://issues.apache.org/jira/browse/YARN-4301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akihiro Suda >Assignee: Akihiro Suda > Attachments: YARN-4301-1.patch, YARN-4301-2.patch, > concept-async-diskchecker.txt > > > The disk health checker [verifies a > disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385] > by executing {{mkdir}} and {{rmdir}} periodically. > If these operations does not return in a moderate timeout, the disk should be > marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}. > I confirmed that current YARN does not have an implicit timeout (on JDK7, > Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our > fault injector for distributed systems. > (I'll introduce the reproduction script in a while) > I consider we can fix this issue by making > [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73] > return {{false}} if the value of {{this.getLastHealthReportTime()}} is too > old. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout
[ https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048221#comment-15048221 ] Tsuyoshi Ozawa commented on YARN-4301: -- [~suda] thank you for the explanation. I have some comments on the v2 patch - could you address them? 1. About the synchronization of DirectoryCollection, I see the point you mentioned. The change, however, causes race conditions among the state in the class (localDirs, fullDirs, errorDirs, and numFailures) - e.g. {{DirectoryCollection.concat(errorDirs, fullDirs)}}, {{createNonExistentDirs}} and other functions cannot work correctly without synchronization. I think the root cause of the problem is calling {{DC.testDirs}} while holding the lock in {{DC.checkDirs}}. How about releasing the lock before calling {{testDirs}} and re-acquiring it afterwards? {quote} synchronized DC.getFailedDirs() can be blocked by synchronized DC.checkDirs(), when File.mkdir() (called from DC.checkDirs(), via DC.testDirs()) does not return in a moderate timeout. Hence NodeHealthCheckerServer.isHealthy() gets also blocked. So I would like to make DC.getXXXs unsynchronized. {quote} 2. If the thread is preempted by the OS and moved to another CPU in a multicore environment, {{gap}} can be a negative value. Hence I prefer not to abort the NodeManager here. {code:title=NodeHealthCheckerService.java} +long diskCheckTime = dirsHandler.getLastDisksCheckTime(); +long now = System.currentTimeMillis(); +long gap = now - diskCheckTime; +if (gap < 0) { + throw new AssertionError("implementation error - now=" + now + + ", diskCheckTime=" + diskCheckTime); +} {code} 3. Please move the configuration validation to {{serviceInit}} to avoid aborting at runtime. {code:title=NodeHealthCheckerService.java} +long allowedGap = this.diskHealthCheckInterval + this.diskHealthCheckTimeout; +if (allowedGap <= 0) { + throw new AssertionError("implementation error - interval=" + this.diskHealthCheckInterval + + ", timeout=" + this.diskHealthCheckTimeout); +} {code} > NM disk health checker should have a timeout > > > Key: YARN-4301 > URL: https://issues.apache.org/jira/browse/YARN-4301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akihiro Suda > Attachments: YARN-4301-1.patch, YARN-4301-2.patch, > concept-async-diskchecker.txt > > > The disk health checker [verifies a > disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385] > by executing {{mkdir}} and {{rmdir}} periodically. > If these operations does not return in a moderate timeout, the disk should be > marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}. > I confirmed that current YARN does not have an implicit timeout (on JDK7, > Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our > fault injector for distributed systems. > (I'll introduce the reproduction script in a while) > I consider we can fix this issue by making > [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73] > return {{false}} if the value of {{this.getLastHealthReportTime()}} is too > old. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
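For illustration, the release-the-lock-around-{{testDirs}} pattern suggested in point 1 above could look like the following self-contained sketch. The names are placeholders rather than DirectoryCollection's real code: state is snapshotted under the lock, the slow disk probes run without it, and the lock is re-acquired only to publish the results, so synchronized getters are never blocked behind {{File.mkdir()}}.
{code:title=CheckDirsSketch.java}
import java.util.ArrayList;
import java.util.List;

class CheckDirsSketch {
  private final List<String> dirs = new ArrayList<>();
  private final List<String> failedDirs = new ArrayList<>();

  void checkDirs() {
    List<String> snapshot;
    synchronized (this) {
      // 1. copy the shared state while holding the lock
      snapshot = new ArrayList<>(dirs);
    }
    // 2. run the slow mkdir/rmdir probes without holding the lock
    List<String> newFailures = testDirs(snapshot);
    synchronized (this) {
      // 3. re-acquire the lock only to publish the results
      failedDirs.clear();
      failedDirs.addAll(newFailures);
    }
  }

  // stand-in for the real per-directory probe, which may block on bad disks
  private List<String> testDirs(List<String> candidates) {
    return new ArrayList<>();
  }

  // stays synchronized, but is no longer blocked behind testDirs()
  synchronized List<String> getFailedDirs() {
    return new ArrayList<>(failedDirs);
  }
}
{code}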
[jira] [Commented] (YARN-4301) NM disk health checker should have a timeout
[ https://issues.apache.org/jira/browse/YARN-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047902#comment-15047902 ] Tsuyoshi Ozawa commented on YARN-4301: -- [~suda] thank you for the update. The findbugs warning looks related to this change. Could you fix it? > NM disk health checker should have a timeout > > > Key: YARN-4301 > URL: https://issues.apache.org/jira/browse/YARN-4301 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Akihiro Suda > Attachments: YARN-4301-1.patch, YARN-4301-2.patch, > concept-async-diskchecker.txt > > > The disk health checker [verifies a > disk|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DirectoryCollection.java#L371-L385] > by executing {{mkdir}} and {{rmdir}} periodically. > If these operations does not return in a moderate timeout, the disk should be > marked bad, and thus {{nodeInfo.nodeHealthy}} should flip to {{false}}. > I confirmed that current YARN does not have an implicit timeout (on JDK7, > Linux 4.2, ext4) using [Earthquake|https://github.com/osrg/earthquake], our > fault injector for distributed systems. > (I'll introduce the reproduction script in a while) > I consider we can fix this issue by making > [{{NodeHealthCheckerServer.isHealthy()}}|https://github.com/apache/hadoop/blob/96677bef00b03057038157efeb3c2ad4702914da/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeHealthCheckerService.java#L69-L73] > return {{false}} if the value of {{this.getLastHealthReportTime()}} is too > old. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046332#comment-15046332 ] Tsuyoshi Ozawa commented on YARN-4348: -- Committed this to branch-2.7. Thanks [~jianhe] for reviewing and reporting! I will cherry-pick this to branch-2.6 after running tests. > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044258#comment-15044258 ] Tsuyoshi Ozawa commented on YARN-4348: -- [~jianhe] could you take a look? > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding ZK's callback work correctly
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Summary: ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding ZK's callback work correctly (was: ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout) > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > ZK's callback work correctly > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034071#comment-15034071 ] Tsuyoshi Ozawa commented on YARN-4348: -- [~zxu] [~jianhe] I'm rethinking [this comment|https://issues.apache.org/jira/browse/YARN-3798?focusedCommentId=14609769=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609769] about using the sync callback to wait for sync completion: it can cause [the lock problem described here|https://issues.apache.org/jira/browse/YARN-4348?focusedCommentId=15018159=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15018159]. The simplest way to deal with the problem is to remove the barrier imposed by the sync callback. This works because the ZK client's requests are sent to the ZK server in order, unless the ZK master server fails while the ZK connection is being recreated. Quorum sync (ZOOKEEPER-2136) is a good helper for that corner case. What do you think? > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
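A minimal sketch of the barrier-free approach described above, written against the plain ZooKeeper client API (the class and field names are illustrative, not ZKRMStateStore's actual code): {{sync()}} is issued asynchronously and the callback only logs the outcome, so no thread blocks waiting for it and ZK's event thread is never held up.
{code:title=NonBlockingSync.java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class NonBlockingSync {
  private final ZooKeeper zk;

  NonBlockingSync(ZooKeeper zk) {
    this.zk = zk;
  }

  void syncInternal(String path) {
    // No latch and no wait: later requests on the same session are
    // ordered after this sync anyway, so we only log the outcome.
    zk.sync(path, (rc, p, ctx) -> {
      if (rc != KeeperException.Code.OK.intValue()) {
        System.err.println("sync(" + p + ") failed with rc=" + rc);
      }
    }, null);
  }
}
{code}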
[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrance of SESSIONEXPIED
[ https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034045#comment-15034045 ] Tsuyoshi Ozawa commented on YARN-3798: -- [~jianhe] {quote} If curator sync up the data it would be fine. Otherwise there could be a chance of lag like we discussed earlier. Truly I haven't tried Curator yet, probably some one can cross check this part. {quote} FYI, when Curator detects the same situation, it calls sync automatically in the {{doSyncForSuspendedConnection}} method of the Curator framework. Therefore, we don't need to call the sync operation in the trunk and branch-2.8 code. > ZKRMStateStore shouldn't create new session without occurrance of > SESSIONEXPIED > --- > > Key: YARN-3798 > URL: https://issues.apache.org/jira/browse/YARN-3798 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 > Environment: Suse 11 Sp3 >Reporter: Bibin A Chundatt >Assignee: Varun Saxena >Priority: Blocker > Fix For: 2.7.2, 2.6.2 > > Attachments: RM.log, YARN-3798-2.7.002.patch, > YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.6.02.patch, > YARN-3798-branch-2.7.002.patch, YARN-3798-branch-2.7.003.patch, > YARN-3798-branch-2.7.004.patch, YARN-3798-branch-2.7.005.patch, > YARN-3798-branch-2.7.006.patch, YARN-3798-branch-2.7.patch > > > RM going down with NoNode exception during create of znode for appattempt > *Please find the exception logs* > {code} > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2015-06-09 10:09:44,732 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2015-06-09 10:09:44,886 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode > at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) > at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405) > at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108) > at java.lang.Thread.run(Thread.java:745) > 2015-06-09 10:09:44,887 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed > out ZK retries. Giving up! >
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Summary: ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread (was: ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding ZK's callback work correctly) > ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding > blocking ZK's event thread > -- > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033915#comment-15033915 ] Tsuyoshi Ozawa commented on YARN-4348: -- Ran the tests since the last Jenkins run failed to launch. The javadoc warnings seem to be false positives since this patch doesn't include any javadoc changes. {quote} -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 2079 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. {quote} [~jianhe] could you take a look at the latest patch? > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033956#comment-15033956 ] Tsuyoshi Ozawa commented on YARN-4348: -- [~djp] IMHO, this is a blocker for 2.6.3 and 2.7.3 since the problem is more serious than I thought. Please check [this comment|https://issues.apache.org/jira/browse/YARN-4348?focusedCommentId=15018159=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15018159]. This is unexpected behaviour when the RM fails over, and it prevents the RM from failing over correctly. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Attachment: YARN-4348-branch-2.7.004.patch Adding missing {{continue}} statement after calling {{syncInternal}} in the following block: {code} if (shouldRetryWithNewConnection(ke.code()) && retry < numRetries) { LOG.info("Retrying operation on ZK with new Connection. " + "Retry no. " + retry); Thread.sleep(zkRetryInterval); createConnection(); syncInternal(ke.getPath()); continue; } {code} > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032845#comment-15032845 ] Tsuyoshi Ozawa edited comment on YARN-4348 at 12/1/15 3:18 AM: --- [~jianhe] good catch. Adding missing {{continue}} statement after calling {{syncInternal}} in the following block in v4 patch. was (Author: ozawa): Adding missing {{continue}} statement after calling {{syncInternal}} in the following block: {code} if (shouldRetryWithNewConnection(ke.code()) && retry < numRetries) { LOG.info("Retrying operation on ZK with new Connection. " + "Retry no. " + retry); Thread.sleep(zkRetryInterval); createConnection(); syncInternal(ke.getPath()); continue; } {code} > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033002#comment-15033002 ] Tsuyoshi Ozawa commented on YARN-4348: -- Jenkins still fails. Opened YETUS-217 to track the problem. Kicking Jenkins locally. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033002#comment-15033002 ] Tsuyoshi Ozawa edited comment on YARN-4348 at 12/1/15 3:22 AM: --- Jenkins still fails. Opened YETUS-217 to track the problem. Kicking test-patch.sh locally. was (Author: ozawa): Jenkins still fail. Opened YETUS-217 to track the problem. Kicking Jenkins on local. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032845#comment-15032845 ] Tsuyoshi Ozawa edited comment on YARN-4348 at 12/1/15 5:44 AM: --- [~jianhe] good catch. Adding missing {{continue}} statement after calling {{syncInternal}} in v4 patch. was (Author: ozawa): [~jianhe] good catch. Adding missing {{continue}} statement after calling {{syncInternal}} in the following block in v4 patch. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, > YARN-4348.001.patch, YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028989#comment-15028989 ] Tsuyoshi Ozawa commented on YARN-4348: -- Ran the tests since the last Jenkins run failed to launch. {quote} -1 overall. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated 2079 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version ) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. {quote} > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027096#comment-15027096 ] Tsuyoshi Ozawa commented on YARN-4393: -- +1, checking this in. > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026936#comment-15026936 ] Tsuyoshi Ozawa commented on YARN-4318: -- [~kshukla] please go ahead :-) > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa reopened YARN-4393: -- Oops, commented on the wrong JIRA. Reopening. > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Hadoop Flags: Reviewed > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027113#comment-15027113 ] Tsuyoshi Ozawa commented on YARN-4393: -- [~varun_saxena], before committing, I found some missing {{dispatcher.await()}} calls: testResourceRelease: {code} //Send Cleanup Event spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <-- here! verify(mockLocallilzerTracker) .cleanupPrivLocalizers("container_314159265358979_0003_01_42"); req2.remove(LocalResourceVisibility.PRIVATE); spyService.handle(new ContainerLocalizationCleanupEvent(c, req2)); dispatcher.await(); {code} testFailedDirsResourceRelease: {code} // Send Cleanup Event spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <-- here! verify(mockLocallilzerTracker).cleanupPrivLocalizers( "container_314159265358979_0003_01_42"); {code} testRecovery: {code} assertNotNull("Localization not started", privLr1.getLocalPath()); privTracker1.handle(new ResourceLocalizedEvent(privReq1, privLr1.getLocalPath(), privLr1.getSize() + 5)); assertNotNull("Localization not started", privLr2.getLocalPath()); privTracker1.handle(new ResourceLocalizedEvent(privReq2, privLr2.getLocalPath(), privLr2.getSize() + 10)); assertNotNull("Localization not started", appLr1.getLocalPath()); appTracker1.handle(new ResourceLocalizedEvent(appReq1, appLr1.getLocalPath(), appLr1.getSize())); assertNotNull("Localization not started", appLr3.getLocalPath()); appTracker2.handle(new ResourceLocalizedEvent(appReq3, appLr3.getLocalPath(), appLr3.getSize() + 7)); assertNotNull("Localization not started", pubLr1.getLocalPath()); pubTracker.handle(new ResourceLocalizedEvent(pubReq1, pubLr1.getLocalPath(), pubLr1.getSize() + 1000)); assertNotNull("Localization not started", pubLr2.getLocalPath()); pubTracker.handle(new ResourceLocalizedEvent(pubReq2, pubLr2.getLocalPath(), pubLr2.getSize() + 9)); {code} Could you update them? (A sketch of the fix is shown after this message.) > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different!
Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
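For each of the spots flagged in the comment above, the fix follows the pattern already used elsewhere in these tests: drain the async dispatcher before verifying, so the event has actually been delivered when the mock is checked. A sketch using the identifiers from the first snippet (illustrative, not the final patch):
{code}
// Send Cleanup Event
spyService.handle(new ContainerLocalizationCleanupEvent(c, req));
dispatcher.await(); // drain queued events before verifying
verify(mockLocallilzerTracker)
    .cleanupPrivLocalizers("container_314159265358979_0003_01_42");
{code}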
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027145#comment-15027145 ] Tsuyoshi Ozawa commented on YARN-4348: -- Kicking Jenkins again. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4318: - Assignee: Kuhu Shukla > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027004#comment-15027004 ] Tsuyoshi Ozawa commented on YARN-4380: -- +1, checking this in. > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4387) Fix FairScheduler log message
[ https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023984#comment-15023984 ] Tsuyoshi Ozawa commented on YARN-4387: -- +1, checking this in. > Fix FairScheduler log message > - > > Key: YARN-4387 > URL: https://issues.apache.org/jira/browse/YARN-4387 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Xin Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4387) Fix typo in FairScheduler log message
[ https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4387: - Summary: Fix typo in FairScheduler log message (was: Fix FairScheduler log message) > Fix typo in FairScheduler log message > - > > Key: YARN-4387 > URL: https://issues.apache.org/jira/browse/YARN-4387 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Xin Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4387) Fix typo in FairScheduler log message
[ https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4387: - Attachment: YARN-4387.001.patch > Fix typo in FairScheduler log message > - > > Key: YARN-4387 > URL: https://issues.apache.org/jira/browse/YARN-4387 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Xin Wang >Priority: Minor > Attachments: YARN-4387.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4387) Fix typo in FairScheduler log message
[ https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4387: - Hadoop Flags: Reviewed > Fix typo in FairScheduler log message > - > > Key: YARN-4387 > URL: https://issues.apache.org/jira/browse/YARN-4387 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Xin Wang >Priority: Minor > Attachments: YARN-4387.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15024028#comment-15024028 ] Tsuyoshi Ozawa commented on YARN-4371: -- [~sunilg] thank you for the initial patch. I looked over the patch and have a comment about the design. In the patch, a new RPC, {{killApplication(List applicationIds)}}, is added. IMHO, it's better to call the existing {{killApplication(ApplicationId applicationId)}} multiple times, since that is simpler and killApplication is unlikely to be called frequently enough for the extra calls to matter. Could you update the patch accordingly? > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > Attachments: 0001-YARN-4371.patch > > > Currently we cannot pass multiple applications to "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entries should be separated with whitespace like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
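A minimal sketch of the per-application approach suggested in the comment above, assuming the caller already holds parsed {{ApplicationId}} objects; {{MultiKillSketch}} and {{killAll}} are illustrative names only, not part of any patch:
{code}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch: issue one existing killApplication call per application id
// rather than adding a new batched RPC to the protocol.
public class MultiKillSketch {
  public static void killAll(YarnClient client, List<ApplicationId> appIds)
      throws IOException, YarnException {
    for (ApplicationId appId : appIds) {
      // One round trip per application; kill is infrequent enough
      // that the extra RPCs should not matter.
      client.killApplication(appId);
    }
  }
}
{code}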
[jira] [Updated] (YARN-4387) Fix typo in FairScheduler log message
[ https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4387: - Target Version/s: 2.8.0 > Fix typo in FairScheduler log message > - > > Key: YARN-4387 > URL: https://issues.apache.org/jira/browse/YARN-4387 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Xin Wang >Priority: Minor > Attachments: YARN-4387.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4387) Fix typo in FairScheduler log message
[ https://issues.apache.org/jira/browse/YARN-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4387: - Assignee: Xin Wang > Fix typo in FairScheduler log message > - > > Key: YARN-4387 > URL: https://issues.apache.org/jira/browse/YARN-4387 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.7.1 >Reporter: Xin Wang >Assignee: Xin Wang >Priority: Minor > Attachments: YARN-4387.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4306) Test failure: TestClientRMTokens
[ https://issues.apache.org/jira/browse/YARN-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15024298#comment-15024298 ] Tsuyoshi Ozawa commented on YARN-4306: -- This problem still occurs on trunk. [~sunilg], could you take a look? > Test failure: TestClientRMTokens > > > Key: YARN-4306 > URL: https://issues.apache.org/jira/browse/YARN-4306 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Sunil G >Assignee: Sunil G > > Tests are getting failed in local also. As part of HADOOP-12321 jenkins run, > I see same error.: > {noformat}testShortCircuitRenewCancelDifferentHostSamePort(org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens) > Time elapsed: 0.638 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.checkShortCircuitRenewCancel(TestClientRMTokens.java:363) > at > org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens.testShortCircuitRenewCancelDifferentHostSamePort(TestClientRMTokens.java:316) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660 ] Tsuyoshi Ozawa edited comment on YARN-4385 at 11/23/15 6:25 PM: >From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/ {quote} TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:... Tests run: 14, Failures: 0, Errors: 12, Skipped: 0 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop YARN SUCCESS [ 4.803 s] [INFO] Apache Hadoop YARN API SUCCESS [04:44 min] [INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min] [INFO] Apache Hadoop YARN Server . SUCCESS [ 0.109 s] [INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s] [INFO] Apache Hadoop YARN NodeManager SUCCESS [10:05 min] [INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s] [INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min] [INFO] Apache Hadoop YARN ResourceManager SUCCESS [ 01:03 h] [INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min] [INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min] [INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s] [INFO] Apache Hadoop YARN Applications ... SUCCESS [ 0.053 s] [INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s] [INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED [INFO] Apache Hadoop YARN Site ... SKIPPED [INFO] Apache Hadoop YARN Registry ... SKIPPED [INFO] Apache Hadoop YARN Project SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:37 h [INFO] Finished at: 2015-11-09T20:36:25+00:00 [INFO] Final Memory: 81M/690M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures. [ERROR] [ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-applications-distributedshell Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Updating HDFS-9234 Sending e-mails to: yarn-...@hadoop.apache.org Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 12 tests failed. 
FAILED: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs Error Message: java.io.IOException: ResourceManager failed to start. Final state is STOPPED Stack Trace: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331) at org.apache.hadoop.yarn.server.MiniYARNCluster.access$500(MiniYARNCluster.java:99) at
[jira] [Comment Edited] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660 ] Tsuyoshi Ozawa edited comment on YARN-4385 at 11/23/15 6:26 PM: On my local log: {quote} Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 437.156 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell testDSShellWithCustomLogPropertyFile(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 115.558 sec <<< ERROR! java.lang.Exception: test timed out after 9 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:734) at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:715) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithCustomLogPropertyFile(TestDistributedShell.java:502) {quote} was (Author: ozawa): >From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/ {quote} TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:... Tests run: 14, Failures: 0, Errors: 12, Skipped: 0 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop YARN SUCCESS [ 4.803 s] [INFO] Apache Hadoop YARN API SUCCESS [04:44 min] [INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min] [INFO] Apache Hadoop YARN Server . SUCCESS [ 0.109 s] [INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s] [INFO] Apache Hadoop YARN NodeManager SUCCESS [10:05 min] [INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s] [INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min] [INFO] Apache Hadoop YARN ResourceManager SUCCESS [ 01:03 h] [INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min] [INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min] [INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s] [INFO] Apache Hadoop YARN Applications ... SUCCESS [ 0.053 s] [INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s] [INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED [INFO] Apache Hadoop YARN Site ... SKIPPED [INFO] Apache Hadoop YARN Registry ... SKIPPED [INFO] Apache Hadoop YARN Project SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:37 h [INFO] Finished at: 2015-11-09T20:36:25+00:00 [INFO] Final Memory: 81M/690M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures. [ERROR] [ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. 
[ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-applications-distributedshell Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Updating HDFS-9234 Sending e-mails to: yarn-...@hadoop.apache.org Email was triggered for: Failure - Any Sending email for trigger: Failure - Any
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Attachment: org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt [~varun_saxena] attaching a log when the test fails. I use this simple script to reproduce some intermittent failures https://github.com/oza/failchecker > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
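A rough sketch of the re-run-until-failure idea behind a checker like the one linked above, assuming a Maven/surefire build; the class name and the exact mvn invocation are illustrative assumptions, not taken from the linked script:
{code}
import java.io.IOException;

// Sketch of a "re-run until failure" checker for flaky tests; the
// actual failchecker linked above may differ from this rendering.
public class FailChecker {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    String test = args.length > 0 ? args[0]
        : "TestResourceLocalizationService";
    for (int run = 1; ; run++) {
      // Run the single test; the surefire reports of the failing run
      // stay on disk for inspection.
      Process mvn = new ProcessBuilder("mvn", "-Dtest=" + test, "test")
          .inheritIO().start();
      if (mvn.waitFor() != 0) {
        System.err.println(test + " failed on run " + run);
        break;
      }
    }
  }
}
{code}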
[jira] [Commented] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660 ] Tsuyoshi Ozawa commented on YARN-4385: -- >From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/ {quote} ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 11262 lines...] TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept... TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:... Tests run: 14, Failures: 0, Errors: 12, Skipped: 0 [INFO] [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop YARN SUCCESS [ 4.803 s] [INFO] Apache Hadoop YARN API SUCCESS [04:44 min] [INFO] Apache Hadoop YARN Common . SUCCESS [03:31 min] [INFO] Apache Hadoop YARN Server . SUCCESS [ 0.109 s] [INFO] Apache Hadoop YARN Server Common .. SUCCESS [ 57.348 s] [INFO] Apache Hadoop YARN NodeManager SUCCESS [10:05 min] [INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [ 29.458 s] [INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [03:46 min] [INFO] Apache Hadoop YARN ResourceManager SUCCESS [ 01:03 h] [INFO] Apache Hadoop YARN Server Tests ... SUCCESS [01:52 min] [INFO] Apache Hadoop YARN Client . SUCCESS [07:21 min] [INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [ 32.136 s] [INFO] Apache Hadoop YARN Applications ... SUCCESS [ 0.053 s] [INFO] Apache Hadoop YARN DistributedShell ... FAILURE [ 29.403 s] [INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SKIPPED [INFO] Apache Hadoop YARN Site ... SKIPPED [INFO] Apache Hadoop YARN Registry ... SKIPPED [INFO] Apache Hadoop YARN Project SKIPPED [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:37 h [INFO] Finished at: 2015-11-09T20:36:25+00:00 [INFO] Final Memory: 81M/690M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures. [ERROR] [ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results. [ERROR] -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :hadoop-yarn-applications-distributedshell Build step 'Execute shell' marked build as failure Archiving artifacts Recording test results Updating HDFS-9234 Sending e-mails to: yarn-...@hadoop.apache.org Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 12 tests failed. 
FAILED: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs Error Message: java.io.IOException: ResourceManager failed to start. Final state is STOPPED Stack Trace: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331) at
[jira] [Moved] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa moved HADOOP-12591 to YARN-4385: --- Key: YARN-4385 (was: HADOOP-12591) Project: Hadoop YARN (was: Hadoop Common) > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4385: - Component/s: test > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tsuyoshi Ozawa > Attachments: > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4385) TestDistributedShell times out
[ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4385: - Attachment: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt Attaching a log when it fails. > TestDistributedShell times out > -- > > Key: YARN-4385 > URL: https://issues.apache.org/jira/browse/YARN-4385 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Tsuyoshi Ozawa > Attachments: > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Attachment: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt [~varun_saxena], thank you for the fix. The fix itself looks good to me. I got another error, though it happens only rarely: {quote} Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.093 sec <<< FAILURE! org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: Argument(s) are different! Wanted: eventHandler.handle( ); -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) Actual invocation has different arguments: eventHandler.handle( EventType: APPLICATION_INITED ); -> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) {quote} Attaching a log for the failure. Could you take a look? > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different!
Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023870#comment-15023870 ] Tsuyoshi Ozawa commented on YARN-4348: -- {quote} Archiving artifacts [description-setter] Description set: YARN-4348 Recording test results ERROR: Publisher 'Publish JUnit test result report' failed: No test report files were found. Configuration error? Email was triggered for: Failure - Any Sending email for trigger: Failure - Any An attempt to send an e-mail to empty list of recipients, ignored. Finished: FAILURE {quote} Hmm, Jenkins looks to be unhealthy. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Attachment: YARN-4348-branch-2.7.003.patch > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Summary: TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently on branch-2.8 (was: TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8) > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently on branch-2.8 > -- > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Tsuyoshi Ozawa > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8
Tsuyoshi Ozawa created YARN-4380: Summary: TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails on branch-2.8 Key: YARN-4380 URL: https://issues.apache.org/jira/browse/YARN-4380 Project: Hadoop YARN Issue Type: Bug Components: test Affects Versions: 2.8.0 Reporter: Tsuyoshi Ozawa {quote} Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) Time elapsed: 0.109 sec <<< FAILURE! org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: Argument(s) are different! Wanted: deletionService.delete( "user0", null, ); -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) Actual invocation has different arguments: deletionService.delete( "user0", /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 ); -> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4348: - Priority: Blocker (was: Major) > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, YARN-4348.001.patch, > YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018159#comment-15018159 ] Tsuyoshi Ozawa commented on YARN-4348: -- Found that this is caused by lock ordering: 1. (In the RM's main thread) ZKRMStateStore is locked in startInternal -> the thread waits on lock.await(). 2. (In ZK's event thread) a SyncConnected event arrives from ZK -> ForwardingWatcher#process is called -> processWatchEvent is invoked, but it blocks because ZKRMStateStore has been locked since step 1. 3. (In the RM's main thread) the wait times out with an IOException -> ZKRMStateStore is unlocked -> only then is the sync callback, processEvent, fired. I will attach a patch to address this problem. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, YARN-4348.001.patch, > YARN-4348.001.patch, log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
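A hypothetical reduction of the three-step sequence above. The real ZKRMStateStore code is more involved, but the core hazard is waiting on a latch inside a synchronized method: {{CountDownLatch.await()}} does not release the object's monitor, so the ZK event thread cannot enter {{processWatchEvent}} until the waiter times out. Names below mirror the description, not the actual source:
{code}
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical reduction of the blocked-until-timeout pattern described
// in the comment above; not the actual ZKRMStateStore implementation.
public class LockOrderingSketch {
  private final CountDownLatch connected = new CountDownLatch(1);

  // Step 1: the RM's main thread holds this object's monitor while it
  // waits; CountDownLatch.await() does not release the monitor.
  public synchronized void startInternal(long timeoutMs)
      throws InterruptedException, IOException {
    if (!connected.await(timeoutMs, TimeUnit.MILLISECONDS)) {
      // Step 3: the wait times out and throws; only on return is the
      // monitor released, letting the watcher thread finally run.
      throw new IOException("timed out waiting for SyncConnected");
    }
  }

  // Step 2: ZK's event thread blocks here because the monitor is held
  // by startInternal() above.
  public synchronized void processWatchEvent() {
    connected.countDown();
  }
}
{code}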
[jira] [Created] (YARN-4371) "yarn application -kill" should take multiple application ids
Tsuyoshi Ozawa created YARN-4371: Summary: "yarn application -kill" should take multiple application ids Key: YARN-4371 URL: https://issues.apache.org/jira/browse/YARN-4371 Project: Hadoop YARN Issue Type: Improvement Reporter: Tsuyoshi Ozawa Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. I think it's straightforward to pass comma-separated ids if we can guarantee application ids don't contain any commas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4371: - Description: Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. Each entries should be separated with white-space like: {code} yarn application -kill application_1234_0001 application_1234_0007 application_1234_0012 {code} was: Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. Each entries should be separated with white-space like . > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > > Currently we cannot pass multiple applications to "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entries should be separated with white-space like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4371: - Description: Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. Each entries should be separated with white-space like . was: Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. I think it's straightforward to pass comma-separated ids if we can guarantee application ids don't contain any commas. > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > > Currently we cannot pass multiple applications to "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entries should be separated with white-space like . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4371: - Description: Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. Each entries should be separated with whitespace like: {code} yarn application -kill application_1234_0001 application_1234_0007 application_1234_0012 {code} was: Currently we cannot pass multiple applications to "yarn application -kill" command. The command should take multiple application ids at the same time. Each entries should be separated with white-space like: {code} yarn application -kill application_1234_0001 application_1234_0007 application_1234_0012 {code} > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > > Currently we cannot pass multiple applications to "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entries should be separated with whitespace like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)