[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573673#comment-14573673
 ] 

Wangda Tan commented on YARN-3769:
--

Thanks [~eepayne], I have reassigned it to myself and will upload a design doc 
shortly for review.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Wangda Tan

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-3769:


Assignee: Wangda Tan  (was: Eric Payne)

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Wangda Tan

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned YARN-3768:
---

Assignee: zhihai xu

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner
Assignee: zhihai xu

 Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because Java's split method will not return trailing 
 empty strings. Similar to this: 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
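
For illustration only, here is a minimal, self-contained Java snippet (the class name is made up; this is not Hadoop code) showing the split behavior the report points to: trailing empty strings are dropped, so an entry like FOO= yields a one-element array and indexing parts[1] fails.

{code}
public class EnvSplitSketch {
  public static void main(String[] args) {
    String withValue = "FOO=bar";
    String withoutValue = "FOO=";

    System.out.println(withValue.split("=").length);     // prints 2
    System.out.println(withoutValue.split("=").length);  // prints 1, not 2

    // A guard like this avoids the ArrayIndexOutOfBoundsException.
    String[] parts = withoutValue.split("=");
    String value = parts.length > 1 ? parts[1] : "";
    System.out.println("value='" + value + "'");
  }
}
{code}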



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573830#comment-14573830
 ] 

Zhijie Shen commented on YARN-3051:
---

[~varun_saxena], thanks for working on the new patch. It seems to be a complete 
reader-side prototype, which is nice. I still need some time to take a thorough 
look, but I'd like to share my thoughts about the reader APIs.

IMHO, we may want to have or start with two sets of APIs: 1) the APIs to query 
the raw data and 2) the APIs to query the aggregation data.

1) APIs to query the raw data:

We would like to have the APIs to let users zoom into the details of their 
jobs, and give users the freedom to fetch the raw data and do the customized 
processing that ATS will not do. For example, Hive/Pig on Tez need this set of 
APIs to get the framework-specific data, process it and render it on their own 
web UI. We basically need 2 such APIs.

a. Get a single entity given an ID that uniquely locates the entity in the 
backend (We assume the uniqueness is assured somehow). 
* This API can be extended or split into multiple sub-APIs to get a single 
element of the entity, such as events, metrics and configuration.

b. Search for a set of entities that match the given predicates.
* We can start from the predicates that we used in ATS v1 (also for the 
compatibility purpose), but some of them may no longer apply.
* We may want to add more predicates to check the newly added element in v2.
* With more predefined semantics, we can even query entities that belong to 
some container/attempt/application and so on.

2) APIs to query the aggregation data

These are completely new in v2 and are a key advantage. With the aggregation, we 
can answer statistical questions about the job, the user, the queue, the 
flow and the cluster. These APIs do not direct users to the individual 
entities put by the application, but return statistical data (carried by 
Application|User|Queue|Flow|ClusterEntity). 

a. Get certain level aggregation data given the ID of the concept on that 
level, i.e.,  the job, the user, the queue, the flow and the cluster.

b. Search for the jobs, the users, the queues, the flows and the clusters 
given predicates.
* For the predicates, we could learn from the examples in hRaven.
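
To make the two API groups above concrete, here is a rough Java sketch. The interface, the method names and the Entity placeholder are all hypothetical, intended only to illustrate the comment, not the YARN-3051 patch itself.

{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public interface TimelineReaderSketch {

  /** Placeholder for the v2 timeline entity record. */
  class Entity { /* id, type, events, metrics, configs ... */ }

  // 1) Raw-data APIs: zoom into a single entity, or search by predicates.
  Entity getEntity(String clusterId, String entityType, String entityId)
      throws IOException;

  Set<Entity> getEntities(String clusterId, String entityType,
      Map<String, Object> predicates) throws IOException;

  // 2) Aggregation APIs: statistical data at flow/user/queue/cluster level.
  Entity getAggregate(String level, String id) throws IOException;

  Set<Entity> searchAggregates(String level, Map<String, Object> predicates)
      throws IOException;
}
{code}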


 [Storage abstraction] Create backing storage read interface for ATS readers
 ---

 Key: YARN-3051
 URL: https://issues.apache.org/jira/browse/YARN-3051
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Sangjin Lee
Assignee: Varun Saxena
 Attachments: YARN-3051-YARN-2928.003.patch, 
 YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
 YARN-3051.wip.patch, YARN-3051_temp.patch


 Per design in YARN-2928, create backing storage read interface that can be 
 implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573911#comment-14573911
 ] 

Jian He commented on YARN-2716:
---

Thanks Karthik for working on this! This simplifies things a lot. 
Mostly good; a few comments and questions:
- these two booleans are not used, maybe remove them:
{{private boolean create = false, delete = false;}}
- is this going to be done in this jira?
{code} // TODO: Check deleting appIdRemovePath works recursively
safeDelete(appIdRemovePath);{code}
- will safeDelete throw a NoNodeExists exception if deleting a non-existing 
znode?
- {{new RetryNTimes(numRetries, zkSessionTimeout / numRetries));}}: I think 
the second parameter should be zkRetryInterval (see the sketch after this 
list). Also, I have a question about why, in the HA case, zkRetryInterval is 
calculated as below:
{code}
if (HAUtil.isHAEnabled(conf)) {
  zkRetryInterval = zkSessionTimeout / 
numRetries;
{code}

- I found this 
[thread|http://mail-archives.apache.org/mod_mbox/curator-user/201410.mbox/%3cd076bc8e.9ef1%25sreichl...@chegg.com%3E]
 saying that blockUntilConnected does not need to be called. Supposing it is 
needed, I think the zkSessionTimeout value is too small; it should be 
numRetries * numRetryInterval, otherwise the RM will exit soon after retrying 
for about 10s by default.
{code}
if (!curatorFramework.blockUntilConnected(
zkSessionTimeout, TimeUnit.MILLISECONDS)) {
  LOG.fatal("Couldn't establish connection to ZK server");
  throw new YarnRuntimeException("Couldn't connect to ZK server");
}
{code}
- remove this ?
{code}
//  @Override
//  public ZooKeeper getNewZooKeeper() throws IOException {
//return client;
//  }
{code}
- I think testZKSessionTimeout may be removed too? It looks like a test for 
Curator itself.
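
For reference, a minimal sketch of building a Curator client with RetryNTimes and blockUntilConnected, using the parameter names from the discussion above; the wrapper class is hypothetical and this is not the YARN-2716 patch.

{code}
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class CuratorConnectSketch {
  public static CuratorFramework connect(String zkHostPort, int numRetries,
      int zkRetryIntervalMs, int zkSessionTimeoutMs) throws Exception {
    // Retry numRetries times, sleeping zkRetryIntervalMs between attempts;
    // the sleep is the second constructor argument discussed above.
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .connectString(zkHostPort)
        .sessionTimeoutMs(zkSessionTimeoutMs)
        .retryPolicy(new RetryNTimes(numRetries, zkRetryIntervalMs))
        .build();
    client.start();
    // Wait roughly numRetries * zkRetryIntervalMs rather than only the
    // session timeout, so the caller does not give up too early.
    if (!client.blockUntilConnected(numRetries * zkRetryIntervalMs,
        TimeUnit.MILLISECONDS)) {
      throw new RuntimeException("Couldn't connect to ZK server");
    }
    return client;
  }
}
{code}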


 Refactor ZKRMStateStore retry code with Apache Curator
 --

 Key: YARN-2716
 URL: https://issues.apache.org/jira/browse/YARN-2716
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jian He
Assignee: Karthik Kambatla
 Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, 
 yarn-2716-prelim.patch, yarn-2716-super-prelim.patch


 Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
 simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573878#comment-14573878
 ] 

Wangda Tan commented on YARN-3508:
--

Trying to better understand this problem: I'm not sure where the bottleneck is. 
If the CapacityScheduler becomes the bottleneck, moving preemption events out 
of the main RM dispatcher doesn't help. This approach only helps when the main 
dispatcher is the bottleneck.

A parallel thing we can do is to reduce the number of preemption events. 
Currently, if a container sits in the to-preempt list, an event is sent to the 
scheduler every few seconds until the container is preempted; we can reduce the 
frequency of this event.

 Preemption processing occuring on the main RM dispatcher
 

 Key: YARN-3508
 URL: https://issues.apache.org/jira/browse/YARN-3508
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Varun Saxena
 Attachments: YARN-3508.002.patch, YARN-3508.01.patch


 We recently saw the RM for a large cluster lag far behind on the 
 AsyncDispacher event queue.  The AsyncDispatcher thread was consistently 
 blocked on the highly-contended CapacityScheduler lock trying to dispatch 
 preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
 processing should occur on the scheduler event dispatcher thread or a 
 separate thread to avoid delaying the processing of other events in the 
 primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573670#comment-14573670
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy]
bq. If you think it's fine, could I take a shot at it?
It sounds like it would work. It's fine with me if you want to work on that.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573724#comment-14573724
 ] 

zhihai xu commented on YARN-3745:
-

[~lavkesh], thanks for working on this issue. This looks like a good catch.
One question about the patch: why retry on SecurityException? Would retrying 
on NoSuchMethodException alone be enough?
If we do need to retry on SecurityException, can we add a test case for it?
There is a typo in the comment {{This does not has constructor with String 
argument}}; it should be {{have}} instead of {{has}}.
Also, could we make the comment {{Try with String constructor if it fails try 
with default.}} clearer, e.g.
{{Try constructor with String argument, if it fails, try default.}}
Can we add a comment to explain why ClassNotFoundException is expected in 
the test?


 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException, for example for the ClosedChannelException class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.
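
A minimal sketch of the fallback described above: try the (String) constructor first and fall back to the default constructor on NoSuchMethodException. Class and method names are illustrative, not the exact YARN-3745 patch.

{code}
import java.lang.reflect.Constructor;

public class InstantiateExceptionSketch {
  static Throwable instantiate(Class<? extends Throwable> cls, String message)
      throws Exception {
    try {
      // Preferred: constructor taking a String message.
      Constructor<? extends Throwable> cn = cls.getConstructor(String.class);
      return cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // Fallback for classes like ClosedChannelException that only provide
      // a default constructor.
      Constructor<? extends Throwable> cn = cls.getConstructor();
      return cn.newInstance();
    }
  }
}
{code}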



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573593#comment-14573593
 ] 

zhihai xu commented on YARN-3017:
-

Hi [~rohithsharma], thanks for the information.
Sorry, I am not familiar with rolling upgrades. Could you give a little more 
detail about how this could break a rolling upgrade?
That said, I see the ContainerId format was already changed by YARN-2562 in the 
2.6.0 release eight months ago. Compared to the change in YARN-2562, this patch 
is minor: it only changes {{ContainerId#toString}}, and the current 
{{ContainerId#fromString}} supports both the current container string format 
and the new one.
CC [~ozawa] for the impact of the ContainerId format change.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then when using
 filtering tools to, say, grep events surrounding a specific attempt by the
 numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573667#comment-14573667
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne], Exactly.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573745#comment-14573745
 ] 

zhihai xu commented on YARN-3745:
-

Sorry, there's one more thing I forgot to mention: can we rename 
{{initExceptionWithConstructor}} to {{instantiateExceptionImpl}}?

 SerializedException should also try to instantiate internal exception with 
 the default constructor
 --

 Key: YARN-3745
 URL: https://issues.apache.org/jira/browse/YARN-3745
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir
 Attachments: YARN-3745.1.patch, YARN-3745.patch


 While deserialising a SerializedException, it tries to create the internal 
 exception in instantiateException() with cn = 
 cls.getConstructor(String.class).
 If cls does not have a constructor with a String parameter, it throws 
 NoSuchMethodException, for example for the ClosedChannelException class.
 We should also try to instantiate the exception with the default constructor 
 so that the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)
Eric Payne created YARN-3769:


 Summary: Preemption occurring unnecessarily because preemption 
doesn't consider user limit
 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.7.0, 2.6.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne


We are seeing the preemption monitor preempting containers from queue A and 
then seeing the capacity scheduler giving them immediately back to queue A. 
This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573659#comment-14573659
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7971 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7971/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.8.0

 Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
.append(", 'mRender': renderHadoopDate }")
.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573619#comment-14573619
 ] 

Eric Payne commented on YARN-3769:
--

The following configuration will cause this:

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 40 | 90 | N/A |
| A | 10 | 100 | 20 | 70 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

One app is running in each queue. Both apps are asking for more resources, but 
they have each reached their user limit, so even though both are asking for 
more and there are resources available, no more resources are allocated to 
either app.

The preemption monitor will see that {{B}} is asking for a lot more resources, 
and it will see that {{B}} is more underserved than {{A}}, so the preemption 
monitor will try to make the queues balance by preempting resources (10, for 
example) from {{A}}.

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 50 | 80 | N/A |
| A | 10 | 100 | 30 | 60 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

However, when the capacity scheduler tries to give that container to the app in 
{{B}}, the app will recognize that it has no headroom, and refuse the 
container. So the capacity scheduler offers the container again to the app in 
{{A}}, which accepts it because it has headroom now, and the process starts 
over again.

Note that this happens even when used cluster resources are below 100% because 
the used + pending for the cluster would put it above 100%.

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573633#comment-14573633
 ] 

Zhijie Shen commented on YARN-3766:
---

Patch looks good. Tried it locally and the web UI has been fixed. Will commit 
it.

 ATS Web UI breaks because of YARN-3467
 --

 Key: YARN-3766
 URL: https://issues.apache.org/jira/browse/YARN-3766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Affects Versions: 2.8.0
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch


 The ATS web UI breaks because of the following changes made in YARN-3467.
 {code}
 +++ 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
 @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
.append(", 'mRender': renderHadoopDate }")
.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
 if (isFairSchedulerPage) {
-  sb.append("[11]");
+  sb.append("[13]");
 } else if (isResourceManager) {
-  sb.append("[10]");
+  sb.append("[12]");
 } else {
   sb.append("[9]");
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573638#comment-14573638
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne],
This is a very interesting problem; actually, user limit is not the only thing 
that causes it.

For example, fair ordering (YARN-3306), hard locality requirements (I want 
resources from rackA and nodeX only), the AM resource limit, and in the near 
future constraints (YARN-3409) can all lead to resources being preempted from 
one queue while the other queue cannot use them because of its specific 
resource requirements and limits.

One thing I've thought about for a while is adding a lazy preemption mechanism: 
when a container is marked preempted and has waited for max_wait_before_time, 
it becomes a can_be_killed container. If another queue can allocate on a node 
with a can_be_killed container, that container will be killed immediately to 
make room for the new containers.

This mechanism means the preemption policy doesn't need to consider complex 
resource requirements and limits inside a queue, and it also avoids killing 
containers unnecessarily.

If you think it's fine, could I take a shot at it?

Thoughts? [~vinodkv].
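
To illustrate the lazy preemption flow described above (all class, field and parameter names here are hypothetical, not an actual patch):

{code}
public class LazyPreemptionSketch {
  enum State { RUNNING, MARKED_PREEMPTED, CAN_BE_KILLED }

  static class MarkedContainer {
    State state = State.MARKED_PREEMPTED;
    long markedAtMs = System.currentTimeMillis();
  }

  // Called periodically by the preemption policy: after max_wait_before_time
  // the container becomes fair game for killing.
  void promote(MarkedContainer c, long maxWaitBeforeTimeMs) {
    if (c.state == State.MARKED_PREEMPTED
        && System.currentTimeMillis() - c.markedAtMs >= maxWaitBeforeTimeMs) {
      c.state = State.CAN_BE_KILLED;
    }
  }

  // Called by the scheduler: kill only when another queue can actually
  // allocate on the node, so no container is killed unnecessarily.
  boolean shouldKill(MarkedContainer c, boolean otherQueueCanAllocateHere) {
    return c.state == State.CAN_BE_KILLED && otherQueueCanAllocateHere;
  }
}
{code}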

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573754#comment-14573754
 ] 

Karthik Kambatla commented on YARN-3453:


Few comments:
# New imports in FairScheduler and FSLeafQueue are not required.
# Looking at the remaining uses of DefaultResourceCalculator in FairScheduler, 
could we benefit from updating all of them to DominantResourceCalculator? 
[~ashwinshankar77] - do you concur? 
# In FairScheduler, changing the scope of RESOURCE_CALCULATOR and 
DOMINANT_RESOURCE_CALCULATOR is not required.
# We should add unit tests to avoid regressions in the future. 
# Nit: In each of the policies, my preference would be to not make the 
calculator and comparator members static unless required. We have had cases 
where our tests would create multiple instances of the class, leading to 
issues. Not that I foresee multiple instantiations of these classes, but I 
would like to avoid that if we can.

 Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
 even in DRF mode causing thrashing
 

 Key: YARN-3453
 URL: https://issues.apache.org/jira/browse/YARN-3453
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.6.0
Reporter: Ashwin Shankar
Assignee: Arun Suresh
 Attachments: YARN-3453.1.patch, YARN-3453.2.patch


 There are two places in the preemption code flow where DefaultResourceCalculator 
 is used, even in DRF mode, which basically results in more resources getting 
 preempted than needed, and those extra preempted containers aren’t even getting 
 to the “starved” queue since the scheduling logic is based on DRF's calculator.
 Following are the two places:
 1. {code:title=FSLeafQueue.java|borderStyle=solid}
 private boolean isStarved(Resource share)
 {code}
 A queue shouldn’t be marked as “starved” if the dominant resource usage
 is >= fair/minshare (see the sketch after this description).
 2. {code:title=FairScheduler.java|borderStyle=solid}
 protected Resource resToPreempt(FSLeafQueue sched, long curTime)
 {code}
 --
 One more thing that I believe needs to change in DRF mode is: during a 
 preemption round, if preempting a few containers results in satisfying the 
 needs of a resource type, then we should exit that preemption round, since the 
 containers that we just preempted should bring the dominant resource usage to 
 min/fair share.
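
For illustration, a minimal sketch of the dominant-resource-aware starvation check argued for in item 1 above, assuming the Resources and DominantResourceCalculator utilities; this is not the actual FSLeafQueue code.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfStarvationSketch {
  private static final ResourceCalculator DRC = new DominantResourceCalculator();

  // Not starved if the dominant resource usage is >= the fair/min share.
  static boolean isStarved(Resource clusterResource, Resource usage,
      Resource share) {
    return !Resources.greaterThanOrEqual(DRC, clusterResource, usage, share);
  }
}
{code}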



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573539#comment-14573539
 ] 

zhihai xu commented on YARN-3768:
-

Hi [~joeferner], that is a good find. I can see how the change in MAPREDUCE-5965 
may trigger this bug. I can take up this issue if you don't mind. Thanks for 
reporting it.

 Index out of range exception with environment variables without values
 --

 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner

 Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
 exception occurs if an environment variable is encountered without a value.
 I believe this occurs because Java's split method will not return trailing 
 empty strings. Similar to this: 
 http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-04 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated YARN-3706:
---
Attachment: YARN-3726-YARN-2928.005.patch

Uploading YARN-3726-YARN-2928.005.patch

Added proper encoding and decoding of column names and values where a splitter 
is used. We now also encode spaces in the column names, and properly decode 
them on the way out.

Fixed TestHBaseTimelineWriterImpl to confirm that configs now properly work as 
well.
Still need to add reading of metrics, fix a unit test for join (with null as 
separator) of the older join method, and add an entity reader that creates an 
entire entity object from a scan result.

 Generalize native HBase writer for additional tables
 

 Key: YARN-3706
 URL: https://issues.apache.org/jira/browse/YARN-3706
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Joep Rottinghuis
Assignee: Joep Rottinghuis
Priority: Minor
 Attachments: YARN-3706-YARN-2928.001.patch, 
 YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
 YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch


 When reviewing YARN-3411 we noticed that we could change the class hierarchy 
 a little in order to accommodate additional tables easily.
 In order to get ready for benchmark testing we left the original layout in 
 place, as performance would not be impacted by the code hierarchy.
 Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573664#comment-14573664
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy],
{quote}
One thing I've thought about for a while is adding a lazy preemption mechanism: 
when a container is marked preempted and has waited for max_wait_before_time, 
it becomes a can_be_killed container. If another queue can allocate on a node 
with a can_be_killed container, that container will be killed immediately to 
make room for the new containers.
{quote}
IIUC, in your proposal, the preemption monitor would mark the containers as 
preemptable, and then after some configurable wait period, the capacity 
scheduler would be the one to do the killing if it finds that it needs the 
resources on that node. Is my understanding correct?

 Preemption occurring unnecessarily because preemption doesn't consider user 
 limit
 -

 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.6.0, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne

 We are seeing the preemption monitor preempting containers from queue A and 
 then seeing the capacity scheduler giving them immediately back to queue A. 
 This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573182#comment-14573182
 ] 

Wangda Tan commented on YARN-3733:
--

Great! Committing...

 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)
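
A simplified illustration of the compare problem (not the real DominantResourceCalculator code): when the cluster resource is empty, a share-based comparison degenerates and unequal resources look equal, so a fallback to absolute values is needed. All names and the exact fallback here are assumptions for this sketch.

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class EmptyClusterCompareSketch {
  static int compare(Resource cluster, Resource lhs, Resource rhs) {
    if (cluster.getMemory() == 0 && cluster.getVirtualCores() == 0) {
      // Cluster is empty (e.g. right after an RM switch, before NMs register):
      // compare absolute values so lhs and rhs are still distinguished.
      int byMemory = Integer.compare(lhs.getMemory(), rhs.getMemory());
      return byMemory != 0
          ? byMemory
          : Integer.compare(lhs.getVirtualCores(), rhs.getVirtualCores());
    }
    // Otherwise compare by dominant share, as a DRF-style calculator does.
    float l = Math.max((float) lhs.getMemory() / cluster.getMemory(),
        (float) lhs.getVirtualCores() / cluster.getVirtualCores());
    float r = Math.max((float) rhs.getMemory() / cluster.getMemory(),
        (float) rhs.getVirtualCores() / cluster.getVirtualCores());
    return Float.compare(l, r);
  }
}
{code}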



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena moved HADOOP-12062 to YARN-3767:
-

  Component/s: (was: tools)
Affects Version/s: (was: 2.7.0)
   2.7.0
  Key: YARN-3767  (was: HADOOP-12062)
  Project: Hadoop YARN  (was: Hadoop Common)

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard
Assignee: Varun Saxena

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2513:
--
Target Version/s: 2.8.0

 Host framework UIs in YARN for use with the ATS
 ---

 Key: YARN-2513
 URL: https://issues.apache.org/jira/browse/YARN-2513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
 YARN-2513.v3.patch


 Allow for pluggable UIs as described by TEZ-8. Yarn can provide the 
 infrastructure to host java script and possible java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573288#comment-14573288
 ] 

Zhijie Shen commented on YARN-2513:
---

As it's valuable to some existing ATS use cases, let's try to get the patch in 
and target 2.8.

[~jeagles], three comments about the patch:

1. Shall we add yarn.timeline-service.ui-names to yarn-default.xml too, like 
yarn.nodemanager.aux-services?

2. Can we add some text in TimelineServer.md to document the configs and 
explain how to install framework UIs?

3. Can we add a test case to validate and showcase that ATS can load a 
framework UI (e.g., a single helloworld.html)?

 Host framework UIs in YARN for use with the ATS
 ---

 Key: YARN-2513
 URL: https://issues.apache.org/jira/browse/YARN-2513
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
 YARN-2513.v3.patch


 Allow for pluggable UIs as described by TEZ-8. Yarn can provide the 
 infrastructure to host java script and possible java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573269#comment-14573269
 ] 

Varun Saxena commented on YARN-3767:


This belongs to YARN.

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard
Assignee: Varun Saxena

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573273#comment-14573273
 ] 

Varun Saxena commented on YARN-3767:


Yes, it will work if you copy {{sls-runner.xml}} to {{etc/hadoop}}. This is 
mentioned in the documentation as well. Refer to: 
http://hadoop.apache.org/docs/r2.4.1/hadoop-sls/SchedulerLoadSimulator.html#Step_1:_Configure_Hadoop_and_the_simulator

It mentions: "Before we start, make sure Hadoop and the simulator are configured 
well. All configuration files for Hadoop and the simulator should be placed in 
directory $HADOOP_ROOT/etc/hadoop, where the ResourceManager and Yarn scheduler 
load their configurations. Directory 
$HADOOP_ROOT/share/hadoop/tools/sls/sample-conf/ provides several example 
configurations, that can be used to start a demo."

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard
Assignee: Varun Saxena

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps

2015-06-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated YARN-3698:
-
Description: 
Currently we don't have direct access to an attempt's log file from web apps. 
The only available option is through jobhistory, and that provides an HTML view 
of the log.

Requirements:
# A link to access the raw log file.
# A variant of the link with the following headers set, this enables direct 
download of the file across all browsers.
Content-Disposition: attachment; filename=attempt-id.log
Content-Type of text/plain
# Node manager redirects an attempt syslog view to the container view. Hence we 
are not able to view the logs of a specific attempt.
Before redirection: 
http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
After redirection: 
http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root

  was:
Currently we don't have direct access to an attempt's log file from web apps. 
The only available option is through jobhistory, and that provides an HTML view 
of the log.

Requirements:
# A link to access the raw log file.
# A variant of the link with the following headers set, this enables direct 
download of the file across all browsers.
Content-Disposition: attachment; filename=attempt-id.log
Content-Type of text/plain


 Make task attempt log files accessible from webapps
 ---

 Key: YARN-3698
 URL: https://issues.apache.org/jira/browse/YARN-3698
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Sreenath Somarajapuram

 Currently we don't have direct access to an attempt's log file from web apps. 
 The only available option is through jobhistory, and that provides an HTML 
 view of the log.
 Requirements:
 # A link to access the raw log file.
 # A variant of the link with the following headers set, this enables direct 
 download of the file across all browsers.
 Content-Disposition: attachment; filename=attempt-id.log
 Content-Type of text/plain
 # Node manager redirects an attempt syslog view to the container view. Hence 
 we are not able to view the logs of a specific attempt.
 Before redirection: 
 http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
 After redirection: 
 http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root
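
 For requirement #2 above, a minimal servlet-style sketch that sets the listed headers so browsers download the raw log as plain text; the class and method names are hypothetical, not an actual YARN webapp.

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.http.HttpServletResponse;

public class RawLogDownloadSketch {
  static void writeRawLog(HttpServletResponse response, String attemptId,
      InputStream log) throws IOException {
    // Headers from requirement #2: plain text plus attachment disposition.
    response.setContentType("text/plain");
    response.setHeader("Content-Disposition",
        "attachment; filename=" + attemptId + ".log");
    OutputStream out = response.getOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = log.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    out.flush();
  }
}
{code}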



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3767:
---
Assignee: (was: Varun Saxena)

 Yarn Scheduler Load Simulator does not work
 ---

 Key: YARN-3767
 URL: https://issues.apache.org/jira/browse/YARN-3767
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: OS X 10.10.  JDK 1.7
Reporter: David Kjerrumgaard

 Running the SLS, as per the instructions on the web results in a 
 NullPointerException being thrown.
 Steps followed to create error:
 1) Download Apache Hadoop 2.7.0 tarball from Apache site
 2) Untar 2.7.0 tarball into /opt directory
 3) Execute the following command: 
 /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
 --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
  --output-dir=/tmp
 Results in the following error:
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2118.smile.com:2 clusterResource: memory:30720, vCores:30
 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
 /default-rack
 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
 from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
 memory:10240, vCores:10, assigned nodeId a2115.smile.com:3
 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
 from NEW to RUNNING
 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
 a2115.smile.com:3 clusterResource: memory:40960, vCores:40
 Exception in thread main java.lang.RuntimeException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
   at 
 org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
 Caused by: java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
   at 
 java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
   at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3733:
-
Summary: Fix DominantRC#compare() does not work as expected if cluster 
resource is empty  (was: DominantRC#compare() does not work as expected if 
cluster resource is empty)

 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573202#comment-14573202
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7965 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7965/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java


 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent task 
 5. Switch RM
 Actual
 =
 For 12 Jobs AM gets allocated and all 12 starts running
 No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
 Expected
 ===
 Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573325#comment-14573325
 ] 

Hudson commented on YARN-2392:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7968 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7968/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 2.8.0

 Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
 YARN-2392-002.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573276#comment-14573276
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7966 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7966/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java


 CapacityScheduler should forbid moving LeafQueue from one parent to another
 ---

 Key: YARN-3764
 URL: https://issues.apache.org/jira/browse/YARN-3764
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.7.1

 Attachments: YARN-3764.1.patch


 Currently CapacityScheduler doesn't handle this case well. For example, given 
 a queue structure:
 {code}
      root
       |
     a (100)
     /     \
  x (50)  y (50)
 {code}
 And reinitialize using the following structure:
 {code}
      root
     /     \
  a (50)  x (50)
    |
  y (100)
 {code}
 The actual queue structure after reinitialization is:
 {code}
      root
     /     \
  a (50)  x (50)
   /    \
 x (50) y (100)
 {code}
 We should forbid the admin from doing that.
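
A hedged sketch of the kind of validation this implies: during reinitialization, 
compare each existing leaf queue's parent path against the new configuration and 
reject the refresh if a leaf has moved. The class and method names are invented 
for illustration and are not the CapacityScheduler API.

{code}
import java.util.Map;

// Illustrative sketch (names invented, this is not the CapacityScheduler code):
// reject a queue refresh that moves an existing leaf queue under a new parent.
public class QueueMoveValidatorSketch {

  /**
   * @param oldParents leaf queue name -> full parent path before the refresh
   * @param newParents leaf queue name -> full parent path after the refresh
   */
  static void validateNoLeafQueueMoved(Map<String, String> oldParents,
                                       Map<String, String> newParents) {
    for (Map.Entry<String, String> e : oldParents.entrySet()) {
      String newParent = newParents.get(e.getKey());
      if (newParent != null && !newParent.equals(e.getValue())) {
        throw new IllegalArgumentException("Moving leaf queue '" + e.getKey()
            + "' from '" + e.getValue() + "' to '" + newParent
            + "' is not allowed during reinitialization");
      }
    }
  }

  public static void main(String[] args) {
    // Mirrors the example above: y stays under root.a, but x is pulled out to
    // root, so the refresh must be rejected.
    try {
      validateNoLeafQueueMoved(
          Map.of("x", "root.a", "y", "root.a"),
          Map.of("x", "root", "y", "root.a"));
    } catch (IllegalArgumentException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}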



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573293#comment-14573293
 ] 

Jian He commented on YARN-2392:
---

looks good, committing

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
 YARN-2392-002.patch


 # When an app fails, the failure count is shown, but not what the global and 
 local limits are. If the two are different, they should both be printed. 
 # The YARN-2242 strings don't have enough whitespace between the text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps & correct node-manager redirection

2015-06-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated YARN-3698:
-
Summary: Make task attempt log files accessible from webapps & correct 
node-manager redirection  (was: Make task attempt log files accessible from 
webapps)

 Make task attempt log files accessible from webapps & correct node-manager 
 redirection
 --

 Key: YARN-3698
 URL: https://issues.apache.org/jira/browse/YARN-3698
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Sreenath Somarajapuram

 Currently we don't have direct access to an attempt's log file from web apps. 
 The only available option is through jobhistory, and that provides an HTML 
 view of the log.
 Requirements:
 # A link to access the raw log file.
 # A variant of the link with the following headers set; this enables direct 
 download of the file across all browsers (see the sketch after this list).
 Content-Disposition: attachment; filename=attempt-id.log
 Content-Type of text/plain
 # Node manager redirects an attempt syslog view to the container view. Hence 
 we are not able to view the logs of a specific attempt.
 Before redirection: 
 http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
 After redirection: 
 http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root
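
A minimal servlet-style sketch of requirement 2; the header values come from the 
list above, while the servlet class, its mapping and the log location are 
assumptions made for illustration (a servlet-api dependency on the classpath is 
also assumed).

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Minimal sketch of requirement 2: serve the raw attempt log with headers that
// force a download in every browser. The servlet and the log location are
// placeholders; only the two header values come from the requirement above.
public class RawAttemptLogServletSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String attemptId = req.getParameter("attemptId");

    // Plain text, delivered as an attachment so browsers download the file
    // instead of rendering an HTML view.
    resp.setContentType("text/plain");
    resp.setHeader("Content-Disposition",
        "attachment; filename=" + attemptId + ".log");

    try (OutputStream out = resp.getOutputStream()) {
      Files.copy(Paths.get("/tmp/logs", attemptId + ".log"), out);  // placeholder path
    }
  }
}
{code}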



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573013#comment-14573013
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573011#comment-14573011
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM fails over, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and we init both 
 RMs before starting either of them, yarn.resourcemanager.ha.id is changed to rm2 
 during the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing each 
 RM (see the sketch below).
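
A minimal sketch of that proposal, assuming only the standard 
Configuration/YarnConfiguration copy constructor; the MiniRm wrapper below is 
invented for illustration and is not the MiniYARNCluster code.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch of the proposal: give each RM its own copy of the configuration so
// that updateConnectAddr() in one RM cannot rewrite the other RM's address.
// The MiniRm wrapper is invented for illustration; only the Configuration
// copy constructor and RM_HA_ID are standard Hadoop API.
public class PerRmConfigCopySketch {

  static class MiniRm {
    final Configuration conf;
    MiniRm(Configuration conf) {
      this.conf = conf;
    }
  }

  static MiniRm[] createRms(Configuration baseConf, String... rmIds) {
    MiniRm[] rms = new MiniRm[rmIds.length];
    for (int i = 0; i < rmIds.length; i++) {
      // One copy per RM: mutations made while initializing one RM stay local to it.
      Configuration copy = new YarnConfiguration(baseConf);
      copy.set(YarnConfiguration.RM_HA_ID, rmIds[i]);
      rms[i] = new MiniRm(copy);
    }
    return rms;
  }

  public static void main(String[] args) {
    Configuration base = new YarnConfiguration();
    base.set("yarn.resourcemanager.address.rm1", "0.0.0.0:18032");
    base.set("yarn.resourcemanager.address.rm2", "0.0.0.0:28032");
    MiniRm[] rms = createRms(base, "rm1", "rm2");
    // rm2 keeps its own address even if rm1 later calls updateConnectAddr().
    System.out.println(rms[1].conf.get("yarn.resourcemanager.address.rm2"));
  }
}
{code}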



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573017#comment-14573017
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows the stop, 
 but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}
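
A generic illustration (not the committed fix): once all services are stopped, 
terminate the JVM explicitly so a lingering non-daemon native thread, like the 
leveldb JNI thread in the dump above, cannot keep the process alive. The 
Stoppable interface and the recovery flag are invented for this sketch.

{code}
// Generic illustration, not the committed fix: once all services are stopped,
// terminate the JVM explicitly so a lingering non-daemon native thread (like
// the leveldb JNI thread above) cannot keep the process alive.
public class ForceExitAfterShutdownSketch {

  interface Stoppable {
    void stop();
  }

  static void shutdown(Stoppable nodeManager, boolean recoveryEnabled) {
    nodeManager.stop();
    if (recoveryEnabled) {
      // With recovery enabled the state-store helper thread is non-daemon and
      // may never finish on its own; exit explicitly after a clean stop.
      System.exit(0);
    }
  }

  public static void main(String[] args) {
    shutdown(() -> System.out.println("services stopped"), false);
  }
}
{code}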



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism

2015-06-04 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2573:

Attachment: Design for Reservation HA.pdf

Attaching design for the umbrella jira 

 Integrate ReservationSystem with the RM failover mechanism
 --

 Key: YARN-2573
 URL: https://issues.apache.org/jira/browse/YARN-2573
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: Design for Reservation HA.pdf


 YARN-1051 introduces the ReservationSystem and the current implementation is 
 completely in-memory based. YARN-149 brings in the notion of RM HA with a 
 highly available state store. This JIRA proposes persisting the Plan into the 
 RMStateStore and recovering it post RM failover
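
Purely as a hypothetical illustration (the interface, method names and byte[] 
encoding below are invented and are not the design in the attached document), 
persisting and recovering a plan could look roughly like this:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of persisting reservations; all names and the encoding
// are invented here and do not reflect the attached design document.
public class ReservationPersistenceSketch {

  interface ReservationStateStore {
    void storeReservation(String planName, String reservationId, byte[] allocation);
    void removeReservation(String planName, String reservationId);
    Map<String, byte[]> loadPlan(String planName);
  }

  // In-memory stand-in for an RMStateStore-backed implementation.
  static class InMemoryStore implements ReservationStateStore {
    private final Map<String, Map<String, byte[]>> plans = new ConcurrentHashMap<>();

    public void storeReservation(String plan, String id, byte[] alloc) {
      plans.computeIfAbsent(plan, p -> new ConcurrentHashMap<>()).put(id, alloc);
    }

    public void removeReservation(String plan, String id) {
      Map<String, byte[]> p = plans.get(plan);
      if (p != null) {
        p.remove(id);
      }
    }

    public Map<String, byte[]> loadPlan(String plan) {
      return plans.getOrDefault(plan, Map.of());
    }
  }

  public static void main(String[] args) {
    ReservationStateStore store = new InMemoryStore();
    store.storeReservation("root.reservations", "reservation_1", new byte[]{1, 2, 3});
    // On RM failover, the newly active RM would replay loadPlan() into the Plan.
    System.out.println(store.loadPlan("root.reservations").size());
  }
}
{code}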



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573006#comment-14573006
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}
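
One generic way to avoid this kind of CME is sketched below: guard the 
child-queue list with a read/write lock so that iteration and mutation cannot 
interleave. The class and field names are illustrative and are not the 
FSParentQueue implementation.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch: guard the child-queue list with a read/write lock so that
// iteration (getQueueUserAclInfo) and mutation (adding a child queue) cannot
// interleave. Names are illustrative, not the FSParentQueue code.
public class GuardedChildQueueListSketch {
  private final List<String> childQueues = new ArrayList<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  void addChildQueue(String name) {
    lock.writeLock().lock();
    try {
      childQueues.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  List<String> getQueueUserAclInfo() {
    lock.readLock().lock();
    try {
      // Iterate (and copy) under the read lock; concurrent adds now block
      // instead of invalidating the iterator.
      return new ArrayList<>(childQueues);
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    GuardedChildQueueListSketch root = new GuardedChildQueueListSketch();
    root.addChildQueue("root.testyarnpool3");
    System.out.println(root.getQueueUserAclInfo());
  }
}
{code}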



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573012#comment-14573012
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 

[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573015#comment-14573015
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.
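
A tiny sketch of the null-guard this fix describes; the class and field names 
are invented, and only the idea of falling back to an "unavailable" value when 
the timeline store has no usage report is taken from the issue.

{code}
// Tiny sketch of the null-guard: when the timeline store has no resource-usage
// report, fall back to defaults instead of dereferencing null. Names invented.
public class AppInfoSketch {

  static final int UNAVAILABLE = -1;

  static class UsageReport {
    long memorySeconds;
    int vcoreSeconds;
  }

  final long memorySeconds;
  final int vcoreSeconds;

  AppInfoSketch(UsageReport usage) {
    if (usage != null) {
      memorySeconds = usage.memorySeconds;
      vcoreSeconds = usage.vcoreSeconds;
    } else {
      // Information not published to the timeline server: report "unavailable"
      // rather than throwing an NPE in the web services layer.
      memorySeconds = UNAVAILABLE;
      vcoreSeconds = UNAVAILABLE;
    }
  }

  public static void main(String[] args) {
    System.out.println(new AppInfoSketch(null).memorySeconds);
  }
}
{code}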



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573041#comment-14573041
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573040#comment-14573040
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 

[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573045#comment-14573045
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows the stop, 
 but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573043#comment-14573043
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573034#comment-14573034
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573039#comment-14573039
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM fails over, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and we init both 
 RMs before starting either of them, yarn.resourcemanager.ha.id is changed to rm2 
 during the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing each 
 RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572894#comment-14572894
 ] 

Hudson commented on YARN-3749:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when the RM fails over, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
                                            YarnConfiguration.RM_ADDRESS,
                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
                                            server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and we init both 
 RMs before starting either of them, yarn.resourcemanager.ha.id is changed to rm2 
 during the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing each 
 RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-41:
---
Release Note: The behavior of shutting down an NM can differ (if NM work 
preserving is not enabled): the NM will unregister from the RM immediately rather 
than waiting for the timeout and being marked LOST. A new node status, SHUTDOWN, 
is introduced, which could affect the UI, CLI and ClusterMetrics for a node's 
status. 

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Fix For: 2.8.0

 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, the RM should remove and handle an NM 
 that is shut down gracefully (a high-level sketch follows below).
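
A high-level sketch of the behaviour described in the release note above; the 
types and method names below are simplified stand-ins rather than the actual 
ResourceTracker protocol.

{code}
// High-level sketch (simplified stand-ins, not the ResourceTracker protocol):
// on a graceful stop the NM tells the RM it is going away, and the RM marks
// the node SHUTDOWN immediately instead of waiting for the liveness timeout.
public class GracefulNmShutdownSketch {

  enum NodeState { RUNNING, SHUTDOWN, LOST }

  interface ResourceTrackerStub {
    void unRegisterNodeManager(String nodeId);
  }

  static class RmStub implements ResourceTrackerStub {
    NodeState state = NodeState.RUNNING;
    public void unRegisterNodeManager(String nodeId) {
      // No expiry wait: the node transitions straight to SHUTDOWN.
      state = NodeState.SHUTDOWN;
      System.out.println(nodeId + " -> " + state);
    }
  }

  static void stopNodeManager(ResourceTrackerStub rm, String nodeId,
                              boolean workPreservingEnabled) {
    if (!workPreservingEnabled) {
      rm.unRegisterNodeManager(nodeId);
    }
    // ... stop services, close the state store, etc.
  }

  public static void main(String[] args) {
    stopNodeManager(new RmStub(), "host-1:45454", false);
  }
}
{code}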



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572889#comment-14572889
 ] 

Hudson commented on YARN-3762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572899#comment-14572899
 ] 

Hudson commented on YARN-3585:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows the stop, 
 but the process cannot exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572895#comment-14572895
 ] 

Hudson commented on YARN-1462:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572861#comment-14572861
 ] 

Hudson commented on YARN-3762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}
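 One way to avoid this kind of CME, shown here only as a sketch (the lock field and method shape are assumptions, not necessarily the committed fix), is to guard the child-queue list with a read/write lock so iteration never overlaps with mutation:
 {code}
 // Hypothetical fragment of FSParentQueue: protect childQueues with a
 // ReentrantReadWriteLock instead of iterating the list unguarded.
 private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
 private final List<FSQueue> childQueues = new ArrayList<>();

 public List<QueueUserACLInfo> getQueueUserAclInfo(UserGroupInformation user) {
   List<QueueUserACLInfo> acls = new ArrayList<>();
   rwLock.readLock().lock();   // writers (add/remove queue) take the write lock
   try {
     for (FSQueue child : childQueues) {
       acls.addAll(child.getQueueUserAclInfo(user));
     }
   } finally {
     rwLock.readLock().unlock();
   }
   return acls;
 }
 {code}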



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572866#comment-14572866
 ] 

Hudson commented on YARN-3749:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover happens, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 is 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same configuration instance for rm1 and rm2, and we init both 
 RMs before we start them, yarn.resourcemanager.ha.id is changed to rm2 during 
 the init of rm2 and is therefore still rm2 while rm1 is starting.
 So I think it is safe to make a copy of the configuration when initializing 
 both of the RMs.
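 A minimal sketch of that idea (the variable names and the resourceManagers array are assumptions here, not the committed patch): give each RM in the mini cluster its own copy of the configuration so that updateConnectAddr() inside one RM cannot rewrite the address keys seen by the other.
 {code}
 // Hypothetical sketch: per-RM copies of the shared test configuration.
 Configuration rm1Conf = new YarnConfiguration(conf);   // copy, not alias
 rm1Conf.set(YarnConfiguration.RM_HA_ID, "rm1");
 Configuration rm2Conf = new YarnConfiguration(conf);   // copy, not alias
 rm2Conf.set(YarnConfiguration.RM_HA_ID, "rm2");

 resourceManagers[0].init(rm1Conf);   // rm1 keeps 0.0.0.0:18032
 resourceManagers[1].init(rm2Conf);   // rm2 keeps 0.0.0.0:28032
 {code}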



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572871#comment-14572871
 ] 

Hudson commented on YARN-3585:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission, nodemanager log show stop but 
 process cannot end. 
 non daemon thread:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and jni leveldb thread stack
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572869#comment-14572869
 ] 

Hudson commented on YARN-3751:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java
* hadoop-yarn-project/CHANGES.txt


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.
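 A minimal sketch of the kind of null guard involved (the AppInfo field names below are assumptions): since the timeline server does not publish used resources, AppInfo should only read them when the usage report is present.
 {code}
 // Hypothetical fragment of AppInfo: guard against a missing usage report.
 ApplicationResourceUsageReport usage = app.getApplicationResourceUsageReport();
 if (usage != null && usage.getUsedResources() != null) {
   allocatedMemoryMB = usage.getUsedResources().getMemory();
   allocatedVCores = usage.getUsedResources().getVirtualCores();
 }   // otherwise leave the allocated* fields at their defaults
 {code}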



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572867#comment-14572867
 ] 

Hudson commented on YARN-1462:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* hadoop-yarn-project/CHANGES.txt


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572897#comment-14572897
 ] 

Hudson commented on YARN-3751:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572289#comment-14572289
 ] 

Rohith commented on YARN-3017:
--

Apologies for coming very late into this issue. I am thinking that changing the 
containerId format may break compatibility when a rolling upgrade has been done 
with RM HA + work preserving enabled? IIUC, using ZKRMStateStore, a rolling 
upgrade can be done now.

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572239#comment-14572239
 ] 

Sunil G commented on YARN-3733:
---

Patch looks good to me. +1

For MockRM.submitApp, I think we need to support the addition of cores and 
memory. I will file a separate ticket to handle the same if that's fine.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
 Expected
 ===
 Only 6 should be running at a time since the max AM allocation is .5 (3072 MB)
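 As a standalone illustration of why a dominant-share comparison can degenerate when the cluster resource is empty (plain Java arithmetic, not a quote of the YARN code or the patch): the per-resource shares are obtained by dividing by the cluster totals, and dividing by zero collapses the comparison.
 {code}
 // With an empty cluster resource both shares become Infinity (or NaN for 0/0),
 // so "lhs < rhs" and "lhs > rhs" are both false and compare() acts as "equal".
 float clusterMemory = 0f, clusterVcores = 0f;
 float lhsShare = Math.max(1024f / clusterMemory, 1f / clusterVcores);  // Infinity
 float rhsShare = Math.max(2048f / clusterMemory, 2f / clusterVcores);  // Infinity
 System.out.println(lhsShare < rhsShare);   // false
 System.out.println(lhsShare > rhsShare);   // false -> treated as equal
 {code}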



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572240#comment-14572240
 ] 

Hadoop QA commented on YARN-41:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  19m 45s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   7m 35s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m 53s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 33s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 22s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   6m  3s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  50m 16s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 107m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b5f0d29 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8190/console |


This message was automatically generated.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572248#comment-14572248
 ] 

zhihai xu commented on YARN-3017:
-

+1 non-binding
The checkstyle issue looks like a script issue
{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java:119:50:
 Name 'epochFormat' must match pattern '^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$'.
{code}
I think the pattern used to match the name is not correct.
Thanks again for working on this.
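For illustration only (a tiny standalone check, not part of the patch), the constant-name pattern quoted above does reject a camelCase field name such as 'epochFormat', which is why the warning fires:
{code}
import java.util.regex.Pattern;

public class ConstantNameCheckDemo {
  public static void main(String[] args) {
    // Pattern reported by the checkstyle warning above.
    Pattern constantName = Pattern.compile("^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$");
    System.out.println(constantName.matcher("epochFormat").matches());   // false
    System.out.println(constantName.matcher("EPOCH_FORMAT").matches());  // true
  }
}
{code}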

 ContainerID in ResourceManager Log Has Slightly Different Format From 
 AppAttemptID
 --

 Key: YARN-3017
 URL: https://issues.apache.org/jira/browse/YARN-3017
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: MUFEED USMAN
Priority: Minor
  Labels: PatchAvailable
 Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch


 Not sure if this should be filed as a bug or not.
 In the ResourceManager log in the events surrounding the creation of a new
 application attempt,
 ...
 ...
 2014-11-14 17:45:37,258 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
 masterappattempt_1412150883650_0001_02
 ...
 ...
 The application attempt has the ID format _1412150883650_0001_02.
 Whereas the associated ContainerID goes by _1412150883650_0001_02_.
 ...
 ...
 2014-11-14 17:45:37,260 INFO
 org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
 up
 container Container: [ContainerId: container_1412150883650_0001_02_01,
 NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: memory:2048, 
 vCores:1,
 disks:0.0, Priority: 0, Token: Token { kind: ContainerToken, service:
 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
 ...
 ...
 Curious to know if this is kept like that for a reason. If not, then while 
 using filtering tools to, say, grep events surrounding a specific attempt by 
 the numeric ID part, information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572247#comment-14572247
 ] 

Rohith commented on YARN-3733:
--

+1 for handling virtual cores. This will be a good improvement for testing 
DominantRC functionality precisely.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
 Expected
 ===
 Only 6 should be running at a time since the max AM allocation is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572318#comment-14572318
 ] 

Hadoop QA commented on YARN-41:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 58s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m  1s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 35s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   6m 16s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  50m 29s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 110m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1bb79c9 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/console |


This message was automatically generated.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3754) Race condition when the NodeManager is shutting down and container is launched

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572244#comment-14572244
 ] 

Rohith commented on YARN-3754:
--

bq. When NM is shutting down, ContainerLaunch is also interrupted. During this 
interrupted exception handling, NM tries to update container diagnostics. But 
from main thread statestore is down ,hence caused the DB Close exception.
I think this issue was caused because the NM JVM did not exit on time, which 
allowed the statestore event to be processed. After YARN-3585, I think this 
should be OK. [~bibinchundatt] Can you please re-run the regression?

 Race condition when the NodeManager is shutting down and container is launched
 --

 Key: YARN-3754
 URL: https://issues.apache.org/jira/browse/YARN-3754
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
 Environment: Suse 11 Sp3
Reporter: Bibin A Chundatt
Assignee: Sunil G
Priority: Critical
 Attachments: NM.log


 Container is launched and returned to ContainerImpl.
 NodeManager closed the DB connection, which results in 
 {{org.iq80.leveldb.DBException: Closed}}. 
 *Attaching the exception trace*
 {code}
 2015-05-30 02:11:49,122 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
  Unable to update state store diagnostics for 
 container_e310_1432817693365_3338_01_02
 java.io.IOException: org.iq80.leveldb.DBException: Closed
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:261)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1109)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ContainerDiagnosticsUpdateTransition.transition(ContainerImpl.java:1101)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1129)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:83)
 at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:246)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: org.iq80.leveldb.DBException: Closed
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:123)
 at org.fusesource.leveldbjni.internal.JniDB.put(JniDB.java:106)
 at 
 org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeContainerDiagnostics(NMLeveldbStateStoreService.java:259)
 ... 15 more
 {code}
 We can add a check for whether the DB is closed while we move the container 
 from the ACQUIRED state.
 As per the discussion in YARN-3585, adding the same here.
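 As a rough sketch of that check (the flag name and exact placement are assumptions, not a committed patch), the NM state store could simply become a no-op once it has been closed during shutdown:
 {code}
 // Hypothetical guard inside the NM leveldb state store.
 private final AtomicBoolean closed = new AtomicBoolean(false);

 @Override
 protected void closeStorage() throws IOException {
   closed.set(true);
   // ... existing code that closes the leveldb handle ...
 }

 @Override
 public void storeContainerDiagnostics(ContainerId containerId,
     StringBuilder diagnostics) throws IOException {
   if (closed.get()) {
     return;   // NM is shutting down; dropping a diagnostics update is harmless
   }
   // ... existing db.put(...) of the diagnostics bytes ...
 }
 {code}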



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Chun Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chen updated YARN-2674:

Attachment: YARN-2674.3.patch

 Distributed shell AM may re-launch containers if RM work preserving restart 
 happens
 ---

 Key: YARN-2674
 URL: https://issues.apache.org/jira/browse/YARN-2674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch


 Currently, if an RM work-preserving restart happens while distributed shell is 
 running, the distributed shell AM may re-launch all the containers, including 
 new/running/complete ones. We must make sure it won't re-launch the 
 running/complete containers.
 We need to remove allocated containers from 
 AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 
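 A sketch of the client-side pattern this implies (the pendingRequests queue and launchContainer helper are hypothetical, not code from the patch): when containers are allocated, the AM removes the matching request so it is not re-sent after an RM work-preserving restart.
 {code}
 // Hypothetical AMRMClientAsync callback fragment.
 @Override
 public void onContainersAllocated(List<Container> containers) {
   for (Container c : containers) {
     ContainerRequest matched = pendingRequests.poll();  // outstanding requests issued by this AM
     if (matched != null) {
       amRMClient.removeContainerRequest(matched);       // stop re-asking for this container
     }
     launchContainer(c);                                 // application-specific launch
   }
 }
 {code}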



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Chun Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572535#comment-14572535
 ] 

Chun Chen commented on YARN-2674:
-

Uploaded YARN-2674.3.patch with a test case and more detailed comments.

 Distributed shell AM may re-launch containers if RM work preserving restart 
 happens
 ---

 Key: YARN-2674
 URL: https://issues.apache.org/jira/browse/YARN-2674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch


 Currently, if an RM work-preserving restart happens while distributed shell is 
 running, the distributed shell AM may re-launch all the containers, including 
 new/running/complete ones. We must make sure it won't re-launch the 
 running/complete containers.
 We need to remove allocated containers from 
 AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572419#comment-14572419
 ] 

Steve Loughran commented on YARN-2392:
--

checkstyle
{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java:1464:
 ' Then click on links to logs of each attempt.\n' have incorrect indentation 
level 8, expected level should be 10.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java:1020:
 Line is longer than 80 characters (found 81).
{code}

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
 YARN-2392-002.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572376#comment-14572376
 ] 

Devaraj K commented on YARN-41:
---

{code:xml}
-1  pre-patch   19m 45s Pre-patch trunk has 3 extant Findbugs (version 
3.0.0) warnings.
{code}

These Findbugs warnings are not related to the patch here.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, RM should remove and handle the NM, 
 which is shutdown gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1457#comment-1457
 ] 

Hadoop QA commented on YARN-3733:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 28s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 36s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 59s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 56s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  50m 24s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 18s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737453/0002-YARN-3733.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b5f0d29 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8191/console |


This message was automatically generated.

 DominantRC#compare() does not work as expected if cluster resource is empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
 size to 512 MB
 3. Configure capacity scheduler and AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in the Running state forever*
 Expected
 ===
 Only 6 should be running at a time since the max AM allocation is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572591#comment-14572591
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572597#comment-14572597
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572599#comment-14572599
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resource is not null. It's 
 not true as this information is not published to timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572601#comment-14572601
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission, nodemanager log show stop but 
 process cannot end. 
 non daemon thread:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and the JNI leveldb thread stack:
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}
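As a generic illustration (not the NodeManager fix itself), the dump above shows a non-daemon JNI leveldb thread still alive; any non-daemon thread keeps the JVM running after the services have stopped, which is the hang being described. A minimal sketch:
{code}
// Minimal sketch: a lingering non-daemon thread prevents JVM exit even after
// main() returns, mirroring the leveldb JNI worker in the dump above.
public class NonDaemonHang {
  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      while (true) {
        try {
          Thread.sleep(60_000); // pretend to be a background compaction thread
        } catch (InterruptedException e) {
          return; // only an interrupt (or System.exit) lets the JVM terminate
        }
      }
    }, "leveldb-like-worker");
    worker.setDaemon(false); // non-daemon: the JVM waits for it indefinitely
    worker.start();
    System.out.println("main() returned, but the process keeps running");
    // Forceful shutdown paths: worker.interrupt(); or System.exit(0);
  }
}
{code}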



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572596#comment-14572596
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover happens, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and init both 
 RMs before starting them, yarn.resourcemanager.ha.id is changed to rm2 during 
 the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.
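For illustration, a minimal sketch of the idea (not the committed patch): give each RM in the mini cluster its own copy of the configuration, so setting yarn.resourcemanager.ha.id for rm2 cannot leak into rm1. The helper below is hypothetical; only the YarnConfiguration copy constructor and the RM_HA_ID constant are real YARN APIs.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PerRmConfSketch {
  // Build an isolated Configuration per RM instead of sharing one instance.
  static Configuration confForRm(Configuration base, String rmId) {
    Configuration copy = new YarnConfiguration(base); // copies all properties
    copy.set(YarnConfiguration.RM_HA_ID, rmId);       // e.g. "rm1" or "rm2"
    return copy;
  }

  public static void main(String[] args) {
    Configuration shared = new YarnConfiguration();
    Configuration rm1Conf = confForRm(shared, "rm1");
    Configuration rm2Conf = confForRm(shared, "rm2");
    // Initializing rm2 no longer mutates the config rm1 will start with.
    System.out.println(rm1Conf.get(YarnConfiguration.RM_HA_ID)); // rm1
    System.out.println(rm2Conf.get(YarnConfiguration.RM_HA_ID)); // rm2
  }
}
{code}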



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572604#comment-14572604
 ] 

Hadoop QA commented on YARN-2674:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:red}-1{color} | javac |   7m 32s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 39s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 30s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   1m  3s | Tests failed in 
hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | |  40m 27s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-applications-distributedshell |
| Failed unit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels |
|   | hadoop.yarn.applications.distributedshell.TestDSAppMaster |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShell |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShellWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737533/YARN-2674.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e830207 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/diffJavacWarnings.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/console |


This message was automatically generated.

 Distributed shell AM may re-launch containers if RM work preserving restart 
 happens
 ---

 Key: YARN-2674
 URL: https://issues.apache.org/jira/browse/YARN-2674
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Chun Chen
Assignee: Chun Chen
 Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch


 Currently, if RM work preserving restart happens while distributed shell is 
 running, distribute shell AM may re-launch all the containers, including 
 new/running/complete. We must make sure it won't re-launch the 
 running/complete containers.
 We need to remove allocated containers from 
 AMRMClientImpl#remoteRequestsTable once AM receive them from RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572657#comment-14572657
 ] 

Naganarasimha G R commented on YARN-3758:
-

Hi [~rohithsharma], this issue is similar to the one raised in YARN-3525. I feel 
that if {{yarn.scheduler.minimum-allocation-mb}} is specific to the capacity 
scheduler, it would be better to rename it to 
{{yarn.scheduler.capacity.minimum-allocation-mb}}, similar to the suggestion in 
YARN-3525, so that there is less confusion. Thoughts?

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working as expected in FairScheduler
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572620#comment-14572620
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


 TestAHSWebServices fails after YARN-3467
 

 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Sunil G
 Fix For: 2.8.0

 Attachments: 0001-YARN-3751.patch


 YARN-3467 changed AppInfo and assumed that used resources are not null. That's 
 not always true, as this information is not published to the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572617#comment-14572617
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java


 We should make a copy of configuration when init MiniYARNCluster with 
 multiple RMs
 --

 Key: YARN-3749
 URL: https://issues.apache.org/jira/browse/YARN-3749
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chun Chen
Assignee: Chun Chen
 Fix For: 2.8.0

 Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
 YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
 YARN-3749.patch


 When I was trying to write a test case for YARN-2674, I found the DS client 
 trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
 when RM failover happens, even though I initially set 
 yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
 yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
 in ClientRMService where the value of yarn.resourcemanager.address.rm2 gets 
 changed to 0.0.0.0:18032. See the following code in ClientRMService:
 {code}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
     YarnConfiguration.RM_ADDRESS,
     YarnConfiguration.DEFAULT_RM_ADDRESS,
     server.getListenerAddress());
 {code}
 Since we use the same Configuration instance for rm1 and rm2, and init both 
 RMs before starting them, yarn.resourcemanager.ha.id is changed to rm2 during 
 the init of rm2 and is still rm2 when rm1 starts.
 So I think it is safe to make a copy of the configuration when initializing 
 each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572611#comment-14572611
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


 FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
 ---

 Key: YARN-3762
 URL: https://issues.apache.org/jira/browse/YARN-3762
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.8.0

 Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch


 In our testing, we ran into the following ConcurrentModificationException:
 {noformat}
 halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
 queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
 queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
 java.util.ConcurrentModificationException: 
 java.util.ConcurrentModificationException
   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
   at java.util.ArrayList$Itr.next(ArrayList.java:851)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
 {noformat}
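As a generic illustration of this failure mode (not the FSParentQueue/QueueManager code itself), a ConcurrentModificationException is thrown when one thread iterates an ArrayList while another thread mutates it; typical remedies are guarding the list with a lock or iterating a snapshot-based structure such as CopyOnWriteArrayList:
{code}
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CmeSketch {
  public static void main(String[] args) throws InterruptedException {
    List<String> queues = new ArrayList<>();
    queues.add("root.a");
    queues.add("root.b");

    Thread writer = new Thread(() -> {
      for (int i = 0; i < 100000; i++) {
        queues.add("root.q" + i); // concurrent structural modification
      }
    });
    writer.start();
    try {
      for (String q : queues) {   // fail-fast iterator may throw a CME here
        q.length();
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("reproduced (timing dependent): " + e);
    }
    writer.join();

    // One common remedy: iterate a snapshot so concurrent adds cannot interfere.
    List<String> safe = new CopyOnWriteArrayList<>(queues);
    for (String q : safe) {
      q.length(); // iterates over an immutable snapshot; no CME
    }
  }
}
{code}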



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572622#comment-14572622
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


 NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
 --

 Key: YARN-3585
 URL: https://issues.apache.org/jira/browse/YARN-3585
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Peng Zhang
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.1

 Attachments: 0001-YARN-3585.patch, YARN-3585.patch


 With NM recovery enabled, after decommission the NodeManager log shows it has 
 stopped, but the process does not exit. 
 Non-daemon threads:
 {noformat}
 DestroyJavaVM prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
 condition [0x]
 leveldb prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
 [0x]
 VM Thread prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
 Gang worker#0 (Parallel GC Threads) prio=10 tid=0x7f346002 
 nid=0x29ed runnable 
 Gang worker#1 (Parallel GC Threads) prio=10 tid=0x7f3460022000 
 nid=0x29ee runnable 
 Gang worker#2 (Parallel GC Threads) prio=10 tid=0x7f3460024000 
 nid=0x29ef runnable 
 Gang worker#3 (Parallel GC Threads) prio=10 tid=0x7f3460025800 
 nid=0x29f0 runnable 
 Gang worker#4 (Parallel GC Threads) prio=10 tid=0x7f3460027800 
 nid=0x29f1 runnable 
 Gang worker#5 (Parallel GC Threads) prio=10 tid=0x7f3460029000 
 nid=0x29f2 runnable 
 Gang worker#6 (Parallel GC Threads) prio=10 tid=0x7f346002b000 
 nid=0x29f3 runnable 
 Gang worker#7 (Parallel GC Threads) prio=10 tid=0x7f346002d000 
 nid=0x29f4 runnable 
 Concurrent Mark-Sweep GC Thread prio=10 tid=0x7f3460120800 nid=0x29f7 
 runnable 
 Gang worker#0 (Parallel CMS Threads) prio=10 tid=0x7f346011c800 
 nid=0x29f5 runnable 
 Gang worker#1 (Parallel CMS Threads) prio=10 tid=0x7f346011e800 
 nid=0x29f6 runnable 
 VM Periodic Task Thread prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
 on condition 
 {noformat}
 and the JNI leveldb thread stack:
 {noformat}
 Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
 #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
 /lib64/libpthread.so.0
 #1  0x7f33dfce2a3b in leveldb::(anonymous 
 namespace)::PosixEnv::BGThreadWrapper(void*) () from 
 /tmp/libleveldbjni-64-1-6922178968300745716.8
 #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
 #3  0x003d830e811d in clone () from /lib64/libc.so.6
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572628#comment-14572628
 ] 

Rohith commented on YARN-3758:
--

I had a look into the code for CS and FS. The understanding and behavior of 
minimum allocation is different across CS and FS.
# CS : It is straightforward: if any request asks for less than 
min-allocation-mb, then CS normalizes the request to min-allocation-mb, and 
containers are allocated with minimum-allocation-mb. 
# FS : if any request asks for less than min-allocation-mb, then FS normalizes 
the request using the factor {{yarn.scheduler.increment-allocation-mb}}. In the 
example in the description, min-allocation-mb is 256mb, but 
increment-allocation-mb defaults to 1024mb, so 1024mb is always allocated to 
containers. {{yarn.scheduler.increment-allocation-mb}} therefore has a large 
effect: it changes the requested memory, and containers are assigned the newly 
calculated resource.

The behavior is not consistent between CS and FS. I am not sure why an 
additional configuration was introduced in FS. Is it a bug?
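A minimal sketch of the normalization arithmetic described above (my reading of the behavior, not the actual scheduler code): FS rounds the request up to a multiple of yarn.scheduler.increment-allocation-mb, so a 256 MB ask with the default 1024 MB increment becomes a 1024 MB container, while CS only raises the ask to min-allocation-mb:
{code}
public class NormalizeSketch {
  // FS-style (assumed): apply the minimum, then round up to the nearest
  // multiple of yarn.scheduler.increment-allocation-mb.
  static int normalizeFs(int requestedMb, int minMb, int incrementMb) {
    int atLeastMin = Math.max(requestedMb, minMb);
    return ((atLeastMin + incrementMb - 1) / incrementMb) * incrementMb;
  }

  // CS-style (assumed): simply raise the request to the configured minimum.
  static int normalizeCs(int requestedMb, int minMb) {
    return Math.max(requestedMb, minMb);
  }

  public static void main(String[] args) {
    System.out.println(normalizeCs(256, 256));        // 256
    System.out.println(normalizeFs(256, 256, 1024));  // 1024 -> what the reporter sees
    System.out.println(normalizeFs(256, 256, 256));   // 256  -> with the increment lowered
  }
}
{code}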

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working in container
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572584#comment-14572584
 ] 

Junping Du commented on YARN-41:


bq. These findbugs are not related to the patch here.
Agree. Also, the test failure is not related, and the same failure also shows up 
in other patches, like YARN-3248. We should probably file a separate JIRA to fix 
this. Committing the latest patch in.

 The RM should handle the graceful shutdown of the NM.
 -

 Key: YARN-41
 URL: https://issues.apache.org/jira/browse/YARN-41
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager
Reporter: Ravi Teja Ch N V
Assignee: Devaraj K
 Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
 MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
 YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
 YARN-41-8.patch, YARN-41.patch


 Instead of waiting for the NM expiry, the RM should remove and handle an NM 
 that is shut down gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572609#comment-14572609
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-trunk-Commit #7963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7963/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 

[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572618#comment-14572618
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
Revert YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong. (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java


 AHS API and other AHS changes to handle tags for completed MR jobs
 --

 Key: YARN-1462
 URL: https://issues.apache.org/jira/browse/YARN-1462
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Karthik Kambatla
Assignee: Xuan Gong
 Fix For: 2.8.0

 Attachments: YARN-1462-branch-2.7-1.2.patch, 
 YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
 YARN-1462.3.patch, YARN-1462.4.patch


 AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572630#comment-14572630
 ] 

Rohith commented on YARN-3758:
--

bq. Is it bug ?
To be clear, is the inconsistent behavior a bug, or was it implemented 
intentionally for FS?

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working in container
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3758:
-
Summary: The mininum memory setting(yarn.scheduler.minimum-allocation-mb) 
is not working as expected in FairScheduler  (was: The mininum memory 
setting(yarn.scheduler.minimum-allocation-mb) is not working in container)

 The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
 working as expected in FairScheduler
 

 Key: YARN-3758
 URL: https://issues.apache.org/jira/browse/YARN-3758
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: skrho

 Hello there~~
 I have 2 clusters.
 The first cluster is 5 nodes, one default application queue, CapacityScheduler, 
 8G physical memory per node.
 The second cluster is 10 nodes, 2 application queues, FairScheduler, 230G 
 physical memory per node.
 Whenever a MapReduce job is running, I want the ResourceManager to give each 
 container the minimum memory of 256m.
 So I changed the configuration in yarn-site.xml and mapred-site.xml:
 yarn.scheduler.minimum-allocation-mb : 256
 mapreduce.map.java.opts : -Xms256m 
 mapreduce.reduce.java.opts : -Xms256m 
 mapreduce.map.memory.mb : 256 
 mapreduce.reduce.memory.mb : 256 
 In the first cluster, whenever a MapReduce job is running, I can see used 
 memory of 256m in the web console ( http://installedIP:8088/cluster/nodes ).
 But in the second cluster, whenever a MapReduce job is running, I see used 
 memory of 1024m in the web console ( http://installedIP:8088/cluster/nodes ).
 I know the default memory value is 1024m, so if the memory setting is not 
 changed, the default value takes effect.
 I have been testing for two weeks, but I don't know why the minimum memory 
 setting is not working in the second cluster.
 Why does this difference happen?
 Is my configuration wrong, or is there a bug?
 Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573512#comment-14573512
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7970 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7970/])
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


 Fix DominantRC#compare() does not work as expected if cluster resource is 
 empty
 ---

 Key: YARN-3733
 URL: https://issues.apache.org/jira/browse/YARN-3733
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
 Environment: Suse 11 Sp3 , 2 NM , 2 RM
 one NM - 3 GB 6 v core
Reporter: Bibin A Chundatt
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
 0002-YARN-3733.patch, YARN-3733.patch


 Steps to reproduce
 =
 1. Install HA with 2 RMs and 2 NMs (3072 MB * 2 total cluster)
 2. Configure map and reduce size to 512 MB after changing the scheduler 
 minimum allocation to 512 MB
 3. Configure the capacity scheduler and set the AM limit to .5 
 (DominantResourceCalculator is configured)
 4. Submit 30 concurrent tasks 
 5. Switch RM
 Actual
 =
 For 12 jobs the AM gets allocated and all 12 start running
 No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever*
 Expected
 ===
 Only 6 should be running at a time, since the max AM share allowed is .5 (3072 MB)
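As a side note on the summary, here is a rough sketch of why a dominant-share style comparison can degenerate when the cluster resource is empty; this is an assumption about the mechanism for illustration only, not the actual DominantResourceCalculator code:
{code}
public class DominantShareSketch {
  // Dominant share: the larger of the memory share and the vcore share.
  static float dominantShare(long mem, long vcores, long clusterMem, long clusterVcores) {
    float memShare = (float) mem / clusterMem;      // x/0 -> Infinity, 0/0 -> NaN
    float cpuShare = (float) vcores / clusterVcores;
    return Math.max(memShare, cpuShare);
  }

  public static void main(String[] args) {
    // With an empty cluster resource, every non-zero request maps to Infinity,
    // so two clearly different requests compare as equal.
    float a = dominantShare(512, 1, 0, 0);
    float b = dominantShare(3072, 6, 0, 0);
    System.out.println(a + " vs " + b);       // Infinity vs Infinity
    System.out.println(Float.compare(a, b));  // 0 -> treated as equal
  }
}
{code}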



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573515#comment-14573515
 ] 

Zhijie Shen commented on YARN-3044:
---

I'm not sure, because as far as I can tell the NM's impl is different from the 
RM's, but it's up to you to figure out the proper solution :-)

 [Event producers] Implement RM writing app lifecycle events to ATS
 --

 Key: YARN-3044
 URL: https://issues.apache.org/jira/browse/YARN-3044
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Naganarasimha G R
 Attachments: YARN-3044-YARN-2928.004.patch, 
 YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
 YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
 YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
 YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
 YARN-3044.20150416-1.patch


 Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2015-06-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14573468#comment-14573468
 ] 

kassiano josé matteussi commented on YARN-2139:
---

Dears, 

I have been studying resource management for Hadoop applications running inside 
Linux containers, and I have had trouble restricting disk I/O with cgroups 
(bps_write, bps_read). 

Does anybody know if it is possible to do so?

I have heard that I/O limiting with cgroups only applies to synchronous 
(SYNC) writes, and that is why it wouldn't work well with Hadoop + HDFS. Is 
this still true in more recent kernel implementations?

Best Regards,
Kassiano

 [Umbrella] Support for Disk as a Resource in YARN 
 --

 Key: YARN-2139
 URL: https://issues.apache.org/jira/browse/YARN-2139
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
 Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
 Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
 YARN-2139-prototype-2.patch, YARN-2139-prototype.patch


 YARN should consider disk as another resource for (1) scheduling tasks on 
 nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread Joe Ferner (JIRA)
Joe Ferner created YARN-3768:


 Summary: Index out of range exception with environment variables 
without values
 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner


Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
exception occurs if an environment variable is encountered without a value.

I believe this occurs because Java will not return trailing empty strings from 
the split method. Similar to this: 
http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
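For illustration, a small self-contained sketch of the suspected cause (my reading of the report, not the actual Apps.java code): String.split with the default limit drops trailing empty strings, so an entry like MY_VAR= splits into a single element and indexing the second element throws; passing a negative limit, or checking the array length, avoids that:
{code}
public class EnvSplitSketch {
  public static void main(String[] args) {
    String entry = "MY_VAR=";               // environment variable without a value

    String[] parts = entry.split("=");      // trailing empty strings are dropped
    System.out.println(parts.length);       // 1 -> parts[1] would throw
                                            //      ArrayIndexOutOfBoundsException

    String[] kept = entry.split("=", -1);   // negative limit keeps trailing empties
    System.out.println(kept.length);        // 2 -> kept[1] is ""

    // Alternative guard: check the length before indexing.
    String value = parts.length > 1 ? parts[1] : "";
    System.out.println("value='" + value + "'");
  }
}
{code}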



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)