[jira] [Created] (YARN-10698) Backport YARN-1151 (load auxiliary service from HDFS archives) to branch-2

2021-03-16 Thread Haibo Chen (Jira)
Haibo Chen created YARN-10698:
-

 Summary: Backport YARN-1151 (load auxiliary service from HDFS 
archives) to branch-2
 Key: YARN-10698
 URL: https://issues.apache.org/jira/browse/YARN-10698
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10651) CapacityScheduler crashed with NPE in AbstractYarnScheduler.updateNodeResource()

2021-02-24 Thread Haibo Chen (Jira)
Haibo Chen created YARN-10651:
-

 Summary: CapacityScheduler crashed with NPE in 
AbstractYarnScheduler.updateNodeResource() 
 Key: YARN-10651
 URL: https://issues.apache.org/jira/browse/YARN-10651
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haibo Chen
Assignee: Haibo Chen


{code:java}
2021-02-24 17:07:39,798 FATAL org.apache.hadoop.yarn.event.EventDispatcher: Error in handling event type NODE_RESOURCE_UPDATE to the Event Dispatcher
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNodeResource(AbstractYarnScheduler.java:809)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeAndQueueResource(CapacityScheduler.java:1116)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1505)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
        at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
        at java.lang.Thread.run(Thread.java:748)
2021-02-24 17:07:39,798 INFO org.apache.hadoop.yarn.event.EventDispatcher: Exiting, bbye..
{code}






[jira] [Created] (YARN-10467) ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers

2020-10-19 Thread Haibo Chen (Jira)
Haibo Chen created YARN-10467:
-

 Summary: ContainerIdPBImpl objects can be leaked in 
RMNodeImpl.completedContainers
 Key: YARN-10467
 URL: https://issues.apache.org/jira/browse/YARN-10467
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.10.0
Reporter: Haibo Chen
Assignee: Haibo Chen


In one of our recent heap analyses, we found that the majority of the heap is 
occupied by {{RMNodeImpl.completedContainers}}, which accounts for 19 GB out of 
24.3 GB. There are over 86 million ContainerIdPBImpl objects; in contrast, there 
are only 161,601 RMContainerImpl objects, which represent the number of active 
containers that the RM is still tracking. Inspecting some ContainerIdPBImpl 
objects shows they belong to applications that finished long ago. This indicates 
some sort of memory leak of ContainerIdPBImpl objects in RMNodeImpl.

 

Right now, when a container is reported by an NM as completed, it is immediately 
added to RMNodeImpl.completedContainers and later cleaned up after the AM has 
been notified of its completion in the AM-RM heartbeat. The cleanup can be 
broken into a few steps:
 * Step 1: the completed container is first added to 
RMAppAttemptImpl.justFinishedContainers (this is asynchronous to being added to 
{{RMNodeImpl.completedContainers}}).
 * Step 2: during the AM-RM heartbeat, the container is removed from 
RMAppAttemptImpl.justFinishedContainers and added to 
RMAppAttemptImpl.finishedContainersSentToAM.

Once a completed container is added to 
RMAppAttemptImpl.finishedContainersSentToAM, it is guaranteed to be cleaned up 
from {{RMNodeImpl.completedContainers}}.

 

However, if the AM exits (regardless of failure or success) before some 
recently completed containers can be added to 
RMAppAttemptImpl.finishedContainersSentToAM in previous heartbeats, there won't 
be any future AM-RM heartbeat to perform the aforementioned Step 2. Hence, these 
objects stay in RMNodeImpl.completedContainers forever.

We have observed in MR that AMs can decide to exit upon success of all their 
tasks without waiting for notification of the completion of every container, and 
an AM may also die suddenly (e.g. from an OOM). Spark and other frameworks may 
behave similarly.
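The lifecycle above can be condensed into a toy model (class and field names below are simplified stand-ins for the RM data structures, not the actual Hadoop code) that reproduces the leak:

```java
import java.util.*;

// Toy model of the bookkeeping described above: a completed container is only
// removed from the node-side set after it flows through the per-attempt
// "notify the AM" step during an AM-RM heartbeat. If the AM exits first,
// that step never runs and the entry leaks.
public class CompletedContainerLeakDemo {
    final Set<String> nodeCompleted = new HashSet<>();     // ~ RMNodeImpl.completedContainers
    final Queue<String> justFinished = new ArrayDeque<>(); // ~ RMAppAttemptImpl.justFinishedContainers
    boolean amAlive = true;

    void containerCompleted(String id) {  // NM reports completion
        nodeCompleted.add(id);
        justFinished.add(id);             // Step 1: queued for the AM
    }

    void amHeartbeat() {                  // Step 2: only happens while the AM lives
        if (!amAlive) {
            return;
        }
        String id;
        while ((id = justFinished.poll()) != null) {
            nodeCompleted.remove(id);     // cleanup guaranteed once sent to the AM
        }
    }

    public static void main(String[] args) {
        CompletedContainerLeakDemo rm = new CompletedContainerLeakDemo();
        rm.containerCompleted("c1");
        rm.amHeartbeat();                 // c1 is cleaned up normally
        rm.containerCompleted("c2");
        rm.amAlive = false;               // AM exits before its next heartbeat
        rm.amHeartbeat();                 // no-op: c2 is never removed
        System.out.println(rm.nodeCompleted); // [c2] stays forever
    }
}
```

In this model, any completion that lands after the AM's last heartbeat is stranded in the node-side set, matching the observed accumulation.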






[jira] [Created] (YARN-9111) NM crashes because Fair scheduler promotes a container that has not been pulled by AM

2018-12-11 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-9111:


 Summary: NM crashes because Fair scheduler promotes a container 
that has not been pulled by AM
 Key: YARN-9111
 URL: https://issues.apache.org/jira/browse/YARN-9111
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler, nodemanager
Affects Versions: YARN-1011
Reporter: Haibo Chen


{code:java}
2018-10-19 22:34:35,052 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:323)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:1649)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:185)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:748)
2018-10-19 22:34:35,054 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2018-10-19 22:34:35,059 DEBUG org.apache.hadoop.service.AbstractService: Service: NodeManager entered state STOPPED{code}
 

 
When a container is allocated by the RM to an application, its container token 
is not generated until the AM pulls that container from the RM.

However, if the scheduler decides to promote that container before it is pulled 
by the AM, there is no container token to work with.

The current code does not generate or update the container token in that case, 
so when the container promotion request is sent to the NM for processing, the NM 
crashes with an NPE.
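The ordering bug can be sketched as follows (a minimal stand-in, not the actual RM/NM classes): the token field is only populated on the AM pull path, so a promotion that arrives first dereferences null.

```java
// Hypothetical sketch of the race described above. The names Container, pull()
// and promote() are illustrative; the real crash site is
// BuilderUtils.newContainerTokenIdentifier() dereferencing a null token.
public class PromotionNpeDemo {
    static class Container {
        byte[] token;                       // null until the AM pulls the container
    }

    static byte[] pull(Container c) {       // AM pull path materializes the token
        if (c.token == null) {
            c.token = new byte[] {1};
        }
        return c.token;
    }

    static int promote(Container c) {       // promotion path assumes the token exists
        return c.token.length;              // NPE if the AM never pulled
    }

    public static void main(String[] args) {
        Container c = new Container();
        try {
            promote(c);                     // promoted before the pull -> NPE
        } catch (NullPointerException e) {
            System.out.println("NM would crash here");
        }
        pull(c);
        System.out.println(promote(c));     // after the pull, promotion is safe
    }
}
```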






[jira] [Created] (YARN-9066) Deprecate Fair Scheduler min share

2018-11-27 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-9066:


 Summary: Deprecate Fair Scheduler min share
 Key: YARN-9066
 URL: https://issues.apache.org/jira/browse/YARN-9066
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.2.0
Reporter: Haibo Chen
 Attachments: Proposal_Deprecate_FS_Min_Share.pdf

See the attached doc for details.






[jira] [Created] (YARN-9026) DefaultOOMHandler should mark preempted containers as killed

2018-11-15 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-9026:


 Summary: DefaultOOMHandler should mark preempted containers as 
killed
 Key: YARN-9026
 URL: https://issues.apache.org/jira/browse/YARN-9026
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 3.2.1
Reporter: Haibo Chen


DefaultOOMHandler today kills a selected container by sending a kill -9 signal 
to all processes running within the container's cgroup.

The container then exits with a non-zero code and is therefore treated as a 
failure by the ContainerLaunch threads.

We should instead mark such containers as killed.






[jira] [Created] (YARN-8992) Fair scheduler can delete a dynamic queue while an application attempt is being added to the queue

2018-11-08 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8992:


 Summary: Fair scheduler can delete a dynamic queue while an 
application attempt is being added to the queue
 Key: YARN-8992
 URL: https://issues.apache.org/jira/browse/YARN-8992
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.1.1
Reporter: Haibo Chen


QueueManager can observe a leaf queue as empty while FSLeafQueue.addApp() is 
called in the middle of its emptiness check:
{code:java}
return queue.getNumRunnableApps() == 0 &&
  leafQueue.getNumNonRunnableApps() == 0 &&
  leafQueue.getNumAssignedApps() == 0;{code}
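One way to close the window, sketched with a hypothetical lock and counter rather than the real FSLeafQueue/QueueManager API, is to make the emptiness check-and-delete atomic with respect to addApp():

```java
// Toy illustration of the check-then-act race fix: the emptiness check and the
// queue removal hold the same lock as addApp(), so an application attempt can
// never land in a queue that is concurrently being deleted. All names here are
// hypothetical stand-ins for the Fair Scheduler structures.
public class DynamicQueueRaceDemo {
    private final Object lock = new Object();
    private int assignedApps = 0;
    private boolean deleted = false;

    boolean addApp() {                       // application-attempt path
        synchronized (lock) {
            if (deleted) {
                return false;                // queue vanished; caller must retry
            }
            assignedApps++;
            return true;
        }
    }

    boolean removeIfEmpty() {                // QueueManager cleanup path
        synchronized (lock) {
            if (assignedApps == 0) {         // check and delete under one lock
                deleted = true;
                return true;
            }
            return false;
        }
    }
}
```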






[jira] [Created] (YARN-8930) CGroup-based strict container memory management does not work with CGroupElasticMemoryController

2018-10-22 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8930:


 Summary: CGroup-based strict container memory management does not 
work with CGroupElasticMemoryController
 Key: YARN-8930
 URL: https://issues.apache.org/jira/browse/YARN-8930
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen
Assignee: Haibo Chen


When yarn.nodemanager.resource.memory.enforced is set to true with the memory 
cgroup turned on (aka strict memory enforcement), the containers monitor relies 
on the under_oom status read from the container cgroup's memory.oom_control file.

However, when the root YARN container cgroup is under oom (e.g. when the node is 
overallocating itself), the under_oom status is set for all YARN containers 
regardless of whether each individual container has exceeded its memory limit.

What essentially happens is that whenever the root cgroup is under oom, all YARN 
containers are killed.
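A minimal sketch of a more robust check (the memory.oom_control file format is real cgroup v1, the surrounding logic is hypothetical): treat a container as a kill candidate only when it is both flagged under_oom and actually over its own limit, since the flag alone is set on every child whenever the root cgroup is under oom.

```java
public class UnderOomDemo {
    // Parse the under_oom flag out of a memory.oom_control snapshot.
    static boolean underOom(String oomControl) {
        for (String line : oomControl.split("\n")) {
            if (line.startsWith("under_oom")) {
                return line.trim().endsWith("1");
            }
        }
        return false;
    }

    // Hypothetical safer check: under_oom by itself cannot distinguish the
    // over-limit container from its innocent siblings, so also compare the
    // container's own usage against its own limit.
    static boolean shouldKill(String oomControl, long usageBytes, long limitBytes) {
        return underOom(oomControl) && usageBytes > limitBytes;
    }

    public static void main(String[] args) {
        String flagged = "oom_kill_disable 1\nunder_oom 1";
        // Root cgroup under oom: every container reads under_oom 1, but only
        // the container over its own limit should be killed.
        System.out.println(shouldKill(flagged, 3_000_000_000L, 2_000_000_000L)); // true
        System.out.println(shouldKill(flagged, 1_000_000_000L, 2_000_000_000L)); // false
    }
}
```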






[jira] [Created] (YARN-8929) DefaultOOMHandler should only pick running containers to kill upon oom events

2018-10-22 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8929:


 Summary: DefaultOOMHandler should only pick running containers to 
kill upon oom events
 Key: YARN-8929
 URL: https://issues.apache.org/jira/browse/YARN-8929
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen
Assignee: Haibo Chen


DefaultOOMHandler currently sorts all known containers primarily by execution 
type and secondarily by start time.

However, it does not check whether a container is actually running. Killing a 
non-running container will not release any memory, and hence won't get us out of 
the under-oom status.
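A hedged sketch of the proposed selection (a toy model, not the real DefaultOOMHandler): keep the execution-type-then-start-time ordering, but restrict candidates to running containers first.

```java
import java.util.*;

// Toy victim selection: OPPORTUNISTIC containers are preferred over GUARANTEED,
// and among equals the most recently started one is picked -- but only among
// containers that are actually running, since killing an exited container
// frees no memory. All names here are illustrative.
public class OomVictimDemo {
    static class C {
        final String id;
        final boolean opportunistic;
        final long startTime;
        final boolean running;

        C(String id, boolean opportunistic, long startTime, boolean running) {
            this.id = id;
            this.opportunistic = opportunistic;
            this.startTime = startTime;
            this.running = running;
        }
    }

    static Optional<C> pickVictim(List<C> containers) {
        return containers.stream()
            .filter(c -> c.running)                            // the missing check
            .max(Comparator.comparing((C c) -> c.opportunistic) // opportunistic first
                .thenComparingLong(c -> c.startTime));          // youngest first
    }

    public static void main(String[] args) {
        List<C> cs = Arrays.asList(
            new C("guaranteed-old", false, 100, true),
            new C("opp-exited", true, 300, false),  // chosen without the filter
            new C("opp-young", true, 200, true));
        System.out.println(pickVictim(cs).get().id); // opp-young
    }
}
```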






[jira] [Created] (YARN-8921) SnapshotBasedOverAllocationPolicy incorrectly rounds memory available in bytes

2018-10-19 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8921:


 Summary: SnapshotBasedOverAllocationPolicy incorrectly rounds 
memory available in bytes
 Key: YARN-8921
 URL: https://issues.apache.org/jira/browse/YARN-8921
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen


The memory overallocation threshold is a float, and so is 
(overAllocationThresholds.getMemoryThreshold() * 
containersMonitor.getPmemAllocatedForContainers()). Because Math.round(float) 
returns an int, this effectively caps the amount of memory available for 
overallocation at Integer.MAX_VALUE bytes, [see the code 
here|https://github.com/apache/hadoop/blob/fa864b8744cfdfe613a917ba1bbd859a5b6f70b8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/scheduler/SnapshotBasedOverAllocationPolicy.java#L45]
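The capping behavior is easy to reproduce in isolation (method names below are illustrative, not the actual policy code): Math.round(float) narrows to int and saturates at Integer.MAX_VALUE, while the Math.round(double) overload returns a long.

```java
// Demonstrates the int-narrowing cap described above on a 64 GB node:
// the float overload of Math.round saturates at Integer.MAX_VALUE (~2 GB),
// while widening to double before rounding keeps the full long range.
public class RoundCapDemo {
    static long cappedAvailable(float threshold, long pmemBytes) {
        // buggy pattern: float * long stays a float, Math.round(float) -> int
        return Math.round(threshold * pmemBytes);
    }

    static long correctAvailable(float threshold, long pmemBytes) {
        // fixed pattern: widen to double first, Math.round(double) -> long
        return Math.round((double) threshold * pmemBytes);
    }

    public static void main(String[] args) {
        long pmem = 64L * 1024 * 1024 * 1024;              // 64 GB node
        System.out.println(cappedAvailable(0.9f, pmem));   // 2147483647 (capped)
        System.out.println(correctAvailable(0.9f, pmem));  // ~57.6 GB in bytes
    }
}
```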






[jira] [Created] (YARN-8911) NM incorrectly accounts for container CPU utilization by the number of vcores

2018-10-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8911:


 Summary: NM incorrectly accounts for container CPU utilization 
by the number of vcores
 Key: YARN-8911
 URL: https://issues.apache.org/jira/browse/YARN-8911
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


ResourceUtilization represents CPU utilization as a float in [0, 1.0], i.e. the 
fraction of CPU usage across the node. However, when the Containers Monitor 
tracks the total aggregate resource utilization of all containers, it adds up 
the total number of vcores used by all running containers.

See [the 
code|https://github.com/apache/hadoop/blob/beb850d8f7f1fefa7a6d9502df2b4a4eea372523/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java#L672]
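The unit mismatch can be illustrated with a small sketch (numbers and method names are hypothetical): summing per-container vcore usage yields a vcore count, while ResourceUtilization expects a fraction of the whole node in [0, 1].

```java
// Two containers using 1.5 and 0.5 vcores on an 8-vcore node: the naive sum
// is 2.0 "vcores", but the node-fraction representation expected by
// ResourceUtilization would be 0.25. Method names are illustrative only.
public class CpuUnitsDemo {
    static float aggregateVcores(float[] vcoresUsedPerContainer) {
        float total = 0f;
        for (float v : vcoresUsedPerContainer) {
            total += v;                     // a vcore count, not a fraction
        }
        return total;
    }

    static float aggregateAsNodeFraction(float[] vcoresUsedPerContainer, int nodeVcores) {
        // normalize by the node's vcore capacity to land in [0, 1]
        return aggregateVcores(vcoresUsedPerContainer) / nodeVcores;
    }

    public static void main(String[] args) {
        float[] perContainer = {1.5f, 0.5f};
        System.out.println(aggregateVcores(perContainer));             // 2.0
        System.out.println(aggregateAsNodeFraction(perContainer, 8));  // 0.25
    }
}
```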






[jira] [Created] (YARN-8874) NM does not do any authorization in ContainerManagerImpl.signalToContainer()

2018-10-12 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8874:


 Summary: NM does not do any authorization in 
ContainerManagerImpl.signalToContainer()
 Key: YARN-8874
 URL: https://issues.apache.org/jira/browse/YARN-8874
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen









[jira] [Created] (YARN-8864) NM incorrectly logs container user as the user who sent a stop container request in its audit log

2018-10-09 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8864:


 Summary: NM incorrectly logs container user as the user who sent a 
stop container request in its audit log
 Key: YARN-8864
 URL: https://issues.apache.org/jira/browse/YARN-8864
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen


As in ContainerManagerImpl.java:
{code:java}
protected void stopContainerInternal(ContainerId containerID)
    throws YarnException, IOException {
  ...
  NMAuditLogger.logSuccess(container.getUser(),
      AuditConstants.STOP_CONTAINER, "ContainerManageImpl", containerID
          .getApplicationAttemptId().getApplicationId(), containerID);
}
{code}






[jira] [Created] (YARN-8813) Improve debug messages for NM preemption of OPPORTUNISTIC containers

2018-09-21 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8813:


 Summary: Improve debug messages for NM preemption of 
OPPORTUNISTIC containers
 Key: YARN-8813
 URL: https://issues.apache.org/jira/browse/YARN-8813
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-8809) Fair Scheduler does not decrement queue metrics when OPPORTUNISTIC containers are released.

2018-09-20 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8809:


 Summary: Fair Scheduler does not decrement queue metrics when 
OPPORTUNISTIC containers are released.
 Key: YARN-8809
 URL: https://issues.apache.org/jira/browse/YARN-8809
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-8808) Use aggregate container utilization instead of node utilization to determine resources available for oversubscription

2018-09-20 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8808:


 Summary: Use aggregate container utilization instead of node 
utilization to determine resources available for oversubscription
 Key: YARN-8808
 URL: https://issues.apache.org/jira/browse/YARN-8808
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-8807) FairScheduler crashes RM with oversubscription turned on if an application is killed.

2018-09-20 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8807:


 Summary: FairScheduler crashes RM with oversubscription turned on 
if an application is killed.
 Key: YARN-8807
 URL: https://issues.apache.org/jira/browse/YARN-8807
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler, resourcemanager
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (YARN-8398) Findbugs warning IS2_INCONSISTENT_SYNC in AllocationFileLoaderService.reloadListener

2018-08-24 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-8398.
--
Resolution: Duplicate

> Findbugs warning IS2_INCONSISTENT_SYNC in 
> AllocationFileLoaderService.reloadListener
> 
>
> Key: YARN-8398
> URL: https://issues.apache.org/jira/browse/YARN-8398
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Sunil Govindan
>Priority: Major
>
> {code:java}
> Inconsistent synchronization of 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadListener;
>  locked 75% of time Bug type IS2_INCONSISTENT_SYNC (click for details)  In 
> class 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService
>  Field 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadListener
>  Synchronized 75% of the time Unsynchronized access at 
> AllocationFileLoaderService.java:[line 117] Synchronized access at 
> AllocationFileLoaderService.java:[line 212] Synchronized access at 
> AllocationFileLoaderService.java:[line 228] Synchronized access at 
> AllocationFileLoaderService.java:[line 269]{code}
>  






[jira] [Created] (YARN-8590) Fair scheduler promotion does not update container execution type and token

2018-07-26 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8590:


 Summary: Fair scheduler promotion does not update container 
execution type and token
 Key: YARN-8590
 URL: https://issues.apache.org/jira/browse/YARN-8590
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen


Fair Scheduler promotion of opportunistic containers does not update container 
execution type and token. This leads to incorrect resource accounting when the 
promoted containers are released.






[jira] [Created] (YARN-8461) Support strict memory control on individual container with elastic control memory mechanism

2018-06-25 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8461:


 Summary: Support strict memory control on individual container 
with elastic control memory mechanism
 Key: YARN-8461
 URL: https://issues.apache.org/jira/browse/YARN-8461
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen
Assignee: Haibo Chen


YARN-4599 adds elastic memory control that disables the oom killer for the root 
container cgroup. All containers therefore have their oom killer disabled, 
because they inherit the setting from the root container cgroup. As a result, 
when strict memory control on individual containers is also enabled, an 
offending container will be frozen but not killed. We can let the container 
monitoring thread take care of the frozen containers.






[jira] [Created] (YARN-8427) Don't start opportunistic containers at container scheduler/finish event with over-allocation

2018-06-14 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8427:


 Summary: Don't start opportunistic containers at container 
scheduler/finish event with over-allocation
 Key: YARN-8427
 URL: https://issues.apache.org/jira/browse/YARN-8427
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


As discussed in YARN-8250, we can stop opportunistic containers from being 
launched at container scheduler/finish events if the node is already 
over-allocating itself. This can mitigate the issue of too many opportunistic 
containers being launched and then quickly killed.






[jira] [Resolved] (YARN-6800) Add opportunity to start containers while periodically checking for preemption

2018-06-12 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6800.
--
   Resolution: Implemented
Fix Version/s: YARN-1011

Resolving this as it is already done as part of YARN-6675.

> Add opportunity to start containers while periodically checking for preemption
> --
>
> Key: YARN-6800
> URL: https://issues.apache.org/jira/browse/YARN-6800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Fix For: YARN-1011
>
>







[jira] [Created] (YARN-8393) timeline flow runs API createdtimestart/createdtimeend parameter does not work

2018-06-04 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8393:


 Summary: timeline flow runs API createdtimestart/createdtimeend 
parameter does not work
 Key: YARN-8393
 URL: https://issues.apache.org/jira/browse/YARN-8393
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelinereader
Affects Versions: 3.2.0
Reporter: Haibo Chen


[http://vijayk-ats-4.gce.cloudera.com:8188/ws/v2/timeline/users/systest/flows/flow1/runs]
 output:
{code:java}
[{"metrics":[],"events":[],"createdtime":1516405275543,"idprefix":0,"id":"systest@flow1/12342","info":{"UID":"aftc!systest!flow1!12342","SYSTEM_INFO_FLOW_RUN_END_TIME":151640562,"SYSTEM_INFO_FLOW_NAME":"flow1","SYSTEM_INFO_FLOW_RUN_ID":12342,"SYSTEM_INFO_USER":"systest","FROM_ID":"aftc!systest!flow1!12342"},"isrelatedto":{},"relatesto":{},"type":"YARN_FLOW_RUN"},{"metrics":[],"events":[],"createdtime":1516223999363,"idprefix":0,"id":"systest@flow1/12341","info":{"UID":"aftc!systest!flow1!12341","SYSTEM_INFO_FLOW_RUN_END_TIME":1516405586650,"SYSTEM_INFO_FLOW_NAME":"flow1","SYSTEM_INFO_FLOW_RUN_ID":12341,"SYSTEM_INFO_USER":"systest","FROM_ID":"aftc!systest!flow1!12341"},"isrelatedto":{},"relatesto":{},"type":"YARN_FLOW_RUN"}]
{code}
createdtimestart parameter call (used the run with the higher timestamp so that 
the other run gets filtered out):

[http://vijayk-ats-4.gce.cloudera.com:8188/ws/v2/timeline/users/systest/flows/flow1/runs?createdtimestart=1516405275543]

But the output did not get filtered.

When trying with an even higher timestamp, the expectation was that both runs 
would get filtered out, but only one did at this value:

[http://vijayk-ats-4.gce.cloudera.com:8188/ws/v2/timeline/users/systest/flows/flow1/runs?createdtimestart=1516405585543]






[jira] [Created] (YARN-8391) Investigate AllocationFileLoaderService.reloadListener locking issue

2018-06-04 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8391:


 Summary: Investigate AllocationFileLoaderService.reloadListener 
locking issue
 Key: YARN-8391
 URL: https://issues.apache.org/jira/browse/YARN-8391
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.2.0
Reporter: Haibo Chen


Per the findbugs report in YARN-8390, there is inconsistent locking of 
reloadListener.






[jira] [Created] (YARN-8388) TestCGroupElasticMemoryController.testNormalExit() hangs on Linux

2018-06-01 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8388:


 Summary: TestCGroupElasticMemoryController.testNormalExit() hangs 
on Linux
 Key: YARN-8388
 URL: https://issues.apache.org/jira/browse/YARN-8388
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.2.0
Reporter: Haibo Chen
Assignee: Miklos Szegedi


YARN-8375 disables the unit test on Linux. But given that we will be running 
the CGroupElasticMemoryController on Linux, we need to figure out why it is 
hanging and ideally fix it.

 

 






[jira] [Created] (YARN-8325) Miscellaneous QueueManager code clean up

2018-05-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8325:


 Summary: Miscellaneous QueueManager code clean up
 Key: YARN-8325
 URL: https://issues.apache.org/jira/browse/YARN-8325
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Haibo Chen


getLeafQueue(String name, boolean create, boolean recomputeSteadyShares) and 
setChildResourceLimits() can be declared as private.

ensureRootPrefix() should be static.

The static LOG field should be private.

The log message in
{code:java}
if (!parent.getPolicy().isChildPolicyAllowed(childPolicy)) {
  LOG.error("Can't create queue '" + queueName + "'.");
  return null;
}
{code}
can be improved by including the specific reason, i.e. that the child scheduling 
policy is not allowed by that of its parent.






[jira] [Created] (YARN-8323) FairScheduler.allocConf should be declared as volatile

2018-05-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8323:


 Summary: FairScheduler.allocConf should be declared as volatile
 Key: YARN-8323
 URL: https://issues.apache.org/jira/browse/YARN-8323
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Haibo Chen


allocConf is updated by the allocation file reloading thread, but it is not 
declared as volatile.
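A minimal, generic illustration of the hazard (not the FairScheduler code): a config reference written by a reloader thread and read by scheduler threads needs volatile so readers are guaranteed to see the newly published object.

```java
// Without volatile, the Java memory model allows a reader thread to observe a
// stale reference to the old config object indefinitely after the reloading
// thread swaps in a new one. All class and field names here are hypothetical.
public class VolatileConfigDemo {
    static class AllocConf {
        final int version;
        AllocConf(int version) { this.version = version; }
    }

    // the fix sketched here: volatile publishes the new reference safely
    private volatile AllocConf allocConf = new AllocConf(1);

    void reload() {                  // allocation-file reloading thread
        allocConf = new AllocConf(allocConf.version + 1);
    }

    int currentVersion() {           // scheduler threads
        return allocConf.version;
    }

    public static void main(String[] args) {
        VolatileConfigDemo d = new VolatileConfigDemo();
        d.reload();
        System.out.println(d.currentVersion()); // 2
    }
}
```

Because each AllocConf is immutable (final field) and the reference is volatile, readers always see either the old or the new config fully constructed, never a partial one.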






[jira] [Created] (YARN-8322) Change log level when there is an IOException when the allocation file is loaded

2018-05-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8322:


 Summary: Change log level when there is an IOException when the 
allocation file is loaded
 Key: YARN-8322
 URL: https://issues.apache.org/jira/browse/YARN-8322
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Haibo Chen









[jira] [Created] (YARN-8321) AllocationFileLoaderService. getAllocationFile() should be declared as VisibleForTest

2018-05-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8321:


 Summary: AllocationFileLoaderService. getAllocationFile() should 
be declared as VisibleForTest
 Key: YARN-8321
 URL: https://issues.apache.org/jira/browse/YARN-8321
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0
Reporter: Haibo Chen









[jira] [Created] (YARN-8240) Add queue-level control to allow all applications in a queue to opt-out

2018-05-01 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8240:


 Summary: Add queue-level control to allow all applications in a 
queue to opt-out
 Key: YARN-8240
 URL: https://issues.apache.org/jira/browse/YARN-8240
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (YARN-7088) Add application launch time to Resource Manager REST API

2018-04-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7088.
--
Resolution: Fixed

> Add application launch time to Resource Manager REST API
> 
>
> Key: YARN-7088
> URL: https://issues.apache.org/jira/browse/YARN-7088
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Abdullah Yousufi
>Assignee: Kanwaljeet Sachdev
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7088.001.patch, YARN-7088.002.patch, 
> YARN-7088.003.patch, YARN-7088.004.patch, YARN-7088.005.patch, 
> YARN-7088.006.patch, YARN-7088.007.patch, YARN-7088.008.patch, 
> YARN-7088.009.patch, YARN-7088.010.patch, YARN-7088.011.patch, 
> YARN-7088.012.patch, YARN-7088.013.patch, YARN-7088.014.patch, 
> YARN-7088.015.patch, YARN-7088.016.patch, YARN-7088.017.patch
>
>
> Currently, the start time in the old and new UI actually shows the app 
> submission time. There should be two different fields: one for the 
> app's submission and one for its launch, as well as the elapsed pending time 
> between the two.






[jira] [Resolved] (YARN-7213) [Umbrella] Test and validate HBase-2.0.x with Atsv2

2018-04-04 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7213.
--
   Resolution: Done
Fix Version/s: 3.2.0

> [Umbrella] Test and validate HBase-2.0.x with Atsv2
> ---
>
> Key: YARN-7213
> URL: https://issues.apache.org/jira/browse/YARN-7213
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Rohith Sharma K S
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7213.prelim.patch, YARN-7213.prelim.patch, 
> YARN-7213.wip.patch
>
>
> HBase 2.0.x officially supports Hadoop alpha compilations, and the HBase 
> community is preparing for the Hadoop beta release so that it can publish 
> versions compatible with Hadoop beta. This JIRA keeps track of HBase 2.0 
> integration issues.






[jira] [Resolved] (YARN-8093) Support HBase 2.0.0-beta1 as ATSv2 backend

2018-03-29 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-8093.
--
Resolution: Duplicate

> Support HBase 2.0.0-beta1 as ATSv2 backend
> --
>
> Key: YARN-8093
> URL: https://issues.apache.org/jira/browse/YARN-8093
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineservice
>Affects Versions: 3.1.0
>Reporter: Haibo Chen
>Priority: Major
>
> This serves as a symbolic link under YARN-7055, to YARN-7213 where all 
> sub-tasks are included.






[jira] [Created] (YARN-8093) Support HBase 2.0.0-beta1 as ATSv2 backend

2018-03-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8093:


 Summary: Support HBase 2.0.0-beta1 as ATSv2 backend
 Key: YARN-8093
 URL: https://issues.apache.org/jira/browse/YARN-8093
 Project: Hadoop YARN
  Issue Type: Task
  Components: timelineservice
Affects Versions: 3.1.0
Reporter: Haibo Chen


This serves as a symbolic link under YARN-7055, to YARN-7213 where all 
sub-tasks are included.






[jira] [Created] (YARN-8089) Recover domain from backend for TimelineCollector.

2018-03-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8089:


 Summary: Recover domain from backend for TimelineCollector.
 Key: YARN-8089
 URL: https://issues.apache.org/jira/browse/YARN-8089
 Project: Hadoop YARN
  Issue Type: Task
  Components: timelineservice
Reporter: Haibo Chen


In-memory domain information is volatile. Hence, we need to store domain 
information in the backend, and recover it when the TimelineCollector comes up again.
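
A minimal sketch of the write-through idea (all class and method names below are hypothetical illustrations, not ATSv2 API): persist every domain to a backing store on write, and fall back to the store on read after a restart wipes the in-memory state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not ATSv2 code: a Map stands in for the HBase backend.
public class DomainStoreSketch {
    private final Map<String, String> backend;                  // survives restarts
    private final Map<String, String> cache = new HashMap<>();  // volatile

    public DomainStoreSketch(Map<String, String> backend) {
        this.backend = backend;
    }

    // Write through: persist first, then cache.
    public void putDomain(String id, String readerAcl) {
        backend.put(id, readerAcl);
        cache.put(id, readerAcl);
    }

    // After a collector restart the cache is empty, so recover from the backend.
    public String getDomain(String id) {
        return cache.computeIfAbsent(id, backend::get);
    }

    // Demo: the ACL written before a "restart" is recovered by a fresh instance.
    public static String demoRecover() {
        Map<String, String> backend = new HashMap<>();
        DomainStoreSketch before = new DomainStoreSketch(backend);
        before.putDomain("domain_1", "reader=user1");
        DomainStoreSketch after = new DomainStoreSketch(backend); // restart
        return after.getDomain("domain_1");
    }
}
```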






[jira] [Created] (YARN-8088) [atsv2 read acls] REST API to get domain info by id

2018-03-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8088:


 Summary: [atsv2 read acls] REST API to get domain info by id
 Key: YARN-8088
 URL: https://issues.apache.org/jira/browse/YARN-8088
 Project: Hadoop YARN
  Issue Type: Task
  Components: timelineservice
Reporter: Haibo Chen


Support the query /domains/${domain_id}.






[jira] [Created] (YARN-8087) Allow YARN ATSv2 ACLs to be disabled

2018-03-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8087:


 Summary: Allow YARN ATSv2 ACLs to be disabled
 Key: YARN-8087
 URL: https://issues.apache.org/jira/browse/YARN-8087
 Project: Hadoop YARN
  Issue Type: Task
  Components: timelineservice
Affects Versions: 3.1.0
Reporter: Haibo Chen


YARN-3895 supports ACLs in ATSv2. We should allow admins to disable ACLs if 
they decide that they do not need such a feature.






[jira] [Resolved] (YARN-7859) New feature: add queue scheduling deadLine in fairScheduler.

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7859.
--
Resolution: Won't Do

> New feature: add queue scheduling deadLine in fairScheduler.
> 
>
> Key: YARN-7859
> URL: https://issues.apache.org/jira/browse/YARN-7859
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: wangwj
>Assignee: wangwj
>Priority: Major
>  Labels: fairscheduler, features, patch
> Attachments: YARN-7859-v1.patch, YARN-7859-v2.patch, log, 
> screenshot-1.png, screenshot-3.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
>  In FairScheduler, queue scheduling starvation often occurs when the number 
> of cluster jobs is large: applications in one or more queues stay pending. 
> To solve this problem, this issue proposes adding a queue scheduling 
> deadline to FairScheduler: when a queue has not been scheduled within the 
> specified time, it is scheduled mandatorily.
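
The proposal above can be sketched as a simple bookkeeping structure (a hypothetical illustration, not the YARN-7859 patch): record when each queue was last scheduled, and flag any queue whose wait exceeds the configured deadline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the scheduling-deadline idea; not FairScheduler code.
public class QueueDeadlineSketch {
    private final long deadlineMs;
    private final Map<String, Long> lastScheduledMs = new HashMap<>();

    public QueueDeadlineSketch(long deadlineMs) {
        this.deadlineMs = deadlineMs;
    }

    public void recordScheduled(String queue, long nowMs) {
        lastScheduledMs.put(queue, nowMs);
    }

    // True when the queue has waited past the deadline and should be
    // scheduled ahead of the normal fair-share ordering.
    public boolean mustScheduleNow(String queue, long nowMs) {
        Long last = lastScheduledMs.get(queue);
        return last != null && nowMs - last > deadlineMs;
    }
}
```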






[jira] [Resolved] (YARN-3988) DockerContainerExecutor should allow user specify "docker run" parameters

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-3988.
--
Resolution: Won't Fix

Closing this as DockerContainerExecutor has been deprecated in branch-2 and 
removed in trunk

> DockerContainerExecutor should allow user specify "docker run" parameters
> -
>
> Key: YARN-3988
> URL: https://issues.apache.org/jira/browse/YARN-3988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chen He
>Assignee: Chen He
>Priority: Major
>
> In current DockerContainerExecutor, the "docker run" command has fixed 
> parameters:
> String commandStr = commands.append(dockerExecutor)
>   .append(" ")
>   .append("run")
>   .append(" ")
>   .append("--rm --net=host")
>   .append(" ")
>   .append(" --name " + containerIdStr)
>   .append(localDirMount)
>   .append(logDirMount)
>   .append(containerWorkDirMount)
>   .append(" ")
>   .append(containerImageName)
>   .toString();
> For example, it is not flexible when users want to start a Docker container 
> with extra volumes attached or with other "docker run" parameters. 






[jira] [Resolved] (YARN-2478) Nested containers should be supported

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-2478.
--
Resolution: Won't Fix

Closing this as DockerContainerExecutor has been deprecated in branch-2 and 
removed in trunk

> Nested containers should be supported
> -
>
> Key: YARN-2478
> URL: https://issues.apache.org/jira/browse/YARN-2478
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abin Shahab
>Priority: Major
>
> Currently DockerContainerExecutor only supports one level of containers. 
> However, YARN's responsibility is to handle resource isolation, and nested 
> containers would allow YARN to delegate handling software isolation to the 
> jobs.






[jira] [Resolved] (YARN-2479) DockerContainerExecutor must support handling of distributed cache

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-2479.
--
Resolution: Won't Fix

Closing this as DockerContainerExecutor has been deprecated in branch-2 and 
removed in trunk

> DockerContainerExecutor must support handling of distributed cache
> --
>
> Key: YARN-2479
> URL: https://issues.apache.org/jira/browse/YARN-2479
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abin Shahab
>Priority: Major
>  Labels: security
>
> Interaction between Docker containers and distributed cache has not yet been 
> worked out. There should be a way to securely access distributed cache 
> without compromising the isolation Docker provides.






[jira] [Resolved] (YARN-2482) DockerContainerExecutor configuration

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-2482.
--
Resolution: Won't Fix

Closing this as DockerContainerExecutor has been deprecated in branch-2 and 
removed in trunk

> DockerContainerExecutor configuration
> -
>
> Key: YARN-2482
> URL: https://issues.apache.org/jira/browse/YARN-2482
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abin Shahab
>Priority: Major
>  Labels: security
>
> Currently DockerContainerExecutor can be configured from yarn-site.xml, and 
> users can add arbitrary arguments to the container launch command. This should 
> be fixed so that the cluster and other jobs are protected from malicious 
> string injections.






[jira] [Resolved] (YARN-2477) DockerContainerExecutor must support secure mode

2018-03-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-2477.
--
Resolution: Won't Fix

Closing this as DockerContainerExecutor has been deprecated in branch-2 and 
removed in trunk.

> DockerContainerExecutor must support secure mode
> 
>
> Key: YARN-2477
> URL: https://issues.apache.org/jira/browse/YARN-2477
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Abin Shahab
>Priority: Major
>  Labels: security
>
> DockerContainerExecutor(patch in YARN-1964) does not support Kerberized 
> hadoop clusters yet, as Kerberized hadoop cluster has a strict dependency on 
> the LinuxContainerExecutor. 
> For Docker containers to be used in production environment, they must support 
> secure hadoop. Issues regarding Java's AES encryption library in a 
> containerized environment also need to be worked out.






[jira] [Created] (YARN-8042) Improve debugging on ATSv2 reader server

2018-03-17 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8042:


 Summary: Improve debugging on ATSv2 reader server
 Key: YARN-8042
 URL: https://issues.apache.org/jira/browse/YARN-8042
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Haibo Chen


It's been inconvenient to debug issues that happen on the read path. 
Typically, a query sent from a client is parsed into a TimelineReaderContext, 
TimelineEntityFilters and TimelineDataToRetrieve, which are independent of the 
underlying backend storage implementations. These general ATSv2 building 
blocks are then translated into a Scan or Get query in HBase with specified 
row keys and filters.

To facilitate easy debugging, additional debug-level logging messages (ideally 
ones that can be enabled dynamically without restarting the TimelineReaderServer 
process) can be added at that boundary to narrow down the scope of investigation.

A good example of this is logging the Scan or Get query before it is sent to 
HBase, and the result after the query returns. YARN support folks who 
are not necessarily HBase experts can then present the debug messages to HBase 
experts and get help. (I had to remotely connect to TimelineReaderServer, set 
up breakpoints and capture the HBase queries every time I suspected a bug in 
HBase.)
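
A sketch of what such boundary logging could look like, using java.util.logging purely for illustration (real code would use the project's logging framework; describeScan() and its arguments are hypothetical, not ATSv2 API):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration only: guard an expensive query description behind a level
// check so it can be turned on for debugging without a code change.
public class ReaderDebugSketch {
    private static final Logger LOG = Logger.getLogger("TimelineReaderSketch");

    // Build a human-readable summary of the query sent to the backend.
    static String describeScan(String startRow, String stopRow, String filter) {
        return "Scan[start=" + startRow + ", stop=" + stopRow
            + ", filter=" + filter + "]";
    }

    public static void main(String[] args) {
        // Only pay the string-building cost when the level is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describeScan("user1!flow1", "user1!flow2", "createdtime range"));
        }
    }
}
```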






[jira] [Created] (YARN-8038) Support data retention policy in YARN ATSv2

2018-03-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8038:


 Summary: Support data retention policy in YARN ATSv2
 Key: YARN-8038
 URL: https://issues.apache.org/jira/browse/YARN-8038
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineservice
Affects Versions: 3.0.0
Reporter: Haibo Chen


The data stored in ATSv2 today is either system data, generated by YARN, or 
custom data, generated by Application Masters themselves.

Data retention policy is necessary to maintain feature parity between the new 
MR JHS and the current JHS.

We may want to provide separate policies for system data and custom data.






[jira] [Created] (YARN-8003) Backport the code structure changes in YARN-7346 to branch-2

2018-03-06 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-8003:


 Summary: Backport the code structure changes in YARN-7346 to 
branch-2
 Key: YARN-8003
 URL: https://issues.apache.org/jira/browse/YARN-8003
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.10.0
Reporter: Haibo Chen
Assignee: Haibo Chen


As discussed in YARN-7346, we want to keep the ATSv2 source code structure in 
branch-2 close to that in trunk, in order to ease any future backport.






[jira] [Created] (YARN-7919) Split timelineservice-hbase module to make YARN-7346 easier

2018-02-10 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7919:


 Summary: Split timelineservice-hbase module to make YARN-7346 
easier
 Key: YARN-7919
 URL: https://issues.apache.org/jira/browse/YARN-7919
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineservice
Affects Versions: 3.0.0
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7888) container-log4j.properties is in hadoop-yarn-node-manager.jar

2018-02-02 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7888:


 Summary: container-log4j.properties is in 
hadoop-yarn-node-manager.jar
 Key: YARN-7888
 URL: https://issues.apache.org/jira/browse/YARN-7888
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haibo Chen


NM sets up log4j for containers with the container-log4j.properties file in its 
own jar. However, ideally we should not expose server-side jars to containers.






[jira] [Resolved] (YARN-7837) TestRMWebServiceAppsNodelabel.testAppsRunning is failing

2018-01-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7837.
--
Resolution: Duplicate

> TestRMWebServiceAppsNodelabel.testAppsRunning is failing
> 
>
> Key: YARN-7837
> URL: https://issues.apache.org/jira/browse/YARN-7837
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Haibo Chen
>Priority: Major
>
> org.junit.ComparisonFailure: partition amused 
>  Expected :{"memory":1024,"vCores":1}
>  Actual   
> :{"memory":1024,"vCores":1,"resourceInformations":{"resourceInformation":[
> {"maximumAllocation":9223372036854775807,"minimumAllocation":0,"name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":1024}
> ,{"maximumAllocation":9223372036854775807,"minimumAllocation":0,"name":"vcores","resourceType":"COUNTABLE","units":"","value":1}]}}
>     
>  
>  
>     at org.junit.Assert.assertEquals(Assert.java:115)
>      at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel.verifyResource(TestRMWebServiceAppsNodelabel.java:218)
>      at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel.testAppsRunning(TestRMWebServiceAppsNodelabel.java:201)
>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>      at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>      at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>      at java.lang.reflect.Method.invoke(Method.java:497)
>      at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>      at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>      at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>      at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>      at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>      at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>      at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>      at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>      at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>      at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>      at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
>      at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
>      at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
>      at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
>      at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>      at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>      at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>      at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>      at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>      at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)






[jira] [Resolved] (YARN-7852) FlowRunReader constructs min_start_time filter for both createdtimestart and createdtimeend.

2018-01-30 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7852.
--
Resolution: Invalid

> FlowRunReader constructs min_start_time filter for both createdtimestart and 
> createdtimeend.
> 
>
> Key: YARN-7852
> URL: https://issues.apache.org/jira/browse/YARN-7852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Affects Versions: 3.0.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>
> {code:java}
> protected FilterList constructFilterListBasedOnFilters() throws IOException {
>     FilterList listBasedOnFilters = new FilterList();
>     // Filter based on created time range.
>     Long createdTimeBegin = getFilters().getCreatedTimeBegin();
>     Long createdTimeEnd = getFilters().getCreatedTimeEnd();
>     if (createdTimeBegin != 0 || createdTimeEnd != Long.MAX_VALUE) {
>   listBasedOnFilters.addFilter(TimelineFilterUtils
>   .createSingleColValueFiltersByRange(FlowRunColumn.MIN_START_TIME,
>   createdTimeBegin, createdTimeEnd));
>     }
>     // Filter based on metric filters.
>     TimelineFilterList metricFilters = getFilters().getMetricFilters();
>     if (metricFilters != null && !metricFilters.getFilterList().isEmpty()) {
>   listBasedOnFilters.addFilter(TimelineFilterUtils.createHBaseFilterList(
>   FlowRunColumnPrefix.METRIC, metricFilters));
>     }
>     return listBasedOnFilters;
>   }{code}
>  
> createdTimeEnd is used as an upper bound for MIN_START_TIME.  We should 
> create one filter based on createdTimeBegin and another based on 
> createdTimeEnd.






[jira] [Created] (YARN-7852) FlowRunReader constructs min_start_time filter for both createdtimestart and createdtimeend.

2018-01-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7852:


 Summary: FlowRunReader constructs min_start_time filter for both 
createdtimestart and createdtimeend.
 Key: YARN-7852
 URL: https://issues.apache.org/jira/browse/YARN-7852
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelinereader
Affects Versions: 3.0.0
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7837) TestRMWebServiceAppsNodelabel.testAppsRunning is failing

2018-01-28 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7837:


 Summary: TestRMWebServiceAppsNodelabel.testAppsRunning is failing
 Key: YARN-7837
 URL: https://issues.apache.org/jira/browse/YARN-7837
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Haibo Chen


org.junit.ComparisonFailure: partition amused 
Expected :{"memory":1024,"vCores":1}
Actual   
:{"memory":1024,"vCores":1,"resourceInformations":{"resourceInformation":[{"maximumAllocation":9223372036854775807,"minimumAllocation":0,"name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":1024},{"maximumAllocation":9223372036854775807,"minimumAllocation":0,"name":"vcores","resourceType":"COUNTABLE","units":"","value":1}]}}



{code:java}


    at org.junit.Assert.assertEquals(Assert.java:115)
    at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel.verifyResource(TestRMWebServiceAppsNodelabel.java:218)
    at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel.testAppsRunning(TestRMWebServiceAppsNodelabel.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

{code}






[jira] [Created] (YARN-7748) TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted failed

2018-01-14 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7748:


 Summary: 
TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted 
failed
 Key: YARN-7748
 URL: https://issues.apache.org/jira/browse/YARN-7748
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.0.0
Reporter: Haibo Chen


TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted
Failing for the past 1 build (Since Failed#19244 )
Took 0.4 sec.

*Error Message*
expected null, but 
was:

*Stacktrace*
{code}
java.lang.AssertionError: expected null, but 
was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotNull(Assert.java:664)
at org.junit.Assert.assertNull(Assert.java:646)
at org.junit.Assert.assertNull(Assert.java:656)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing.testIncreaseContainerUnreservedWhenApplicationCompleted(TestContainerResizing.java:826)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:369)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:275)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:239)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:160)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:373)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:334)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:119)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:407)
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7015) Handle Container ExecType update (Promotion/Demotion) in cgroups resource handlers

2018-01-08 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7015.
--
Resolution: Duplicate

> Handle Container ExecType update (Promotion/Demotion) in cgroups resource 
> handlers
> --
>
> Key: YARN-7015
> URL: https://issues.apache.org/jira/browse/YARN-7015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Assignee: Miklos Szegedi
>
> YARN-5085 adds support for change of container execution type 
> (Promotion/Demotion).
> Modifications to the ContainerManagementProtocol, ContainerManager and 
> ContainerScheduler to handle this change are now in trunk. Opening this JIRA 
> to track changes (if any) required in the cgroups resourcehandlers to 
> accommodate this in the context of YARN-1011.






[jira] [Created] (YARN-7716) metricsTimeStart and metricsTimeEnd should be all lower case in the doc

2018-01-08 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7716:


 Summary: metricsTimeStart and metricsTimeEnd should be all lower 
case in the doc
 Key: YARN-7716
 URL: https://issues.apache.org/jira/browse/YARN-7716
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelinereader
Affects Versions: 3.0.0
Reporter: Haibo Chen
Assignee: Haibo Chen


The TimelineV2 REST API is case sensitive, but the doc refers to the 
`metricstimestart` and `metricstimeend` parameters as metricsTimeStart and 
metricsTimeEnd. Users who follow the API doc therefore find that the two 
parameters appear not to work.
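For illustration, the lowercase parameter names can be enforced when building the query URL. A minimal sketch: the reader host, port, and entity path below are illustrative placeholders, not taken from the doc.

```python
from urllib.parse import urlencode

def build_entities_url(base, cluster, app_id, start_ms, end_ms):
    """Build a TimelineService v2 REST query URL. The API is case
    sensitive, so the parameter names must be all lowercase."""
    params = {
        "metricstimestart": start_ms,  # NOT metricsTimeStart
        "metricstimeend": end_ms,      # NOT metricsTimeEnd
        "fields": "METRICS",
    }
    return "%s/ws/v2/timeline/clusters/%s/apps/%s/entities/YARN_CONTAINER?%s" % (
        base, cluster, app_id, urlencode(params))

url = build_entities_url("http://reader:8188", "c1",
                         "application_1510201418065_0002", 1000, 2000)
print(url)
```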






[jira] [Created] (YARN-7714) YARN_TIMELINESERVER_OPTS is not valid env variable for TimelineReaderServer

2018-01-08 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7714:


 Summary: YARN_TIMELINESERVER_OPTS is not valid env variable for 
TimelineReaderServer
 Key: YARN-7714
 URL: https://issues.apache.org/jira/browse/YARN-7714
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelinereader
Affects Versions: 3.0.0
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (YARN-6795) Add per-node max allocation threshold with respect to its capacity

2018-01-02 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6795.
--
Resolution: Duplicate

> Add per-node max allocation threshold with respect to its capacity
> --
>
> Key: YARN-6795
> URL: https://issues.apache.org/jira/browse/YARN-6795
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>







[jira] [Created] (YARN-7602) NM should reference the singleton JvmMetrics instance

2017-12-03 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7602:


 Summary: NM should reference the singleton JvmMetrics instance
 Key: YARN-7602
 URL: https://issues.apache.org/jira/browse/YARN-7602
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-beta1
Reporter: Haibo Chen
Assignee: Haibo Chen


NM does not reference the singleton JvmMetrics instance in its 
NodeManagerMetrics. This can easily cause the NM to crash if any node manager 
component tries to register JvmMetrics again. One example is 
TimelineCollectorManager, which hosts an HBaseClient that registers JvmMetrics 
a second time. See HBASE-19409 for details.






[jira] [Created] (YARN-7581) ATSv2 does not construct HBase filters correctly in HBase 2.0

2017-11-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7581:


 Summary: ATSv2 does not construct HBase filters correctly in HBase 
2.0
 Key: YARN-7581
 URL: https://issues.apache.org/jira/browse/YARN-7581
 Project: Hadoop YARN
  Issue Type: Bug
  Components: ATSv2
Affects Versions: 3.0.0-beta1
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7548) TestCapacityOverTimePolicy.testAllocation is flaky

2017-11-21 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7548:


 Summary: TestCapacityOverTimePolicy.testAllocation is flaky
 Key: YARN-7548
 URL: https://issues.apache.org/jira/browse/YARN-7548
 Project: Hadoop YARN
  Issue Type: Bug
  Components: reservation system
Affects Versions: 3.0.0-beta1
Reporter: Haibo Chen


It failed in both the YARN-7337 and YARN-6921 Jenkins jobs.

org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation[Duration
 90,000,000, height 0.25, numSubmission 1, periodic 8640)]

*Stacktrace*

junit.framework.AssertionFailedError: null
at junit.framework.Assert.fail(Assert.java:55)
at junit.framework.Assert.fail(Assert.java:64)
at junit.framework.TestCase.fail(TestCase.java:235)
at 
org.apache.hadoop.yarn.server.resourcemanager.reservation.BaseSharingPolicyTest.runTest(BaseSharingPolicyTest.java:146)
at 
org.apache.hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy.testAllocation(TestCapacityOverTimePolicy.java:136)


*Standard Output*

2017-11-20 23:57:03,759 INFO  [main] recovery.RMStateStore 
(RMStateStore.java:transition(538)) - Storing reservation 
allocation.reservation_-9026698577416205920_6337917439559340517
2017-11-20 23:57:03,759 INFO  [main] recovery.RMStateStore 
(MemoryRMStateStore.java:storeReservationState(247)) - Storing 
reservationallocation for reservation_-9026698577416205920_6337917439559340517 
for plan dedicated
2017-11-20 23:57:03,760 INFO  [main] reservation.InMemoryPlan 
(InMemoryPlan.java:addReservation(373)) - Successfully added reservation: 
reservation_-9026698577416205920_6337917439559340517 to plan.
In-memory Plan: Parent Queue: dedicatedTotal Capacity: Step: 1000reservation_-9026698577416205920_6337917439559340517 
user:u1 startTime: 0 endTime: 8640 Periodiciy: 8640 alloc:
[Period: 8640
0: 
 3423748: 
 86223748: 
 8640: 
 9223372036854775807: null
 ] 






[jira] [Created] (YARN-7531) ResourceRequest.equal does not check ExecutionTypeRequest.enforceExecutionType()

2017-11-17 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7531:


 Summary: ResourceRequest.equal does not check 
ExecutionTypeRequest.enforceExecutionType()
 Key: YARN-7531
 URL: https://issues.apache.org/jira/browse/YARN-7531
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 3.0.0
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Critical









[jira] [Created] (YARN-7514) TestAggregatedLogDeletionService.testRefreshLogRetentionSettings is flaky

2017-11-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7514:


 Summary: 
TestAggregatedLogDeletionService.testRefreshLogRetentionSettings is flaky
 Key: YARN-7514
 URL: https://issues.apache.org/jira/browse/YARN-7514
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 3.0.0-beta1
Reporter: Haibo Chen


TestAggregatedLogDeletionService.testRefreshLogRetentionSettings fails 
occasionally with 

Error Message

Argument(s) are different! Wanted:
fileSystem.delete(
mockfs://foo/tmp/logs/me/logs/application_1510201418065_0002,
true
);
-> at 
org.apache.hadoop.yarn.logaggregation.TestAggregatedLogDeletionService.testRefreshLogRetentionSettings(TestAggregatedLogDeletionService.java:300)
Actual invocation has different arguments:
fileSystem.delete(
mockfs://foo/tmp/logs/me/logs/application_1510201418024_0001,
true
);
-> at org.apache.hadoop.fs.FilterFileSystem.delete(FilterFileSystem.java:252)






[jira] [Created] (YARN-7491) Make sure AM is not scheduled on an opportunistic container

2017-11-13 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7491:


 Summary: Make sure AM is not scheduled on an opportunistic 
container
 Key: YARN-7491
 URL: https://issues.apache.org/jira/browse/YARN-7491
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7460) Exclude findbugs warnings on SchedulerNode.numGuaranteedContainers and numOpportunisticContainers

2017-11-07 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7460:


 Summary: Exclude findbugs warnings on 
SchedulerNode.numGuaranteedContainers and numOpportunisticContainers
 Key: YARN-7460
 URL: https://issues.apache.org/jira/browse/YARN-7460
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haibo Chen
Assignee: Haibo Chen


The findbugs warnings at 
https://builds.apache.org/job/PreCommit-YARN-Build/18390/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html
are bogus. We should exclude them in 
hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml.






[jira] [Created] (YARN-7456) TestAMRMClient.testAMRMClientWithContainerResourceChange[0] failed

2017-11-07 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7456:


 Summary: 
TestAMRMClient.testAMRMClientWithContainerResourceChange[0] failed
 Key: YARN-7456
 URL: https://issues.apache.org/jira/browse/YARN-7456
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0-beta1
Reporter: Haibo Chen


*Error Message*
expected:<1> but was:<0>

*Stacktrace*
java.lang.AssertionError: expected:<1> but was:<0>
at 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1150)
at 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:1025)






[jira] [Resolved] (YARN-7421) Preserve execution type for containers to be increased by AM post YARN-1015

2017-10-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7421.
--
Resolution: Not A Problem

Based on 
https://issues.apache.org/jira/browse/YARN-7178?focusedCommentId=16227306=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16227306
this is not an issue. Closing this.

> Preserve execution type for containers to be increased by AM post YARN-1015
> ---
>
> Key: YARN-7421
> URL: https://issues.apache.org/jira/browse/YARN-7421
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> From offline discussion with [~asuresh], if an AM tries to promote or 
> increase a container whose enforceExecutionType is set to false, the 
> scheduler may, in the presence of oversubscription,  allocate an 
> OPPORTUNISTIC container. This should not happen given the contract we have in 
> container update API. 






[jira] [Created] (YARN-7421) Preserve execution type for containers to be promoted by AM post YARN-1015

2017-10-31 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7421:


 Summary: Preserve execution type for containers to be promoted by 
AM post YARN-1015
 Key: YARN-7421
 URL: https://issues.apache.org/jira/browse/YARN-7421
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen


From offline discussion with [~asuresh], if an AM tries to promote or increase 
a container whose enforceExecutionType is set to false, the scheduler may, in 
the presence of oversubscription, allocate an OPPORTUNISTIC container. This 
should not happen given the contract we have in the container update API.






[jira] [Created] (YARN-7412) test_docker_util.test_check_mount_permitted() is failing

2017-10-27 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7412:


 Summary: test_docker_util.test_check_mount_permitted() is failing
 Key: YARN-7412
 URL: https://issues.apache.org/jira/browse/YARN-7412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen
Priority: Critical









[jira] [Created] (YARN-7388) TestAMRestart should be scheduler agnostic

2017-10-24 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7388:


 Summary: TestAMRestart should be scheduler agnostic
 Key: YARN-7388
 URL: https://issues.apache.org/jira/browse/YARN-7388
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-23 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7373.
--
Resolution: Information Provided

> The atomicity of container update in RM is not clear
> 
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that  
> {code:java}
> 342   // notify schedulerNode of the update to correct resource accounting
> 343   node.containerUpdated(existingRMContainer, existingContainer);
> 344   
> 345   
> ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> 346   // notify SchedulerNode of the update to correct resource accounting
> 347   node.containerUpdated(tempRMContainer, tempContainer);
> 348   
> {code}
> bq. I think that it would be nicer to lock around these two calls to become 
> atomic.
> Container update, and thus container swap as part of that, is atomic 
> according to [~asuresh].
> It'd be nice to discuss this in more details to see if we want to be more 
> conservative.






[jira] [Created] (YARN-7373) The atomicity of container update in RM is not clear

2017-10-20 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7373:


 Summary: The atomicity of container update in RM is not clear
 Key: YARN-7373
 URL: https://issues.apache.org/jira/browse/YARN-7373
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


While reviewing YARN-4511, Miklos pointed out that  
{code:java}
342 // notify schedulerNode of the update to correct resource accounting
343 node.containerUpdated(existingRMContainer, existingContainer);
344 
345 
((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
346 // notify SchedulerNode of the update to correct resource accounting
347 node.containerUpdated(tempRMContainer, tempContainer);
348 
{code}
bq. I think that it would be nicer to lock around these two calls to become 
atomic.






[jira] [Resolved] (YARN-7362) Set assumption of capacity scheduler for TestClientRMService.testUpdateApplicationPriorityRequest() and testUpdatePriorityAndKillAppWithZeroClusterResource()

2017-10-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7362.
--
Resolution: Duplicate

> Set assumption of capacity scheduler for 
> TestClientRMService.testUpdateApplicationPriorityRequest() and 
> testUpdatePriorityAndKillAppWithZeroClusterResource()
> -
>
> Key: YARN-7362
> URL: https://issues.apache.org/jira/browse/YARN-7362
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Minor
>







[jira] [Created] (YARN-7362) Explicitly set assumption of capacity scheduler for TestClientRMService. testUpdateApplicationPriorityRequest() and testUpdatePriorityAndKillAppWithZeroClusterResource()

2017-10-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7362:


 Summary: Explicitly set assumption of capacity scheduler for 
TestClientRMService. testUpdateApplicationPriorityRequest() and  
testUpdatePriorityAndKillAppWithZeroClusterResource()
 Key: YARN-7362
 URL: https://issues.apache.org/jira/browse/YARN-7362
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Minor









[jira] [Created] (YARN-7360) TestRM.testNMTokenSentForNormalContainer() fails with Fair Scheduler

2017-10-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7360:


 Summary: TestRM.testNMTokenSentForNormalContainer() fails with 
Fair Scheduler
 Key: YARN-7360
 URL: https://issues.apache.org/jira/browse/YARN-7360
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7359) TestAppManager.testQueueSubmitWithNoPermission()

2017-10-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7359:


 Summary: TestAppManager.testQueueSubmitWithNoPermission()
 Key: YARN-7359
 URL: https://issues.apache.org/jira/browse/YARN-7359
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haibo Chen









[jira] [Created] (YARN-7358) TestZKConfigurationStore and TestZKConfigurationStore should explicitly set capacity scheduler

2017-10-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7358:


 Summary: TestZKConfigurationStore and TestZKConfigurationStore 
should explicitly set capacity scheduler
 Key: YARN-7358
 URL: https://issues.apache.org/jira/browse/YARN-7358
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7355) TestDistributedShell should be scheduler agnostic

2017-10-18 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7355:


 Summary: TestDistributedShell should be scheduler agnostic 
 Key: YARN-7355
 URL: https://issues.apache.org/jira/browse/YARN-7355
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7337) Expose per-node over-allocation info in Resource Manager

2017-10-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7337:


 Summary: Expose per-node over-allocation info in Resource Manager
 Key: YARN-7337
 URL: https://issues.apache.org/jira/browse/YARN-7337
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-7334) Add documentation of oversubscription

2017-10-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7334:


 Summary: Add documentation of oversubscription
 Key: YARN-7334
 URL: https://issues.apache.org/jira/browse/YARN-7334
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: YARN-1011
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Blocker


We need to add user documentation for YARN oversubscription, plus API/proto 
changes, before release, as pointed out by Wangda.

Will update this list as we find more things.






[jira] [Created] (YARN-7300) DiskValidator is not used in LocalDirAllocator

2017-10-09 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-7300:


 Summary: DiskValidator is not used in LocalDirAllocator
 Key: YARN-7300
 URL: https://issues.apache.org/jira/browse/YARN-7300
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Haibo Chen



HADOOP-13254 introduced a pluggable disk validator to replace 
DiskChecker.checkDir(). However, LocalDirAllocator still references the old 
DiskChecker.checkDir(). It'd be nice to use the plugin uniformly so that user 
configurations take effect in all places.
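The intent of a pluggable validator — every call site resolving the configured plugin through one lookup point instead of hard-coding a fixed checker — can be sketched as follows. This is a generic Python sketch of the pattern, not Hadoop's actual DiskValidator API, and the names are illustrative.

```python
import os

class BasicDiskValidator:
    """Default validator: path must be a readable, writable directory."""
    def check(self, path):
        if not os.path.isdir(path):
            raise IOError("not a directory: %s" % path)
        if not os.access(path, os.R_OK | os.W_OK):
            raise IOError("bad permissions: %s" % path)

# Registry of available validator plugins, keyed by configured name.
_VALIDATORS = {"basic": BasicDiskValidator}

def get_validator(name):
    """Single lookup point: if every caller (the equivalent of each
    DiskChecker.checkDir() call site) resolves the plugin here, one
    configuration change takes effect everywhere."""
    try:
        return _VALIDATORS[name]()
    except KeyError:
        raise ValueError("unknown disk validator: %s" % name)
```

A call site that today invokes a fixed checker directly would instead do `get_validator(configured_name).check(dir_path)`.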






[jira] [Resolved] (YARN-7032) [ATSv2] NPE while starting hbase co-processor

2017-08-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-7032.
--
Resolution: Won't Fix

> [ATSv2] NPE while starting hbase co-processor
> -
>
> Key: YARN-7032
> URL: https://issues.apache.org/jira/browse/YARN-7032
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>
> The HBase co-processor is randomly seen to fail to start with an NPE, but 
> restarting the RegionServer again succeeds in bringing up the RS.
> {noformat}
> 2017-08-17 05:53:13,535 ERROR 
> [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] 
> coprocessor.CoprocessorHost: The coprocessor 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor 
> threw java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.Tag.fromList(Tag.java:187)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunCoprocessor.prePut(FlowRunCoprocessor.java:102)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$30.call(RegionCoprocessorHost.java:885)
> {noformat}






[jira] [Resolved] (YARN-6796) Add unit test for NM to launch OPPORTUNISTIC container for overallocation

2017-08-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6796.
--
Resolution: Duplicate

> Add unit test for NM to launch OPPORTUNISTIC container for overallocation
> -
>
> Key: YARN-6796
> URL: https://issues.apache.org/jira/browse/YARN-6796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>







[jira] [Resolved] (YARN-6843) Add unit test for NM to preempt OPPORTUNISTIC containers under high utilization

2017-08-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6843.
--
Resolution: Duplicate

Unit tests will be added as part of YARN-6672

> Add unit test for NM to preempt OPPORTUNISTIC containers under high 
> utilization
> ---
>
> Key: YARN-6843
> URL: https://issues.apache.org/jira/browse/YARN-6843
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>







[jira] [Created] (YARN-6931) Make the aggregation interval in AppLevelTimelineCollector configurable

2017-08-02 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6931:


 Summary: Make the aggregation interval in 
AppLevelTimelineCollector configurable
 Key: YARN-6931
 URL: https://issues.apache.org/jira/browse/YARN-6931
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Priority: Minor


We do application-level metrics aggregation in AppLevelTimelineCollector, but 
the interval is hardcoded.






[jira] [Created] (YARN-6921) Allow resource request to opt out of oversubscription

2017-08-01 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6921:


 Summary: Allow resource request to opt out of oversubscription
 Key: YARN-6921
 URL: https://issues.apache.org/jira/browse/YARN-6921
 Project: Hadoop YARN
  Issue Type: Task
  Components: scheduler
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen


Guaranteed container requests, whether or not their enforce flag is set, are 
by default eligible for oversubscription and can thus receive OPPORTUNISTIC 
container allocations. We should allow requests whose enforce flag is set to 
true to opt out.






[jira] [Created] (YARN-6843) Add unit test for NM to preempt OPPORTUNISTIC containers under high utilization

2017-07-19 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6843:


 Summary: Add unit test for NM to preempt OPPORTUNISTIC 
containers under high utilization
 Key: YARN-6843
 URL: https://issues.apache.org/jira/browse/YARN-6843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6812) Consolidate ContainerScheduler maxOpprQueueLength with ContainerQueuingLimit

2017-07-12 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6812:


 Summary: Consolidate ContainerScheduler maxOpprQueueLength with 
ContainerQueuingLimit 
 Key: YARN-6812
 URL: https://issues.apache.org/jira/browse/YARN-6812
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (YARN-6806) Scheduler-agnostic RM changes support oversubscription

2017-07-11 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6806.
--
Resolution: Duplicate

> Scheduler-agnostic RM changes support oversubscription
> --
>
> Key: YARN-6806
> URL: https://issues.apache.org/jira/browse/YARN-6806
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>







[jira] [Created] (YARN-6806) Scheduler-agnostic RM changes support oversubscription

2017-07-11 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6806:


 Summary: Scheduler-agnostic RM changes support oversubscription
 Key: YARN-6806
 URL: https://issues.apache.org/jira/browse/YARN-6806
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6800) Add opportunity to start containers while periodically checking for preemption

2017-07-10 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6800:


 Summary: Add opportunity to start containers while periodically 
checking for preemption
 Key: YARN-6800
 URL: https://issues.apache.org/jira/browse/YARN-6800
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6794) promote OPPORTUNISTIC containers locally at the node where they're running

2017-07-10 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6794:


 Summary: promote OPPORTUNISTIC containers locally at the node 
where they're running
 Key: YARN-6794
 URL: https://issues.apache.org/jira/browse/YARN-6794
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, scheduler
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6796) Add unit test for NM to launch OPPORTUNISTIC container for oversubscription

2017-07-10 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6796:


 Summary: Add unit test for NM to launch OPPORTUNISTIC container 
for oversubscription
 Key: YARN-6796
 URL: https://issues.apache.org/jira/browse/YARN-6796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6795) Add per-node max allocation threshold with respect to its capacity

2017-07-10 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6795:


 Summary: Add per-node max allocation threshold with respect to its 
capacity
 Key: YARN-6795
 URL: https://issues.apache.org/jira/browse/YARN-6795
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6767) Timeline client won't be able to write when TimelineCollector is not up yet, or NM is down

2017-07-06 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6767:


 Summary: Timeline client won't be able to write when 
TimelineCollector is not up yet, or NM is down
 Key: YARN-6767
 URL: https://issues.apache.org/jira/browse/YARN-6767
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineclient
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen









[jira] [Created] (YARN-6750) Add a configuration to cap how much a NM can be overallocated

2017-06-29 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6750:


 Summary: Add a configuration to cap how much a NM can be 
overallocated
 Key: YARN-6750
 URL: https://issues.apache.org/jira/browse/YARN-6750
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Resolved] (YARN-6671) Add container type awareness in ResourceHandlers.

2017-06-23 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6671.
--
Resolution: Not A Problem

> Add container type awareness in ResourceHandlers.
> -
>
> Key: YARN-6671
> URL: https://issues.apache.org/jira/browse/YARN-6671
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> When using LCE, different cgroup settings for opportunistic and guaranteed 
> containers can be used to ensure isolation between them.
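The description above says different cgroup settings per container type can isolate opportunistic from guaranteed containers under LCE. A minimal sketch of that idea follows; the `CgroupShares` class, the `ExecutionType` enum, and the weights 1024 vs. 2 are all illustrative assumptions for this sketch, not the actual YARN API or its configured defaults:

```java
// Hypothetical sketch: map a container's execution type to a cpu.shares
// weight, as one might when writing per-container cgroup settings under
// LinuxContainerExecutor. Names and values are illustrative only.
public class CgroupShares {
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    static int cpuShares(ExecutionType type) {
        // Guaranteed containers keep the full default weight; opportunistic
        // containers get a minimal weight so they yield CPU under contention.
        return type == ExecutionType.GUARANTEED ? 1024 : 2;
    }

    public static void main(String[] args) {
        System.out.println(cpuShares(ExecutionType.GUARANTEED));
        System.out.println(cpuShares(ExecutionType.OPPORTUNISTIC));
    }
}
```

Under this scheme the kernel's CPU scheduler, not the NM, enforces the priority: opportunistic containers run freely on an idle node but are throttled as soon as guaranteed containers need the CPU.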






[jira] [Created] (YARN-6739) Crash NM at start time if oversubscription is on but LinuxContainerExecutor or cgroup is off

2017-06-23 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6739:


 Summary: Crash NM at start time if oversubscription is on but 
LinuxContainerExecutor or cgroup is off
 Key: YARN-6739
 URL: https://issues.apache.org/jira/browse/YARN-6739
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6723) NM overallocation based on over-time rather than snapshot utilization

2017-06-19 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6723:


 Summary: NM overallocation based on over-time rather than snapshot 
utilization
 Key: YARN-6723
 URL: https://issues.apache.org/jira/browse/YARN-6723
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6706) Refactor ContainerScheduler to make oversubscription change easier

2017-06-09 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6706:


 Summary: Refactor ContainerScheduler to make oversubscription 
change easier
 Key: YARN-6706
 URL: https://issues.apache.org/jira/browse/YARN-6706
 Project: Hadoop YARN
  Issue Type: Task
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen









[jira] [Created] (YARN-6705) Add separate NM preemption thresholds for cpu and memory

2017-06-09 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-6705:


 Summary: Add separate NM preemption thresholds for cpu and memory
 Key: YARN-6705
 URL: https://issues.apache.org/jira/browse/YARN-6705
 Project: Hadoop YARN
  Issue Type: Task
  Components: nodemanager
Affects Versions: 3.0.0-alpha3
Reporter: Haibo Chen
Assignee: Haibo Chen








