[jira] [Created] (YARN-8660) Fair/FIFOSchedulers: applications could get starved because computation of #activeUsers considers pending apps

2018-08-13 Thread Manikandan R (JIRA)
Manikandan R created YARN-8660:
--

 Summary: Fair/FIFOSchedulers: applications could get starved 
because computation of #activeUsers considers pending apps
 Key: YARN-8660
 URL: https://issues.apache.org/jira/browse/YARN-8660
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Manikandan R


Based on the YARN-4606 discussion, the FS and FIFO schedulers also have an app 
starvation issue. Please refer to YARN-4606 for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8659) AHSWebServices returns only RUNNING apps when filtered with queue

2018-08-13 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-8659:
---

 Summary: AHSWebServices returns only RUNNING apps when filtered 
with queue
 Key: YARN-8659
 URL: https://issues.apache.org/jira/browse/YARN-8659
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
 Attachments: Screen Shot 2018-08-13 at 8.01.29 PM.png, Screen Shot 
2018-08-13 at 8.01.52 PM.png

AHSWebServices returns only RUNNING apps when filtered with a queue, and 
returns an empty app list when filtered with both the FINISHED state and a 
queue.

http://pjoseph-script-llap3.openstacklocal:8088/ws/v1/cluster/apps?queue=default

http://pjoseph-script-llap3.openstacklocal:8088/ws/v1/cluster/apps?states=FINISHED&queue=default
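
For reference, the combined filter is expressed as two separate query 
parameters, states and queue, on the apps endpoint. A tiny sketch of building 
such a URL (the helper class is hypothetical; only the endpoint path and 
parameter names come from the report):

```java
// Hypothetical helper: builds an apps query URL with both a state filter and
// a queue filter. Only the endpoint path and the parameter names (states,
// queue) are taken from the report; everything else is illustrative.
class AppsQuerySketch {
    static String appsUrl(String hostPort, String states, String queue) {
        return "http://" + hostPort + "/ws/v1/cluster/apps"
                + "?states=" + states + "&queue=" + queue;
    }
}
```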






[jira] [Resolved] (YARN-8650) Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and Invalid event: CONTAINER_LAUNCHED at DONE

2018-08-13 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie resolved YARN-8650.
-
Resolution: Duplicate

> Invalid event: CONTAINER_KILLED_ON_REQUEST at DONE and  Invalid event: 
> CONTAINER_LAUNCHED at DONE
> -
>
> Key: YARN-8650
> URL: https://issues.apache.org/jira/browse/YARN-8650
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: lujie
>Priority: Major
> Attachments: hadoop-hires-nodemanager-hadoop11.log, 
> hadoop-hires-nodemanager-hadoop15.log
>
>
> We tested Hadoop while the NodeManager was shutting down and encountered two 
> InvalidStateTransitionExceptions:
> {code:java}
> 2018-08-04 14:29:33,025 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Can't handle this event at current state: Current: [DONE], eventType: 
> [CONTAINER_KILLED_ON_REQUEST], container: 
> [container_1533364185282_0001_01_01]
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_KILLED_ON_REQUEST at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> CONTAINER_LAUNCHED at DONE
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:2084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:103)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1483)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1476)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> We analyzed these two bugs and found that the shutdown sends a kill event, 
> which causes these two exceptions. We have tested our cluster many times and 
> can deterministically reproduce the issue.






[jira] [Created] (YARN-8658) Metrics for AMRMClientRelayer inside FederationInterceptor

2018-08-13 Thread Botong Huang (JIRA)
Botong Huang created YARN-8658:
--

 Summary: Metrics for AMRMClientRelayer inside FederationInterceptor
 Key: YARN-8658
 URL: https://issues.apache.org/jira/browse/YARN-8658
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Botong Huang
Assignee: Young Chen


AMRMClientRelayer (YARN-7900) was introduced for the stateful 
FederationInterceptor (YARN-7899) to keep track of all pending requests sent to 
every subcluster YarnRM. We need to add metrics for AMRMClientRelayer to expose 
the state of things inside FederationInterceptor.






[jira] [Created] (YARN-8657) User limit calculation should be read-lock-protected within LeafQueue

2018-08-13 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-8657:


 Summary: User limit calculation should be read-lock-protected 
within LeafQueue
 Key: YARN-8657
 URL: https://issues.apache.org/jira/browse/YARN-8657
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Sumana Sathish
Assignee: Wangda Tan


When async scheduling is enabled, the user limit calculation can be wrong:

It is possible that the scheduler calculated a user_limit, but by the time 
{{canAssignToUser}} runs, the value has become stale.

We need to protect the user limit calculation.
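
A minimal sketch of the kind of protection being proposed, assuming a 
ReentrantReadWriteLock over the queue state (class and field names are 
illustrative, not the actual LeafQueue code):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: compute the user limit and check it under the same
// read lock, so an async-scheduling thread cannot observe a stale value
// between the two steps. Names are hypothetical, not actual LeafQueue code.
class UserLimitSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int configuredUserLimit = 100;

    int computeUserLimit() {
        return configuredUserLimit; // stand-in for the real calculation
    }

    boolean canAssignToUser(int requested) {
        lock.readLock().lock();
        try {
            // Both the calculation and the comparison happen while the read
            // lock is held; writers (queue reconfiguration) are excluded.
            int limit = computeUserLimit();
            return requested <= limit;
        } finally {
            lock.readLock().unlock();
        }
    }

    void setUserLimit(int newLimit) {
        lock.writeLock().lock();
        try {
            configuredUserLimit = newLimit;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```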






[jira] [Created] (YARN-8656) container-executor should not write cgroup tasks files for docker containers

2018-08-13 Thread Jim Brennan (JIRA)
Jim Brennan created YARN-8656:
-

 Summary: container-executor should not write cgroup tasks files 
for docker containers
 Key: YARN-8656
 URL: https://issues.apache.org/jira/browse/YARN-8656
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jim Brennan


If cgroups are enabled, we pass the {{--cgroup-parent}} option to {{docker 
run}} to ensure that all processes for the container are placed into a cgroup 
under (for example) {{/sys/fs/cgroup/cpu/cgroups.hierarchy/container_id}}. 
Docker creates a cgroup there with the docker container id as the name and all 
of the processes in the container go into that cgroup.

container-executor has code in {{launch_docker_container_as_user()}} that then 
cherry-picks the PID of the docker container (usually the launch shell) and 
writes that into the 
{{/sys/fs/cgroup/cpu/cgroups.hierarchy/container_id/tasks}} file, effectively 
moving it from 
{{/sys/fs/cgroup/cpu/cgroups.hierarchy/container_id/docker_container_id}} to 
{{/sys/fs/cgroup/cpu/cgroups.hierarchy/container_id}}.  So you end up with one 
process out of the container in the {{container_id}} cgroup, and the rest in 
the {{container_id/docker_container_id}} cgroup.

Since we are passing the {{--cgroup-parent}} to docker, there is no need to 
manually write the container pid to the tasks file - we can just remove the 
code that does this in the docker case.
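
The proposed change can be sketched as a simple guard (container-executor 
itself is C code; this Java sketch only illustrates the control flow, and all 
names and paths are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of the proposed behavior: when Docker is launched with
// --cgroup-parent, Docker itself places every container process in the right
// cgroup, so the launcher should skip the manual tasks-file write. Only
// non-Docker containers still need the PID written by hand.
class CgroupTasksSketch {
    static boolean shouldWriteTasksFile(boolean dockerContainer) {
        return !dockerContainer;
    }

    static void maybeWritePid(Path tasksFile, long pid, boolean docker)
            throws IOException {
        if (!shouldWriteTasksFile(docker)) {
            return; // Docker already placed the process under --cgroup-parent
        }
        Files.write(tasksFile, (pid + "\n").getBytes());
    }
}
```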






Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-08-13 Thread Vinod Kumar Vavilapalli
Yongjun,

Looks like you didn't add the links to the 3.0.3 binary release on the 
http://hadoop.apache.org/releases.html page.

I just did it, FYI: 
https://svn.apache.org/viewvc?view=revision&revision=1837967 


Thanks
+Vinod


> On May 31, 2018, at 10:48 PM, Yongjun Zhang  wrote:
> 
> Greetings all,
> 
> I've created the first release candidate (RC0) for Apache Hadoop
> 3.0.3. This is our next maintenance release to follow up 3.0.2. It includes
> about 249
> important fixes and improvements, among which there are 8 blockers. See
> https://issues.apache.org/jira/issues/?filter=12343997
> 
> The RC artifacts are available at:
> https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/
> 
> The maven artifacts are available via
> https://repository.apache.org/content/repositories/orgapachehadoop-1126
> 
> Please try the release and vote; the vote will run for the usual 5 working
> days, ending on 06/07/2018 PST time. Would really appreciate your
> participation here.
> 
> I bumped into quite some issues along the way, many thanks to quite a few
> people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy Xu.
> 
> Thanks,
> 
> --Yongjun



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2018-08-13 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/867/

[Aug 12, 2018 10:05:52 AM] (umamahesh) HDFS-10794. [SPS]: Provide storage 
policy satisfy worker at DN for
[Aug 12, 2018 10:05:53 AM] (umamahesh) HDFS-10800: [SPS]: Daemon thread in 
Namenode to find blocks placed in
[Aug 12, 2018 10:05:53 AM] (umamahesh) HDFS-10801. [SPS]: Protocol buffer 
changes for sending storage movement
[Aug 12, 2018 10:05:53 AM] (umamahesh) HDFS-10884: [SPS]: Add block movement 
tracker to track the completion of
[Aug 12, 2018 10:05:53 AM] (umamahesh) HDFS-10954. [SPS]: Provide mechanism to 
send blocks movement result back
[Aug 12, 2018 10:05:53 AM] (umamahesh) HDFS-11029. [SPS]:Provide retry 
mechanism for the blocks which were
[Aug 12, 2018 10:05:54 AM] (umamahesh) HDFS-11068: [SPS]: Provide unique 
trackID to track the block movement
[Aug 12, 2018 10:05:54 AM] (umamahesh) HDFS-10802. [SPS]: Add 
satisfyStoragePolicy API in HdfsAdmin.
[Aug 12, 2018 10:05:54 AM] (umamahesh) HDFS-11151. [SPS]: 
StoragePolicySatisfier should gracefully handle when
[Aug 12, 2018 10:05:55 AM] (umamahesh) HDFS-10885. [SPS]: Mover tool should not 
be allowed to run when Storage
[Aug 12, 2018 10:05:55 AM] (umamahesh) HDFS-11123. [SPS] Make storage policy 
satisfier daemon work on/off
[Aug 12, 2018 10:05:55 AM] (umamahesh) HDFS-11032: [SPS]: Handling of block 
movement failure at the coordinator
[Aug 12, 2018 10:05:55 AM] (umamahesh) HDFS-11248: [SPS]: Handle partial block 
location movements. Contributed
[Aug 12, 2018 10:05:56 AM] (umamahesh) HDFS-11193 : [SPS]: Erasure coded files 
should be considered for
[Aug 12, 2018 10:05:56 AM] (umamahesh) HDFS-11289. [SPS]: Make SPS movement 
monitor timeouts configurable.
[Aug 12, 2018 10:05:56 AM] (umamahesh) HDFS-11293: [SPS]: Local DN should be 
given preference as source node,
[Aug 12, 2018 10:05:57 AM] (umamahesh) HDFS-11150: [SPS]: Provide persistence 
when satisfying storage policy.
[Aug 12, 2018 10:05:57 AM] (umamahesh) HDFS-11186. [SPS]: Daemon thread of SPS 
should start only in Active NN.
[Aug 12, 2018 10:05:57 AM] (umamahesh) HDFS-11309. [SPS]: 
chooseTargetTypeInSameNode should pass accurate block
[Aug 12, 2018 10:05:57 AM] (umamahesh) HDFS-11243. [SPS]: Add a protocol 
command from NN to DN for dropping the
[Aug 12, 2018 10:05:57 AM] (umamahesh) HDFS-11239: [SPS]: Check Mover file ID 
lease also to determine whether
[Aug 12, 2018 10:05:58 AM] (umamahesh) HDFS-11336: [SPS]: Remove xAttrs when 
movements done or SPS disabled.
[Aug 12, 2018 10:05:58 AM] (umamahesh) HDFS-11338: [SPS]: Fix timeout issue in 
unit tests caused by longger NN
[Aug 12, 2018 10:05:58 AM] (umamahesh) HDFS-11334: [SPS]: NN switch and 
rescheduling movements can lead to have
[Aug 12, 2018 10:05:58 AM] (umamahesh) HDFS-11572. [SPS]: SPS should clean 
Xattrs when no blocks required to
[Aug 12, 2018 10:05:59 AM] (umamahesh) HDFS-11695: [SPS]: Namenode failed to 
start while loading SPS xAttrs
[Aug 12, 2018 10:05:59 AM] (umamahesh) HDFS-11883: [SPS] : Handle NPE in 
BlockStorageMovementTracker when
[Aug 12, 2018 10:05:59 AM] (umamahesh) HDFS-11762. [SPS]: Empty files should be 
ignored in
[Aug 12, 2018 10:05:59 AM] (umamahesh) HDFS-11726. [SPS]: 
StoragePolicySatisfier should not select same storage
[Aug 12, 2018 10:05:59 AM] (umamahesh) HDFS-11966. [SPS] Correct the log in
[Aug 12, 2018 10:05:59 AM] (umamahesh) HDFS-11670: [SPS]: Add CLI command for 
satisfy storage policy
[Aug 12, 2018 10:06:00 AM] (umamahesh) HDFS-11965: [SPS]: Should give chance to 
satisfy the low redundant
[Aug 12, 2018 10:06:00 AM] (umamahesh) HDFS-11264: [SPS]: Double checks to 
ensure that SPS/Mover are not
[Aug 12, 2018 10:06:00 AM] (umamahesh) HDFS-11874. [SPS]: Document the SPS 
feature. Contributed by Uma
[Aug 12, 2018 10:06:00 AM] (umamahesh) HDFS-12146. [SPS]: Fix
[Aug 12, 2018 10:06:00 AM] (umamahesh) HDFS-12141: [SPS]: Fix checkstyle 
warnings. Contributed by Rakesh R.
[Aug 12, 2018 10:06:00 AM] (umamahesh) HDFS-12152: [SPS]: Re-arrange 
StoragePolicySatisfyWorker stopping
[Aug 12, 2018 10:06:01 AM] (umamahesh) HDFS-12214: [SPS]: Fix review comments 
of StoragePolicySatisfier
[Aug 12, 2018 10:06:01 AM] (umamahesh) HDFS-12225: [SPS]: Optimize extended 
attributes for tracking SPS
[Aug 12, 2018 10:06:01 AM] (umamahesh) HDFS-12291: [SPS]: Provide a mechanism 
to recursively iterate and
[Aug 12, 2018 10:06:01 AM] (umamahesh) HDFS-12570: [SPS]: Refactor Co-ordinator 
datanode logic to track the
[Aug 12, 2018 10:06:01 AM] (umamahesh) HDFS-12556: [SPS] : Block movement 
analysis should be done in read lock.
[Aug 12, 2018 10:06:02 AM] (umamahesh) HDFS-12310: [SPS]: Provide an option to 
track the status of in progress
[Aug 12, 2018 10:06:02 AM] (umamahesh) HDFS-12790: [SPS]: Rebasing HDFS-10285 
branch after HDFS-10467,
[Aug 12, 2018 10:06:02 AM] (umamahesh) HDFS-12106: [SPS]: Improve storage 
policy satisfier configurations.
[Aug 12, 2018 10:06:02 AM] (umamahesh) HDFS-12955: [SPS]: Move SPS classes to a 
separate 

[jira] [Created] (YARN-8655) FSStarvedApps is not thread safe

2018-08-13 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-8655:
-

 Summary: FSStarvedApps is not thread safe
 Key: YARN-8655
 URL: https://issues.apache.org/jira/browse/YARN-8655
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.0.0
Reporter: Zhaohui Xin


For example, when app1 is fair-share starved, it is added to appsToProcess. 
Later, app1 is taken from the queue, but appBeingProcessed has not yet been 
updated to app1. At that moment, app1 becomes min-share starved, so it is added 
to appsToProcess again, because appBeingProcessed is still null and 
appsToProcess no longer contains it.
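
One way to picture the race and a possible fix, sketched with illustrative 
names mirroring the report (this is not the actual FSStarvedApps code): take() 
must record appBeingProcessed in the same critical section that removes the 
app, and addStarvedApp() must check both the queue and the in-progress slot:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the described race. The fix: take() records
// appBeingProcessed atomically with the removal, so addStarvedApp() never
// sees the app as "neither queued nor in progress" and re-queues a duplicate.
class StarvedAppsSketch {
    private final Deque<String> appsToProcess = new ArrayDeque<>();
    private String appBeingProcessed;

    synchronized void addStarvedApp(String app) {
        // Skip duplicates whether the app is still queued or being processed.
        if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) {
            appsToProcess.addLast(app);
        }
    }

    synchronized String take() {
        // Remove and record in one critical section: no window in which the
        // app has left the queue but appBeingProcessed is still null.
        appBeingProcessed = appsToProcess.pollFirst();
        return appBeingProcessed;
    }

    synchronized void doneProcessing() {
        appBeingProcessed = null;
    }
}
```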






[jira] [Created] (YARN-8654) The memory metrics are not the same in the monitoring web UI and the YARN web UI

2018-08-13 Thread Rakesh Shah (JIRA)
Rakesh Shah created YARN-8654:
-

 Summary: The memory metrics are not the same in the monitoring web UI 
and the YARN web UI
 Key: YARN-8654
 URL: https://issues.apache.org/jira/browse/YARN-8654
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Rakesh Shah









Apache Hadoop qbt Report: trunk+JDK8 on Windows/x64

2018-08-13 Thread Apache Jenkins Server
For more details, see https://builds.apache.org/job/hadoop-trunk-win/557/

[Aug 13, 2018 7:17:52 AM] (msingh) HDDS-308. SCM should identify a container 
with pending deletes using
[Aug 13, 2018 8:52:55 AM] (sunilg) YARN-8561. [Submarine] Initial 
implementation: Training job submission
[Aug 13, 2018 9:32:56 AM] (drankye) HDFS-13668. FSPermissionChecker may throws 
AIOOE when check inode
[Aug 13, 2018 10:57:45 AM] (ewan.higgs) HADOOP-15645. 
ITestS3GuardToolLocal.testDiffCommand fails if bucket has


ERROR: File 'out/email-report.txt' does not exist


[jira] [Created] (YARN-8653) Wrong display of resources when cluster resources are less than min resources

2018-08-13 Thread Jinjiang Ling (JIRA)
Jinjiang Ling created YARN-8653:
---

 Summary: Wrong display of resources when cluster resources are 
less than min resources
 Key: YARN-8653
 URL: https://issues.apache.org/jira/browse/YARN-8653
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jinjiang Ling
Assignee: Jinjiang Ling
 Attachments: wrong_resource_in_fairscheduler.JPG

If the cluster resources are less than the min resources of the Fair Scheduler, 
a display error like this happens:

 

!wrong_resource_in_fairscheduler.JPG!

In this case, I configured my queue with max resources of 48 vcores, 49152 MB 
and min resources of 36 vcores, 36864 MB, but the cluster resources are only 24 
vcores and 24576 MB. The max resources shown are then capped at the cluster 
resources, while the min resources and steady fair share still show the 
configured values.
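
The numbers above can be reproduced by a toy model of the display logic, 
assuming the shown max is capped at the cluster total while min and steady 
fair share are read straight from the configuration (purely illustrative, not 
FairScheduler code):

```java
// Toy model of the display logic described above (illustrative only):
// the shown max resource is capped by the cluster total, but the shown min
// resource is not, so min can appear larger than max on the web UI.
class ShownResourcesSketch {
    static int shownMaxVcores(int configuredMaxVcores, int clusterVcores) {
        return Math.min(configuredMaxVcores, clusterVcores);
    }

    static int shownMinVcores(int configuredMinVcores) {
        return configuredMinVcores; // not capped, hence the confusing display
    }
}
```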

 






[jira] [Created] (YARN-8652) [UI2] YARN UI2 breaks if there is no userinfo API - backward compatibility

2018-08-13 Thread Akhil PB (JIRA)
Akhil PB created YARN-8652:
--

 Summary: [UI2] YARN UI2 breaks if there is no userinfo API - 
backward compatibility
 Key: YARN-8652
 URL: https://issues.apache.org/jira/browse/YARN-8652
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Akhil PB
Assignee: Akhil PB





