[jira] [Commented] (YARN-8913) Add helper scripts to launch MaWo App to run Hadoop unit tests on Hadoop Cluster

2018-12-09 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714313#comment-16714313
 ] 

Allen Wittenauer commented on YARN-8913:


This should absolutely be using the dev-support/docker/Dockerfile and not 
something custom.  As it stands, this Dockerfile is missing some absolutely 
critical parts (e.g., everything native) and is downloading components that are 
very different from what the ASF actually uses for its tests. (Why is maven 
coming from Fedora instead of, you know, Apache?)

> Add helper scripts to launch MaWo App to run Hadoop unit tests on Hadoop 
> Cluster
> 
>
> Key: YARN-8913
> URL: https://issues.apache.org/jira/browse/YARN-8913
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Blocker
> Attachments: YARN-8913.001.patch, YARN-8913.002.patch, 
> YARN-8913.003.patch
>
>
> The MaWo application can be used to run the Hadoop UTs faster on a Hadoop cluster.
>  Develop helper scripts to orchestrate the end-to-end workflow for running the 
> Hadoop UTs using the MaWo app.
> Pre-requisites:
>  * A Hadoop cluster with HDFS and YARN installed
>  * The Docker-on-YARN feature enabled
>  
> Helper scripts
>  * MaWo_Driver
>  ** create a Docker image with the latest Hadoop source code
>  ** create the payload for the MaWo app (the input to the MaWo app, where each 
> MaWo task = the UT execution of one Hadoop module)
>  ** upload the payload file to HDFS (see the sketch below)
>  ** update MaWo-Launch.json to resolve RM_HOST, the Docker image, etc. dynamically
>  ** launch the MaWo app on the Hadoop cluster
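
For the HDFS upload step, a minimal sketch using the standard FileSystem API 
(the paths here are made up for illustration):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical driver step: copy the locally generated MaWo payload
// (one task per Hadoop module's unit-test run) into HDFS.
public class UploadPayload {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path src = new Path("file:///tmp/mawo-payload.json");        // assumed path
    Path dst = new Path("/user/mawo/payload/mawo-payload.json"); // assumed path
    fs.copyFromLocalFile(src, dst);
    System.out.println("payload uploaded to " + fs.makeQualified(dst));
  }
}
{code}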






[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4677:
---
Fix Version/s: 3.0.4

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --
>
> Key: YARN-4677
> URL: https://issues.apache.org/jira/browse/YARN-4677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Brook Zhou
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 2.9.2, 3.0.4
>
> Attachments: YARN-4677-branch-2.001.patch, 
> YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch
>
>
> When a node is in the decommissioning state, there is a time window between 
> completedContainer() and the RMNodeResourceUpdateEvent being handled in 
> scheduler.nodeUpdate (YARN-3223). 
> So if a scheduling effort happens within this window, a new container can 
> still be allocated on this node. Even worse, if the scheduling effort 
> happens after the RMNodeResourceUpdateEvent is sent out but before it is 
> propagated to the SchedulerNode, the total resource becomes lower than the 
> used resource and the available resource is a negative value. 
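
A toy, self-contained illustration of that last arithmetic (the numbers are 
invented; this is not RM code):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DecommissionRaceSketch {
  public static void main(String[] args) {
    Resource used = Resources.createResource(6144, 6);  // running containers
    // The decommissioning update shrinks the node's total to what was used,
    // but a container allocated inside the race window pushes 'used' higher:
    Resource total = Resources.createResource(6144, 6);
    used = Resources.add(used, Resources.createResource(1024, 1));
    Resource available = Resources.subtract(total, used);
    System.out.println("available = " + available); // negative memory/vcores
  }
}
{code}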






[jira] [Updated] (YARN-8382) cgroup file leak in NM

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-8382:
---
Fix Version/s: 3.0.4

> cgroup file leak in NM
> --
>
> Key: YARN-8382
> URL: https://issues.apache.org/jira/browse/YARN-8382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: We run a container with a shutdownHook that contains code 
> like "while(true) sleep(100)". When 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* < 
> *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file leak 
> happens; when 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* > 
> *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file is deleted 
> successfully.
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8382-branch-2.8.3.001.patch, 
> YARN-8382-branch-2.8.3.002.patch, YARN-8382.001.patch, YARN-8382.002.patch
>
>
> As Jiandan said in YARN-6562, the NM may time out deleting a container's 
> cgroup files, with logs like the one below:
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to 
> delete for 1000ms
>  
> We found one situation in which this happens: when we set 
> *yarn.nodemanager.sleep-delay-before-sigkill.ms* bigger than 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms*, the 
> cgroup file leak occurs. 
>  
> One container process tree looks like the following graph:
> bash(16097)───java(16099)─┬─{java}(16100)
>                           ├─{java}(16101)
>                           └─{java}(16102)
>  
> When the NM kills a container, it sends kill -15 -pid to kill the container 
> process group. The bash process exits when it receives SIGTERM, but the java 
> process may still be doing some work (a shutdownHook etc.) and doesn't exit 
> until it receives SIGKILL. When the bash process exits, 
> CgroupsLCEResourcesHandler begins trying to delete the cgroup files. So when 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* 
> expires, the java processes may still be running, cgroup/tasks is still not 
> empty, and the cgroup files leak.
>  
> We add a condition that 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* must 
> be bigger than *yarn.nodemanager.sleep-delay-before-sigkill.ms* to solve 
> this problem.
>  
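
A minimal sketch of that constraint as a startup sanity check (the config keys 
are the real ones from the description; the default values and the check 
itself are assumptions, not the actual patch):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CgroupTimeoutCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long deleteTimeoutMs = conf.getLong(
        "yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms",
        1000L);
    long sigkillDelayMs = conf.getLong(
        "yarn.nodemanager.sleep-delay-before-sigkill.ms", 250L);
    if (deleteTimeoutMs <= sigkillDelayMs) {
      // The cgroup delete retries would give up before SIGKILL is even sent,
      // leaving cgroup/tasks non-empty: the leak described above.
      System.err.println("cgroups.delete-timeout-ms (" + deleteTimeoutMs
          + "ms) should exceed sleep-delay-before-sigkill.ms ("
          + sigkillDelayMs + "ms)");
    }
  }
}
{code}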






[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7190:
---
Affects Version/s: (was: 3.0.x)

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Affects Versions: 2.9.0, 3.0.1, 3.0.2, 3.0.3
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Major
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.3
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the hbase related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}






[jira] [Updated] (YARN-8354) SingleConstraintAppPlacementAllocator's allocate does not decPendingResource

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-8354:
---
Affects Version/s: (was: 3.0.x)

> SingleConstraintAppPlacementAllocator's allocate does not decPendingResource
> 
>
> Key: YARN-8354
> URL: https://issues.apache.org/jira/browse/YARN-8354
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Reporter: LongGang Chen
>Priority: Major
>
> SingleConstraintAppPlacementAllocator.allocate() does not 
> decPendingResource; it only 
> reduces ResourceSizing.numAllocations by one.
> Maybe we should change decreasePendingNumAllocation() from:
>  
> {code:java}
> private void decreasePendingNumAllocation() {
>   // Deduct pending #allocations by 1
>   ResourceSizing sizing = schedulingRequest.getResourceSizing();
>   sizing.setNumAllocations(sizing.getNumAllocations() - 1);
> }
> {code}
> to:
> {code:java}
> private void decreasePendingNumAllocation() {
>   // Deduct pending #allocations by 1
>   ResourceSizing sizing = schedulingRequest.getResourceSizing();
>   sizing.setNumAllocations(sizing.getNumAllocations() - 1);
>   // Deduct pending resource of app and queue
>   appSchedulingInfo.decPendingResource(
>       schedulingRequest.getNodeLabelExpression(),
>       sizing.getResources());
> }
> {code}
>  
>  






[jira] [Updated] (YARN-8353) LightWeightResource's hashCode function is different from parent class

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-8353:
---
Affects Version/s: (was: 3.0.x)

> LightWeightResource's hashCode function is different from parent class
> --
>
> Key: YARN-8353
> URL: https://issues.apache.org/jira/browse/YARN-8353
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Reporter: LongGang Chen
>Priority: Major
>
> LightWeightResource's hashCode function is different from its parent class's.
> One of the consequences: 
> ContainerUpdateContext.removeFromOutstandingUpdate will not work correctly, 
> and ContainerUpdateContext.outstandingIncreases will contain stale entries.
> A simple test:
> {code:java}
> public void testHashCode() throws Exception {
>   Resource resource = Resources.createResource(10, 10);
>   Resource resource1 = new ResourcePBImpl();
>   resource1.setMemorySize(10L);
>   resource1.setVirtualCores(10);
>   int x = resource.hashCode();
>   int y = resource1.hashCode();
>   Assert.assertEquals(x, y);
> }
> {code}
>  
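
A minimal sketch (not from the report) of why this bites in practice: HashMap 
locates keys by hashCode first, so two Resource objects that are equals() but 
hash differently can make a lookup miss its entry.

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl;
import org.apache.hadoop.yarn.util.resource.Resources;

public class HashCodeMismatchSketch {
  public static void main(String[] args) {
    Map<Resource, String> outstandingIncreases = new HashMap<>();

    Resource light = Resources.createResource(10, 10); // LightWeightResource
    outstandingIncreases.put(light, "pending increase");

    Resource pb = new ResourcePBImpl(); // hashes via the parent class
    pb.setMemorySize(10L);
    pb.setVirtualCores(10);

    // light.equals(pb) holds, but the differing hashCode picks the wrong
    // bucket, so the entry is not found:
    System.out.println(outstandingIncreases.get(pb)); // may print null
  }
}
{code}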






[jira] [Updated] (YARN-8355) container update error because of competition

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-8355:
---
Affects Version/s: (was: 3.0.x)

> container update error because of competition
> -
>
> Key: YARN-8355
> URL: https://issues.apache.org/jira/browse/YARN-8355
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: RM
>Reporter: LongGang Chen
>Priority: Major
>
> First, a quick walk through the update logic, using increase as an example:
>  * 1: normal work in ApplicationMasterService, DefaultAMSProcessor.
>  * 2: CapacityScheduler.allocate calls 
> AbstractYarnScheduler.handleContainerUpdates.
>  * 3: AbstractYarnScheduler.handleContainerUpdates calls 
> handleIncreaseRequests, which then calls 
> ContainerUpdateContext.checkAndAddToOutstandingIncreases.
>  * 4: cancel or add new: checkAndAddToOutstandingIncreases checks the inc 
> update for this container. If there is an outstanding inc, it cancels it by 
> calling appSchedulingInfo.allocate(...) to allocate a dummy container; if 
> the update is a fresh one, it calls 
> appSchedulingInfo.updateResourceRequests to add a new request. The capacity 
> of this new request is the gap between the existing container's capacity 
> and the target capacity of the UpdateRequest, i.e. the new request added to 
> appSchedulingInfo is sized as the target capacity minus the original 
> capacity.
>  * 5: swap the temp container and the existing container: 
> CapacityScheduler.allocate calls FiCaSchedulerApp.getAllocation(...), 
> getAllocation calls 
> SchedulerApplicationAttempt.pullNewlyIncreasedContainers, which then calls 
> ContainerUpdateContext.swapContainer. swapContainer swaps the newly 
> allocated inc temp container (sized as the gap) with the existing 
> container, after which the existing container has the target capacity. The 
> inc update is done.
> The problem is:
>  if we send two inc updates for a certain container, for example an inc to 
> a first target followed by an inc with a newer target, the final updated 
> capacity is uncertain.
> Scenario one:
>  * 1: send an inc update from the original capacity to a first target.
>  * 2: the scheduler approves it and commits it, so app.liveContainers holds 
> the first temp inc container.
>  * 3: send an inc with a new target; a new resourceRequest, sized against 
> the not-yet-swapped container, is added to appSchedulingInfo, and the first 
> temp container is swapped, after which the existing container has the first 
> target capacity.
>  * 4: the scheduler approves the second temp resourceRequest and allocates 
> a second temp container.
>  * 5: the second inc temp container is swapped as well, so the updated 
> capacity of the existing container is the original capacity plus both 
> deltas, which overshoots the wanted target.
> Scenario two:
>  * 1: send an inc update from the original capacity to a first target.
>  * 2: the scheduler approves it, but the temp container is queued in 
> commitService, waiting to commit.
>  * 3: send an inc with a new target; this adds a new resourceRequest to 
> appSchedulingInfo, but with the same SchedulerRequestKey.
>  * 4: the first temp container commits; app.apply calls 
> appSchedulingInfo.allocate to reduce the pending num, and in this situation 
> it cancels the second inc request.
>  * 5: the first inc temp container is swapped. The updated existing 
> container's capacity is the first target, but the wanted capacity is the 
> newer target.
> Two key points:
>  * 1: when ContainerUpdateContext.checkAndAddToOutstandingIncreases cancels 
> the previous inc request and puts the current inc request, it uses the same 
> SchedulerRequestKey. This action races with app.apply; as in scenario two, 
> app.apply can cancel the second inc update's request.
>  * 2: ContainerUpdateContext.swapContainer does not check whether the 
> update target has changed.
> How to fix: 
>  * 1: after ContainerUpdateContext.checkAndAddToOutstandingIncreases 
> cancels the previous inc update request, use a new SchedulerRequestKey for 
> the current inc update request. We can add a new field, createTime, to 
> distinguish them; the default value of createTime is 0.
>  * 2: change ContainerUpdateContext.swapContainer to checkAndSwapContainer: 
> check whether the update target has changed, and if it has, just ignore 
> this temp container and release it. As in scenario one, when swapping the 
> first temp inc container we would find that the swap would make the updated 
> capacity miss the newer target's capacity, so we skip the swap and release 
> the temp container (a sketch of this check follows below).
>  
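
A toy, self-contained model of fix idea #2 (all names and the long-based 
"capacity" are assumptions for illustration, not an actual patch):

{code:java}
public class CheckAndSwapSketch {
  static long existing = 1024;      // current container capacity (MB)
  static long latestTarget = 3072;  // newest update target (MB)

  // Swap only if applying this temp delta lands exactly on the latest target.
  static boolean checkAndSwap(long tempDelta) {
    if (existing + tempDelta != latestTarget) {
      // Stale increase: the target changed after it was requested, so
      // release the temp container instead of swapping it in.
      System.out.println("stale increase (delta=" + tempDelta + "): released");
      return false;
    }
    existing += tempDelta;
    System.out.println("swapped: capacity now " + existing);
    return true;
  }

  public static void main(String[] args) {
    checkAndSwap(1024); // first increase's delta, now stale: released
    checkAndSwap(2048); // second increase's delta matches the target: swapped
  }
}
{code}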






[jira] [Updated] (YARN-8568) Replace the deprecated zk-address property in the HA config example in ResourceManagerHA.md

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-8568:
---
Affects Version/s: (was: 3.0.x)

> Replace the deprecated zk-address property in the HA config example in 
> ResourceManagerHA.md
> ---
>
> Key: YARN-8568
> URL: https://issues.apache.org/jira/browse/YARN-8568
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Minor
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-8568.001.patch
>
>
> yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address.
> In the example, "yarn.resourcemanager.zk-address" is used, which is 
> deprecated. In the description, the property name is already the correct 
> "hadoop.zk.address".






[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4677:
---
Fix Version/s: (was: 3.0.x)

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --
>
> Key: YARN-4677
> URL: https://issues.apache.org/jira/browse/YARN-4677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Brook Zhou
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 2.9.2
>
> Attachments: YARN-4677-branch-2.001.patch, 
> YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch
>
>
> When a node is in the decommissioning state, there is a time window between 
> completedContainer() and the RMNodeResourceUpdateEvent being handled in 
> scheduler.nodeUpdate (YARN-3223). 
> So if a scheduling effort happens within this window, a new container can 
> still be allocated on this node. Even worse, if the scheduling effort 
> happens after the RMNodeResourceUpdateEvent is sent out but before it is 
> propagated to the SchedulerNode, the total resource becomes lower than the 
> used resource and the available resource is a negative value. 






[jira] [Updated] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4677:
---
Target Version/s: 3.1.1, 3.2.0, 2.9.2  (was: 3.2.0, 3.1.1, 2.9.2, 3.0.x)

> RMNodeResourceUpdateEvent update from scheduler can lead to race condition
> --
>
> Key: YARN-4677
> URL: https://issues.apache.org/jira/browse/YARN-4677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, resourcemanager, scheduler
>Affects Versions: 2.7.1
>Reporter: Brook Zhou
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 2.9.2
>
> Attachments: YARN-4677-branch-2.001.patch, 
> YARN-4677-branch-2.002.patch, YARN-4677-branch-2.003.patch, YARN-4677.01.patch
>
>
> When a node is in the decommissioning state, there is a time window between 
> completedContainer() and the RMNodeResourceUpdateEvent being handled in 
> scheduler.nodeUpdate (YARN-3223). 
> So if a scheduling effort happens within this window, a new container can 
> still be allocated on this node. Even worse, if the scheduling effort 
> happens after the RMNodeResourceUpdateEvent is sent out but before it is 
> propagated to the SchedulerNode, the total resource becomes lower than the 
> used resource and the available resource is a negative value. 






[jira] [Updated] (YARN-8382) cgroup file leak in NM

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-8382:
---
Fix Version/s: (was: 3.0.x)

> cgroup file leak in NM
> --
>
> Key: YARN-8382
> URL: https://issues.apache.org/jira/browse/YARN-8382
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
> Environment: We run a container with a shutdownHook that contains code 
> like "while(true) sleep(100)". When 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* < 
> *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file leak 
> happens; when 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* > 
> *yarn.nodemanager.sleep-delay-before-sigkill.ms*, the cgroup file is deleted 
> successfully.
>Reporter: Hu Ziqian
>Assignee: Hu Ziqian
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8382-branch-2.8.3.001.patch, 
> YARN-8382-branch-2.8.3.002.patch, YARN-8382.001.patch, YARN-8382.002.patch
>
>
> As Jiandan said in YARN-6562, the NM may time out deleting a container's 
> cgroup files, with logs like the one below:
> org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: 
> Unable to delete cgroup at: /cgroup/cpu/hadoop-yarn/container_xxx, tried to 
> delete for 1000ms
>  
> We found one situation in which this happens: when we set 
> *yarn.nodemanager.sleep-delay-before-sigkill.ms* bigger than 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms*, the 
> cgroup file leak occurs. 
>  
> One container process tree looks like the following graph:
> bash(16097)───java(16099)─┬─{java}(16100)
>                           ├─{java}(16101)
>                           └─{java}(16102)
>  
> When the NM kills a container, it sends kill -15 -pid to kill the container 
> process group. The bash process exits when it receives SIGTERM, but the java 
> process may still be doing some work (a shutdownHook etc.) and doesn't exit 
> until it receives SIGKILL. When the bash process exits, 
> CgroupsLCEResourcesHandler begins trying to delete the cgroup files. So when 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* 
> expires, the java processes may still be running, cgroup/tasks is still not 
> empty, and the cgroup files leak.
>  
> We add a condition that 
> *yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms* must 
> be bigger than *yarn.nodemanager.sleep-delay-before-sigkill.ms* to solve 
> this problem.
>  






[jira] [Resolved] (YARN-2097) Documentation: health check return status

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2097.

Resolution: Won't Fix

> Documentation: health check return status
> -
>
> Key: YARN-2097
> URL: https://issues.apache.org/jira/browse/YARN-2097
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Allen Wittenauer
>Assignee: Rekha Joshi
>Priority: Major
>  Labels: newbie
> Attachments: YARN-2097.1.patch
>
>
> We need to document that the output of the health check script is ignored on 
> non-0 exit status.






[jira] [Resolved] (YARN-2345) yarn rmadmin -report

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2345.

Resolution: Won't Fix

> yarn rmadmin -report
> 
>
> Key: YARN-2345
> URL: https://issues.apache.org/jira/browse/YARN-2345
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Reporter: Allen Wittenauer
>Assignee: Hao Gao
>Priority: Major
>  Labels: newbie
> Attachments: YARN-2345.1.patch
>
>
> It would be good to have an equivalent of hdfs dfsadmin -report in YARN.






[jira] [Resolved] (YARN-2413) capacity scheduler will overallocate vcores

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2413.

Resolution: Won't Fix

> capacity scheduler will overallocate vcores
> ---
>
> Key: YARN-2413
> URL: https://issues.apache.org/jira/browse/YARN-2413
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, scheduler
>Affects Versions: 2.2.0, 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
>
> It doesn't appear that the capacity scheduler is properly allocating vcores 
> when making scheduling decisions, which may result in overallocation of CPU 
> resources.






[jira] [Resolved] (YARN-2429) LCE should blacklist based upon group

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2429.

Resolution: Won't Fix

> LCE should blacklist based upon group
> -
>
> Key: YARN-2429
> URL: https://issues.apache.org/jira/browse/YARN-2429
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: security
>Reporter: Allen Wittenauer
>Priority: Major
>  Labels: newbie
>
> It should be possible to list a group to ban, not just individual users.






[jira] [Resolved] (YARN-2471) DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2471.

Resolution: Won't Fix

> DEFAULT_YARN_APPLICATION_CLASSPATH doesn't honor hadoop-layout.sh
> -
>
> Key: YARN-2471
> URL: https://issues.apache.org/jira/browse/YARN-2471
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>Priority: Major
>
> In 0.21, hadoop-layout.sh was introduced to allow for vendors to reorganize 
> the Hadoop distribution in a way that pleases them.  
> DEFAULT_YARN_APPLICATION_CLASSPATH hard-codes the paths that hadoop-layout.sh 
> was meant to override.






[jira] [Resolved] (YARN-2806) log container allocation requests

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2806.

Resolution: Won't Fix

> log container allocation requests
> -
>
> Key: YARN-2806
> URL: https://issues.apache.org/jira/browse/YARN-2806
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>Assignee: Eric Wohlstadter
>Priority: Major
> Attachments: YARN-2806.patch
>
>
> I might have missed it, but I don't see where we log application container 
> requests outside of the DEBUG context.  Without this being logged, we have no 
> per-application view of the lag an application might be experiencing in the 
> allocation system. 
> We should probably add this as an event to the RM audit log.






[jira] [Resolved] (YARN-3175) Consolidate the ResournceManager documentation into one

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-3175.

Resolution: Won't Fix

> Consolidate the ResournceManager documentation into one
> ---
>
> Key: YARN-3175
> URL: https://issues.apache.org/jira/browse/YARN-3175
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Allen Wittenauer
>Priority: Major
>
> We really don't need a different document for every individual RM feature.






[jira] [Resolved] (YARN-3484) Fix up yarn top shell code

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-3484.

  Resolution: Won't Fix
Target Version/s:   (was: )

> Fix up yarn top shell code
> --
>
> Key: YARN-3484
> URL: https://issues.apache.org/jira/browse/YARN-3484
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Varun Vasudev
>Priority: Major
>  Labels: newbie
> Attachments: YARN-3484.001.patch, YARN-3484.002.patch
>
>
> We need to do some work on yarn top's shell code.
> a) Just checking for TERM isn't good enough.  We really need to check the 
> return on tput, especially since the output will not be a number but an error 
> string which will likely blow up the java code in horrible ways.
> b) All the single bracket tests should be double brackets to force the bash 
> built-in.
> c) I think I'd rather see the shell portion in a function since it's rather 
> large.  This will allow args, etc., to get local'ized and clean up the 
> case statement.






[jira] [Resolved] (YARN-4432) yarn launch script works by chance

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-4432.

Resolution: Won't Fix

> yarn launch script works by chance
> --
>
> Key: YARN-4432
> URL: https://issues.apache.org/jira/browse/YARN-4432
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts, yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> The YARN launch script has (at least) three big problems:
> * Usage of env vars before being assigned
> * Usage of env vars that are never assigned
> * Assumption that HADOOP_ROOT_LOGGER allows overrides 
> These need to be fixed.






[jira] [Resolved] (YARN-5064) move the shell code out of hadoop-yarn

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-5064.

Resolution: Won't Fix

> move the shell code out of hadoop-yarn
> --
>
> Key: YARN-5064
> URL: https://issues.apache.org/jira/browse/YARN-5064
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scripts, test
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Major
>
> We need to move the shell code out of hadoop-yarn so that we can properly 
> build test infrastructure for it. 






[jira] [Resolved] (YARN-5099) hadoop-yarn unit tests for dynamic commands

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-5099.

Resolution: Won't Fix

> hadoop-yarn unit tests for dynamic commands
> ---
>
> Key: YARN-5099
> URL: https://issues.apache.org/jira/browse/YARN-5099
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: scripts, test
>Reporter: Allen Wittenauer
>Priority: Major
>
> This is a hold-over from HADOOP-12930, dynamic sub commands.  Currently, the 
> yarn changes lack unit tests and they really should be there.






[jira] [Resolved] (YARN-5530) YARN dependencies are a complete mess

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-5530.

Resolution: Won't Fix

> YARN dependencies are a complete mess
> -
>
> Key: YARN-5530
> URL: https://issues.apache.org/jira/browse/YARN-5530
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Critical
>
> YARN's share/hadoop/yarn/lib is pretty much a disaster area.  Multiple jars 
> have multiple versions.  Then there are the version collisions with the rest 
> of Hadoop.  Oh, and then there are the test jars sitting in there.
> This really needs to get cleaned up since all of this stuff is on the 
> classpath and are likely going to cause a lot of problems down the road, 
> never mind the download bloat. (trunk's yarn dependencies are 2x what they 
> were in branch-2, thereby eliminating all the gains made by de-duping jars 
> across the projects.)






[jira] [Resolved] (YARN-5454) Various places have a hard-coded location for bash

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-5454.

Resolution: Won't Fix

> Various places have a hard-coded location for bash
> --
>
> Key: YARN-5454
> URL: https://issues.apache.org/jira/browse/YARN-5454
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Priority: Major
>
> Lots of places in nodemanager have the location of bash hard-coded to 
> /bin/bash. This is not portable.  bash should either be found via 
> /usr/bin/env or have no path at all.






[jira] [Resolved] (YARN-5635) Better handling when bad script is configured as Node's HealthScript

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-5635.

Resolution: Won't Fix

> Better handling when bad script is configured as Node's HealthScript
> 
>
> Key: YARN-5635
> URL: https://issues.apache.org/jira/browse/YARN-5635
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Allen Wittenauer
>Priority: Major
>
> The earlier fix for YARN-5567 was reverted because it's not ideal to take the 
> whole cluster down because of a bad script. At the same time, it's important 
> to report that the script configured as the node health script is erroneous, 
> as a broken script might fail to detect the bad health of a node.






[jira] [Resolved] (YARN-6241) Remove -jt flag

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-6241.

Resolution: Won't Fix

> Remove -jt flag
> ---
>
> Key: YARN-6241
> URL: https://issues.apache.org/jira/browse/YARN-6241
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Priority: Major
>
> The -jt flag is used to send a job to a remote resourcemanager.  Given the 
> flag, this is clearly left over from pre-YARN days.  With the addition of the 
> timeline server and other YARN services, the flag doesn't really work that 
> well anymore.  It's probably better to deprecate it in 2.x and remove from 
> 3.x than attempt to fix it.






[jira] [Resolved] (YARN-6452) test-container-executor should not be in bin in dist tarball

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-6452.

Resolution: Won't Fix

> test-container-executor should not be in bin in dist tarball
> 
>
> Key: YARN-6452
> URL: https://issues.apache.org/jira/browse/YARN-6452
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Allen Wittenauer
>Priority: Minor
>
> test-container-executor should probably be in sbin or libexec or not there at 
> all.






[jira] [Resolved] (YARN-7588) Remove 'yarn historyserver' from bin/yarn

2018-09-01 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-7588.

Resolution: Won't Fix

> Remove 'yarn historyserver' from bin/yarn
> -
>
> Key: YARN-7588
> URL: https://issues.apache.org/jira/browse/YARN-7588
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scripts
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Priority: Minor
>
> 'yarn historyserver' command has been replaced with 'yarn timelineserver' 
> since 2.7.0.  Let's remove the dead code.






[jira] [Commented] (YARN-8638) Allow linux container runtimes to be pluggable

2018-08-08 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574191#comment-16574191
 ] 

Allen Wittenauer commented on YARN-8638:


This sort of exemplifies the misuse of the word 'linux' when 'unix' or 'posix' 
would have been better.  With the exception of cgroups (which is relatively 
new), there is very little that doesn't work on a variety of platforms.

With this proposed change, it might be time to rename the class entirely and 
just add 'linux' as an alias. This would open the door up specifically for 
things like FreeBSD jails, Solaris Zones, quite a few other sandboxing 
technologies, and more. 

> Allow linux container runtimes to be pluggable
> --
>
> Key: YARN-8638
> URL: https://issues.apache.org/jira/browse/YARN-8638
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Priority: Minor
>
> YARN currently supports three different Linux container runtimes (default, 
> docker, and javasandbox). However, it would be relatively straightforward to 
> support arbitrary runtime implementations. This would enable easier 
> experimentation with new and emerging runtime technologies (runc, containerd, 
> etc.) without requiring a rebuild and redeployment of Hadoop. 
> This could be accomplished via a simple configuration change:
> {code:xml}
> <property>
>   <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
>   <value>default,docker,experimental</value>
> </property>
>  
> <property>
>   <name>yarn.nodemanager.runtime.linux.experimental.class</name>
>   <value>com.somecompany.yarn.runtime.ExperimentalLinuxContainerRuntime</value>
> </property>
> {code}
>  
> In this example, {{yarn.nodemanager.runtime.linux.allowed-runtimes}} would 
> now allow arbitrary values. Additionally, 
> {{yarn.nodemanager.runtime.linux.{RUNTIME_KEY}.class}} would indicate the 
> {{LinuxContainerRuntime}} implementation to instantiate. A no-argument 
> constructor should be sufficient, as {{LinuxContainerRuntime}} already 
> provides an {{initialize()}} method.
> {{DockerLinuxContainerRuntime.isDockerContainerRequested(Map<String, String> 
> env)}} and {{JavaSandboxLinuxContainerRuntime.isSandboxContainerRequested()}} 
> could be generalized to {{isRuntimeRequested(Map<String, String> env)}} and 
> added to the {{LinuxContainerRuntime}} interface. This would allow 
> {{DelegatingLinuxContainerRuntime}} to select an appropriate runtime based on 
> whether that runtime claimed ownership of the current container execution.
> For backwards compatibility, the existing values (default,docker,javasandbox) 
> would continue to be supported as-is. Under the current logic, the evaluation 
> order is javasandbox, docker, default (with default being chosen if no other 
> candidates are available). Under the new evaluation logic, pluggable runtimes 
> would be evaluated after docker and before default, in the order in which 
> they are defined in the allowed-runtimes list. This will change no behavior 
> on current clusters (as there would be no pluggable runtimes defined), and 
> preserves behavior with respect to ordering of existing runtimes.
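
A standalone sketch of the proposed dispatch idea (the env var follows the 
existing docker runtime's YARN_CONTAINER_RUNTIME_TYPE convention; the 
isRuntimeRequested name and this dispatch loop are the proposal's, not 
current YARN code):

{code:java}
import java.util.Map;

public class RuntimeDispatchSketch {
  static final String RUNTIME_TYPE_ENV = "YARN_CONTAINER_RUNTIME_TYPE";

  // Proposed generalization of isDockerContainerRequested() /
  // isSandboxContainerRequested(): each runtime claims a container by
  // inspecting the container's environment.
  static boolean isRuntimeRequested(String runtimeName, Map<String, String> env) {
    return runtimeName.equals(env.get(RUNTIME_TYPE_ENV));
  }

  public static void main(String[] args) {
    Map<String, String> env = Map.of(RUNTIME_TYPE_ENV, "experimental");
    // Evaluation order from the description: javasandbox, docker, then
    // pluggable runtimes, with default as the fallback.
    for (String runtime :
        new String[] {"javasandbox", "docker", "experimental", "default"}) {
      if (isRuntimeRequested(runtime, env) || "default".equals(runtime)) {
        System.out.println("selected runtime: " + runtime);
        break;
      }
    }
  }
}
{code}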






[jira] [Commented] (YARN-8536) Add max heap config option for Federation Router

2018-07-17 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546654#comment-16546654
 ] 

Allen Wittenauer commented on YARN-8536:


bq.  friendly to legacy scripts.

Adding a new var doesn't help legacy scripts since the var didn't exist before 
for them to use.

bq. Shouldn't hurt right?

It does in a variety of ways:

1)  Done properly, every config variable adds at least 10 lines of bash and 5 
lines of DOS batch.  (and that's not counting src/site documentation, assuming 
that contributors even bother).  That makes it a long-term support burden for 
just a bit of syntactic sugar. 

2) There is already _OPTS to tune JVMs.  If _HEAPSIZE is used and _OPTS is 
used, where should the Xmx value come from?  Prior to the work I did in 
HADOOP-9902,  this wasn't implemented consistently nor was it obvious to the 
end user which one took precedence. 

3) This is a slippery slope.  Why should Xmx be the only JVM param with a 
custom variable?

4) Before it gets said, I don't buy the "easier for end users" argument either. 
 In production scenarios, daemons almost always need additional parameters 
above and beyond heap (gc logging, etc).  So _OPTS gets defined anyway.  

Long-term, we'd be better served to remove the _HEAPSIZE variables and to 
standardize on _OPTS.  It would greatly cut back on a lot of excess code and 
make it absolutely clear to users that _OPTS is where all JVM tuning should go.

> Add max heap config option for Federation Router
> 
>
> Key: YARN-8536
> URL: https://issues.apache.org/jira/browse/YARN-8536
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-8536.v1.patch
>
>







[jira] [Commented] (YARN-8536) Add max heap config option for Federation Router

2018-07-13 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543977#comment-16543977
 ] 

Allen Wittenauer commented on YARN-8536:


Why? You can set all of the Java opts via _OPTS.  Even the patch points out 
this is unnecessary:

{code}
+  # Backwards compatibility
{code}

> Add max heap config option for Federation Router
> 
>
> Key: YARN-8536
> URL: https://issues.apache.org/jira/browse/YARN-8536
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Attachments: YARN-8536.v1.patch
>
>







[jira] [Comment Edited] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-14 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473789#comment-16473789
 ] 

Allen Wittenauer edited comment on YARN-8275 at 5/14/18 6:01 AM:
-

bq.  I am planning to code everything in Commons to be used from YARN and HDFS.

The umbrella JIRA should really start out in HADOOP so that people aren't taken 
by surprise.  I suspect any YARN and HDFS specific code to be relatively tiny 
since winutils is used all over the place, including in the client code.  

That fact probably makes ...

bq. a long running native process communicating with YARN over pipe

almost certainly a non-starter: never mind the security concerns, it would 
greatly increase the complexity for likely very little gain.

The other thing to keep in mind is that winutils pre-dates Java 7.  Things like 
symlinks can now be done with Java APIs.  No C required.  I'd highly recommend 
starting with replacing the winutils calls with Java API calls first and then 
digging into something more complex later.  [The Unix versions of those same 
calls will likely get a speed bump too.]

---

Before I forget: from a "what gets run on the maven command line" perspective, there is 
very little difference between libhadoop (JNI) and winutils.  Windows *always* 
requires (and thus triggers) -Pnative.  

I suspect the direction was set because winutils was added when libhadoop was 
still being built by autoconf.  But now that cmake is there and works properly 
on Windows (at least in 3.x), it'd be nice to place the core of winutils into 
libhadoop and just keep winutils as a wrapper to use for debugging.  This might 
also move us away from using MSBuild, which would greatly simplify the build 
process.


was (Author: aw):
bq.  I am planning to code everything in Commons to be used from YARN and HDFS.

The umbrella JIRA should really start out in HADOOP so that people aren't taken 
by surprise.  I suspect any YARN and HDFS specific code to be relatively tiny 
since winutils is used all over the place, including in the client code.  

That fact probably makes ...

bq. a long running native process communicating with YARN over pipe

almost certainly a non-starter: never mind the security concerns, it would 
greatly increase the complexity for likely very little gain.

The other thing to keep in mind is that winutils pre-dates Java 7.  Things like 
symlinks can now be done with Java APIs.  No C required.  I'd highly recommend 
starting with replacing the winutils calls with Java API calls first and then 
digging into something more complex later.  [The Unix versions of those same 
calls will likely get a speed bump too.]

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average, the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.






[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473789#comment-16473789
 ] 

Allen Wittenauer commented on YARN-8275:


bq.  I am planning to code everything in Commons to be used from YARN and HDFS.

The umbrella JIRA should really start out in HADOOP so that people aren't taken 
by surprise.  I suspect any YARN and HDFS specific code to be relatively tiny 
since winutils is used all over the place, including in the client code.  

That fact probably makes ...

bq. a long running native process communicating with YARN over pipe

almost certainly a non-starter: never mind the security concerns, it would 
greatly increase the complexity for likely very little gain.

The other thing to keep in mind is that winutils pre-dates Java 7.  Things like 
symlinks can now be done with Java APIs.  No C required.  I'd highly recommend 
starting with replacing the winutils calls with Java API calls first and then 
digging into something more complex later.  [The Unix versions of those same 
calls will likely get a speed bump too.]
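
For illustration, the kind of direct replacement suggested above: a symlink 
via the Java 7+ NIO API instead of shelling out to winutils (a minimal 
sketch; argument handling and error reporting omitted):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SymlinkSketch {
  public static void main(String[] args) throws IOException {
    Path link = Paths.get(args[0]);
    Path target = Paths.get(args[1]);
    // Works on both Windows and Unix; on Windows the process needs the
    // create-symbolic-link privilege.
    Files.createSymbolicLink(link, target);
  }
}
{code}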

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average, the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.






[jira] [Updated] (YARN-5121) fix some container-executor portability issues

2018-05-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5121:
---
Target Version/s: 3.0.0-alpha2, 2.7.4  (was: 2.7.4, 3.0.0-alpha2)
  Labels: CVE security  (was: security)
 Description: 
container-executor has some issues that are preventing it from even compiling 
on the OS X jenkins instance.  Let's fix those.  While we're there, let's also 
try to take care of some of the other portability problems that have crept in 
over the years, since it used to work great on Solaris but now doesn't.

This issue fixes CVE-2016-6811.

  was:container-executor has some issues that are preventing it from even 
compiling on the OS X jenkins instance.  Let's fix those.  While we're there, 
let's also try to take care of some of the other portability problems that have 
crept in over the years, since it used to work great on Solaris but now doesn't.


> fix some container-executor portability issues
> --
>
> Key: YARN-5121
> URL: https://issues.apache.org/jira/browse/YARN-5121
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 3.0.0-alpha1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
>  Labels: CVE, security
> Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1
>
> Attachments: YARN-5121.00.patch, YARN-5121.01.patch, 
> YARN-5121.02.patch, YARN-5121.03.patch, YARN-5121.04.patch, 
> YARN-5121.06.patch, YARN-5121.07.patch, YARN-5121.08.patch, 
> YARN-6698-branch-2.7-01.patch
>
>
> container-executor has some issues that are preventing it from even compiling 
> on the OS X jenkins instance.  Let's fix those.  While we're there, let's 
> also try to take care of some of the other portability problems that have 
> crept in over the years, since it used to work great on Solaris but now 
> doesn't.
> This issue fixes CVE-2016-6811.






[jira] [Commented] (YARN-5848) public/crossdomain.xml is problematic

2018-01-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348049#comment-16348049
 ] 

Allen Wittenauer commented on YARN-5848:


I'm raising this to a blocker, now that these cross domain files are making the 
nightly builds fail due to broken XML formatting.

> public/crossdomain.xml is problematic
> -
>
> Key: YARN-5848
> URL: https://issues.apache.org/jira/browse/YARN-5848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0-alpha2, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> crossdomain.xml should really have an ASF header in it and be in the src 
> directory somewhere.  There's zero reason for it to have a RAT exception given 
> that comments are possible in XML files.  It's also not in a standard maven 
> location, which should really be fixed.






[jira] [Updated] (YARN-5848) public/crossdomain.xml is problematic

2018-01-31 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5848:
---
Affects Version/s: 3.1.0

> public/crossdomain.xml is problematic
> -
>
> Key: YARN-5848
> URL: https://issues.apache.org/jira/browse/YARN-5848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0-alpha2, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Major
>
> crossdomain.xml should really have an ASF header in it and be in the src 
> directory somewhere.  There's zero reason for it to have a RAT exception given 
> that comments are possible in XML files.  It's also not in a standard Maven 
> location, which should really be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5848) public/crossdomain.xml is problematic

2018-01-31 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5848:
---
Priority: Blocker  (was: Major)

> public/crossdomain.xml is problematic
> -
>
> Key: YARN-5848
> URL: https://issues.apache.org/jira/browse/YARN-5848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.0.0-alpha2, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> crossdomain.xml should really have an ASF header in it and be in the src 
> directory somewhere.  There's zero reason for it to have a RAT exception given 
> that comments are possible in XML files.  It's also not in a standard Maven 
> location, which should really be fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7677) HADOOP_CONF_DIR should not be automatically put in task environment

2017-12-22 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7677:
---
Hadoop Flags: Incompatible change

> HADOOP_CONF_DIR should not be automatically put in task environment
> ---
>
> Key: YARN-7677
> URL: https://issues.apache.org/jira/browse/YARN-7677
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>
> Currently, {{HADOOP_CONF_DIR}} is being put into the task environment whether 
> it's set by the user or not. It completely bypasses the whitelist and so 
> there is no way for a task to not have {{HADOOP_CONF_DIR}} set. This causes 
> problems in the Docker use case where Docker containers will set up their own 
> environment and have their own {{HADOOP_CONF_DIR}} preset in the image 
> itself. 
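To make the collision concrete (a hypothetical illustration, not part of any 
patch here): whatever gets injected into the launch environment wins over what 
the image baked in, so a preset {{HADOOP_CONF_DIR}} is silently replaced:

{code}
# my-hadoop-image is a placeholder for an image whose Dockerfile presets
# HADOOP_CONF_DIR.  An externally injected value (what the NM effectively
# does today) overrides the image's own setting:
docker run --rm -e HADOOP_CONF_DIR=/nm/injected/conf my-hadoop-image \
  sh -c 'echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"'
# prints /nm/injected/conf, not the path baked into the image
{code}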



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2017-12-16 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7190:
---
Hadoop Flags: Incompatible change, Reviewed  (was: Reviewed)

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
> Fix For: YARN-5355_branch2, 3.1.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch, YARN-7190.02.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2.x's 
> version of the HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should go onto 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}
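A quick way to check for that leakage (illustrative only; {{--glob}} support 
and the exact jar set vary by release) is to scan what the user-facing 
classpath actually resolves to:

{code}
# If the isolation works, none of the TSv2/HBase jars listed above should
# show up on the classpath handed to user code.
yarn classpath --glob | tr ':' '\n' | grep -Ei 'hbase|metrics-core' \
  || echo "no TSv2/HBase jars on the user classpath"
{code}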



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7588) Remove 'yarn historyserver' from bin/yarn

2017-11-30 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-7588:
--

 Summary: Remove 'yarn historyserver' from bin/yarn
 Key: YARN-7588
 URL: https://issues.apache.org/jira/browse/YARN-7588
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Priority: Minor


The 'yarn historyserver' command has been replaced by 'yarn timelineserver' since 
2.7.0.  Let's remove the dead code.
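For anyone still scripted against the old entry point, the switch is mechanical 
(a sketch, assuming the default shell-script daemon management):

{code}
# dead since 2.7.0:
#   yarn historyserver
# replacement:
yarn --daemon start timelineserver
{code}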



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7468) Provide means for container network policy control

2017-11-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246778#comment-16246778
 ] 

Allen Wittenauer commented on YARN-7468:


bq. Ideally, I'd have all the external endpoints secured to disallow this 
cluster from talking back except for very fine-grained allowances – it's a big 
world and I can't.

It also won't prevent DDoS attacks anyway.  Plus, while most of the Hadoop 
ecosystem has ACL support, in most cases it's not particularly well 
implemented, and that is before the dynamic reconfiguration use case you've 
effectively presented here.

bq.  In all fairness, I could use tcpspy and have it record the PID of 
processes today too

In the short term, it's probably easier to just force the use of LCE but with a 
wrapper around container-executor to set up the control information you want.  
Since the NM and c-e talk pretty much exclusively through a CLI (with all the 
security concerns that brings with it...), this setup should be pretty trivial 
to do and give you all the information you need to set up extra cgroups or 
whatever. 

That said, c-e probably should be more pluggable to allow people to run their 
own bits.  [I've been a proponent of c-e getting switched over to do dlopen()'s 
vs. the current static compiling for features.  This is a great example where 
it'd be extremely useful.] 
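A minimal sketch of that wrapper idea, assuming the NM can be pointed at the 
shim (e.g. via a site-specific yarn.nodemanager.linux-container-executor.path); 
all paths here are illustrative:

{code}
#!/usr/bin/env bash
# Hypothetical shim: log every container-executor invocation, run any
# site-specific policy setup, then delegate to the real setuid binary.
# The shim itself runs unprivileged; only the real binary elevates.
REAL_CE=/usr/local/hadoop/bin/container-executor.real   # assumed path
LOG=/var/log/hadoop-yarn/ce-wrapper.log

echo "$(date -u +%FT%TZ) uid=$(id -u) args: $*" >> "$LOG"

# site-specific cgroup/firewall setup, driven by "$@", would go here

exec "$REAL_CE" "$@"
{code}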

> Provide means for container network policy control
> --
>
> Key: YARN-7468
> URL: https://issues.apache.org/jira/browse/YARN-7468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Clay B.
>Priority: Minor
>
> To prevent data exfiltration from a YARN cluster, it would be very helpful to 
> have "firewall" rules able to map to a user/queue's containers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7432) DominantResourceFairnessPolicy serializable findbugs issues

2017-11-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236592#comment-16236592
 ] 

Allen Wittenauer commented on YARN-7432:


Then put in an exception.

> DominantResourceFairnessPolicy serializable findbugs issues
> ---
>
> Key: YARN-7432
> URL: https://issues.apache.org/jira/browse/YARN-7432
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> There are two findbugs issues in fair share in the daily qbt: 
> https://s.apache.org/hf5r



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7432) DominantResourceFairnessPolicy serializable findbugs issues

2017-11-02 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-7432:
--

 Summary: DominantResourceFairnessPolicy serializable findbugs 
issues
 Key: YARN-7432
 URL: https://issues.apache.org/jira/browse/YARN-7432
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0, 3.1.0
Reporter: Allen Wittenauer
Priority: Blocker


There are two findbugs issues in fair share in the daily qbt: 
https://s.apache.org/hf5r





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7431) resource estimator has findbugs problems

2017-11-02 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236551#comment-16236551
 ] 

Allen Wittenauer commented on YARN-7431:


It's been in the daily qbt for a while now:  https://s.apache.org/hf5r



> resource estimator has findbugs problems
> 
>
> Key: YARN-7431
> URL: https://issues.apache.org/jira/browse/YARN-7431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> Just see any recent report.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7431) resource estimator has findbugs problems

2017-11-02 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7431:
---
Component/s: resourcemanager

> resource estimator has findbugs problems
> 
>
> Key: YARN-7431
> URL: https://issues.apache.org/jira/browse/YARN-7431
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.9.0, 3.1.0
>Reporter: Allen Wittenauer
>Priority: Blocker
>
> Just see any recent report.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7431) resource estimator has findbugs problems

2017-11-02 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-7431:
--

 Summary: resource estimator has findbugs problems
 Key: YARN-7431
 URL: https://issues.apache.org/jira/browse/YARN-7431
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0, 3.1.0
Reporter: Allen Wittenauer
Priority: Blocker


Just see any recent report.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7127) Merge yarn-native-service branch into trunk

2017-10-18 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209865#comment-16209865
 ] 

Allen Wittenauer commented on YARN-7127:


bq.  With that assumption, will a separate service subcommand make sense ? 

Let's test that assumption.

Would a user be able to replace the bundled AM with their own and retain all of 
the functionality?

If someone wanted to replicate the native services features, would they be able 
to do it using only Public APIs?

bq.  User end up having a larger set of options and need to read through the 
docs to figure out which ones are applicable to service, or which ones are 
applicable to all apps. 

They'd have to do this anyway whether the commands are split apart or not.  In 
fact, it's worse if they are split because now there are two sets of commands 
that work differently and apply to different applications.  For example, I can't 
use the proposed 'yarn service' command to stop MapReduce.

> Merge yarn-native-service branch into trunk
> ---
>
> Key: YARN-7127
> URL: https://issues.apache.org/jira/browse/YARN-7127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7127.01.patch, YARN-7127.02.patch, 
> YARN-7127.03.patch, YARN-7127.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7127) Merge yarn-native-service branch into trunk

2017-10-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205252#comment-16205252
 ] 

Allen Wittenauer edited comment on YARN-7127 at 10/15/17 7:10 PM:
--

I thought some more about this topic this morning and had two more things to 
add:

1) I think an AM should have a way to tell the RM about any extra capabilities 
it might have.  This feature isn't particularly useful for the RM, but it would 
be beneficial for any clients.  For example, the MR AM might tag itself as 
"jobtracker" to note that it supports the extra features that the 'mapred' 
command uses.  A Slider AM might tag itself as 'slider' or 'native' or whatever 
to signify that it supports those extensions. etc. etc.   That would make 
extending the yarn application subcommand MUCH easier and potentially even open 
the door for extensions/plug-ins to that command from third parties. For 
example, turning the extra mapred subcommands into a hook off of yarn 
application would allow us to ultimately kill the mapred command once the 
timeline server is capable of doing everything that the history server can.

2) A large part of the discussion here is fueled by contradicting views on this 
project's place within Hadoop.  If one takes the belief that it's "just another 
framework, like MapReduce," then creating separate sub-commands, documentation, 
daemons, etc. seems logical.   If one takes the view that it's "part of YARN," 
then adding new sub-commands, a separate documentation section, and a ton of 
new daemons does not make sense.

But it doesn't appear that either of those choices has been made. Portions of 
the code base are in the separate framework type of mold, but other changes are 
to core YARN functionality, even if we push aside "obviously part of YARN" bits 
like RegistryDNS.

It seems as though the folks working on this branch need to make that decision 
and drive it to completion:  is it part of YARN or is it not?  If it's the 
former, then that means full integration: no more separate API daemon, no 
different subcommand structure, etc., etc.  If it's the latter, then that means 
total separation: it needs to be a separate subproject, no shared code base, 
new top-level command, etc., etc.

Having a foot in both is what is ultimately driving this disagreement and will 
eventually confuse users.  


was (Author: aw):
I thought some more about this topic this morning and had two more thoughts:

1) I think an AM should have a way to tell the RM about any extra capabilities 
it might have.  This feature isn't particularly useful for the RM, but it would 
be beneficial for any clients.  For example, the MR AM might tag itself as 
"jobtracker" to note that it supports the extra features that the 'mapred' 
command uses.  A Slider AM might tag itself as 'slider' or 'native' or whatever 
to signify that it supports those extensions. etc. etc.   That would make 
extending the yarn application subcommand MUCH easier and potentially even open 
the door for extensions/plug-ins to that command from third parties. For 
example, turning the extra mapred subcommands into a hook off of yarn 
application would allow us to ultimately kill the mapred command once the 
timeline server is capable of doing everything that the history server can.

2) A large part of the discussion here is fueled by contradicting views on this 
project's place within Hadoop.  If one takes the belief that it's "just another 
framework, like MapReduce," then creating separate sub-commands, documentation, 
daemons, etc. seems logical.   If one takes the view that it's "part of YARN," 
then adding new sub-commands, a separate documentation section, and a ton of 
new daemons does not make sense.

But it doesn't appear that either of those choices has been made. Portions of 
the code base are in the separate framework type of mold, but other changes are 
to core YARN functionality, even if we push aside "obviously part of YARN" bits 
like RegistryDNS.

It seems as though the folks working on this branch need to make that decision 
and drive it to completion:  is it part of YARN or is it not?  If it's the 
former, then that means full integration: no more separate API daemon, no 
different subcommand structure, etc., etc.  If it's the latter, then that means 
total separation: it needs to be a separate subproject, no shared code base, 
new top-level command, etc., etc.

Having a foot in both is what is ultimately driving this disagreement and will 
eventually confuse users.  

> Merge yarn-native-service branch into trunk
> ---
>
> Key: YARN-7127
> URL: https://issues.apache.org/jira/browse/YARN-7127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7127.01.patch, 

[jira] [Commented] (YARN-7127) Merge yarn-native-service branch into trunk

2017-10-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205252#comment-16205252
 ] 

Allen Wittenauer commented on YARN-7127:


I thought some more about this topic this morning and had two more thoughts:

1) I think an AM should have a way to tell the RM about any extra capabilities 
it might have.  This feature isn't particularly useful for the RM, but it would 
be beneficial for any clients.  For example, the MR AM might tag itself as 
"jobtracker" to note that it supports the extra features that the 'mapred' 
command uses.  A Slider AM might tag itself as 'slider' or 'native' or whatever 
to signify that it supports those extensions. etc. etc.   That would make 
extending the yarn application subcommand MUCH easier and potentially even open 
the door for extensions/plug-ins to that command from third parties. For 
example, turning the extra mapred subcommands into a hook off of yarn 
application would allow us to ultimately kill the mapred command once the 
timeline server is capable of doing everything that the history server can.

2) A large part of the discussion here is fueled by contradicting views on this 
project's place within Hadoop.  If one takes the belief that it's "just another 
framework, like MapReduce," then creating separate sub-commands, documentation, 
daemons, etc. seems logical.   If one takes the view that it's "part of YARN," 
then adding new sub-commands, a separate documentation section, and a ton of 
new daemons does not make sense.

But it doesn't appear that either of those choices has been made. Portions of 
the code base are in the separate framework type of mold, but other changes are 
to core YARN functionality, even if we push aside "obviously part of YARN" bits 
like RegistryDNS.

It seems as though the folks working on this branch need to make that decision 
and drive it to completion:  is it part of YARN or is it not?  If it's the 
former, then that means full integration: no more separate API daemon, no 
different subcommand structure, etc., etc.  If it's the latter, then that means 
total separation: it needs to be a separate subproject, no shared code base, 
new top-level command, etc., etc.

Having a foot in both is what is ultimately driving this disagreement and will 
eventually confuse users.  
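On point 1, the closest existing hook is the application *type*, which clients 
can already filter on; a capability tag would generalize the same idea so 
clients could discover AM extensions.  (CLI flags vary by release; treat this 
as illustrative.)

{code}
# Today: filter applications by the coarse-grained type the AM registered.
yarn application -list -appTypes MAPREDUCE
# The proposal: an AM additionally advertises capabilities such as
# "jobtracker" or "slider" that clients could query the same way.
{code}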

> Merge yarn-native-service branch into trunk
> ---
>
> Key: YARN-7127
> URL: https://issues.apache.org/jira/browse/YARN-7127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7127.01.patch, YARN-7127.02.patch, 
> YARN-7127.03.patch, YARN-7127.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7127) Merge yarn-native-service branch into trunk

2017-10-14 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204792#comment-16204792
 ] 

Allen Wittenauer commented on YARN-7127:


bq. It's the command to interact with ResourceManager which does listing / 
updating of application metadata to YARN's point of view. Although it's called 
application, It's NOT a command specific to the app. (i.e. AM).

bq. However for 'yarn service', it's the command to interact with the service 
framework, i.e. the special AM we wrote.

Users don't care about the internals of a command. They want cohesion.

bq. there has to be a differentiator.

It seems as though a lot of the issues raised here are because the RM isn't 
keeping track of which of the frameworks YARN provides a given application is 
using or was launched with.  This is a "record keeping" problem.  Passing that 
on to the user is going to be problematic when one considers that multiple users 
may be interacting with a given AM. 
If I'm the ops person who needs to take down a badly behaving AM, why should I 
have to cycle through a bunch of different commands to figure out which one 
works when the RM should be able to provide a hint?

To expand on that: 

bq. Similarly, for service framework, it's a special AM. It has its own 
semantics and use cases. E.g. flex the component count, upgrade the component. 
The component is the concept only specific to service, not to the yarn generic 
app. If we merge it with generic "application" command, what will the 
'component' mean for other apps like MR? 

Not all command line arguments actually have to work with every AM type.  If a 
user gives a nonsense request, it's ok to throw a helpful message and error 
out.  Not everything needs to succeed.  In this particular case, the command 
should be asking the RM if the given AM was started with the services API and 
act appropriately.  If not, throw an error.

bq.  it seems more of a larger umbrella effort - expanding "yarn application" 
to provide a unified support for all disparate apps to roll into it.

That is, what should have been part of this project from the start.  To me, this 
is a showstopper issue for the "native services" API.  In its current 
incarnation, it definitely feels bolted on rather than real functionality 
included as part of YARN.

Just for completeness:

bq. MapReduce is a customized AM on YARN, it has its own mapred command to 
interact with its own AM, which only makes sense to itself. like "mapred 
distcp". Will it make sense to merge the 'distcp' sub command to 'yarn 
application' command?

Apples and oranges, and completely ignoring the historical context involved. 
[I'd expand on the history here, but it is sort of an orthogonal discussion.  
If anyone cares, I can write it up, though.] 

I will say that, in hindsight, I'm fairly confident that the mapred command 
wouldn't exist and almost all of the features it provides would either not 
exist or be merged into the yarn command somewhere.

> Merge yarn-native-service branch into trunk
> ---
>
> Key: YARN-7127
> URL: https://issues.apache.org/jira/browse/YARN-7127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7127.01.patch, YARN-7127.02.patch, 
> YARN-7127.03.patch, YARN-7127.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7326) Some issues in RegistryDNS

2017-10-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204208#comment-16204208
 ] 

Allen Wittenauer commented on YARN-7326:


I haven't looked too hard at the current state (so I apologize if I've missed 
something), but to me there are three showstopper issues:

1) Obviously the RegistryDNS 100% cpu issue.  [I'm truly surprised that no one 
else had noticed its awful performance characteristics.]

2) Banish the separate API server, now that YARN-6626 has been committed.  It's 
confusing and greatly increases the operating costs (and worse, potential 
security exposure) for little-to-no real benefit vs just using the REST API 
from the RM.  So just remove it from the docs and the yarn command.

3) Integrate the yarn service commands into yarn application as mentioned by 
Eric Yang.

Things I'd really like to see, but wouldn't block the merge for:

1) Actually integrate the docs with the rest of yarn-site.  I'm not sure what 
benefit there is of having a separate documentation section, especially given 
#2 above and that the registrydns server could be used independently of the 
REST API.

2) A more complex example that doesn't use Docker.  This is important given 
that the docker bits in YARN have some significant security problems.  A lot of 
sites probably can't or won't enable the Docker subsystem for quite a while as 
a result.

3) Slider migration guide.

> Some issues in RegistryDNS
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.
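A quick way to confirm the spin on a Linux host (illustrative; the pgrep 
pattern assumes the DNS server's class name appears on the JVM command line):

{code}
pid=$(pgrep -f RegistryDNS | head -1)
# a busy-looping selector thread shows ~100% CPU here even when idle
top -b -n 1 -p "$pid" | tail -1
{code}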



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202787#comment-16202787
 ] 

Allen Wittenauer edited comment on YARN-7198 at 10/12/17 11:22 PM:
---

Anyway, rebuilt and it started up. Also it didn't switch to yarn this time 
without the env var set, so I'm not sure what was going on there.

In any case, I'm +1 on this particular patch, pending Jenkins. 

Now some general bad news, not related to this patch:

Ran a few queries, but this one is a bit concerning:

{code}
root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
;; Warning: query response not set

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; Query time: 0 msec
;; SERVER: 127.0.0.1#54(127.0.0.1)
;; WHEN: Thu Oct 12 16:04:54 PDT 2017
;; MSG SIZE  rcvd: 12

root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
;; Connection to ::1#54(::1) for . failed: connection refused.
;; communications error to 127.0.0.1#54: end of file
root@ubuntu:/hadoop/logs# 
{code}

It looks like it effectively fails when asked about a root zone, which is bad.

It's also kind of interesting in what it does and doesn't log. Probably should 
be configured to rotate logs based on size not date.

The real showstopper though:  RegistryDNS basically eats a core.  It is running 
with 100% cpu utilization with and without jsvc. On my laptop, this is 
triggering my fan.


was (Author: aw):
Anyway, rebuilt and it started up. Also it didn't switch to yarn this time 
without the env var set, so I'm not sure what was going on there.

In any case, I'm +1 on this particular patch. 

Now some general bad news, not related to this patch:

Ran a few queries, but this one is a bit concerning:

{code}
root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
;; Warning: query response not set

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; Query time: 0 msec
;; SERVER: 127.0.0.1#54(127.0.0.1)
;; WHEN: Thu Oct 12 16:04:54 PDT 2017
;; MSG SIZE  rcvd: 12

root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
;; Connection to ::1#54(::1) for . failed: connection refused.
;; communications error to 127.0.0.1#54: end of file
root@ubuntu:/hadoop/logs# 
{code}

It looks like it effectively fails when asked about a root zone, which is bad.

It's also kind of interesting in what it does and doesn't log. Probably should 
be configured to rotate logs based on size not date.

The real showstopper though:  RegistryDNS basically eats a core.  It is running 
with 100% cpu utilization with and without jsvc. On my laptop, this is 
triggering my fan.

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch, 
> YARN-7198-yarn-native-services.05.patch, 
> YARN-7198-yarn-native-services.06.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202787#comment-16202787
 ] 

Allen Wittenauer commented on YARN-7198:


Anyway, rebuilt and it started up. Also it didn't switch to yarn this time 
without the env var set, so I'm not sure what was going on there.

In any case, I'm +1 on this particular patch. 

Now some general bad news, not related to this patch:

Ran a few queries, but this one is a bit concerning:

{code}
root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
;; Warning: query response not set

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; Query time: 0 msec
;; SERVER: 127.0.0.1#54(127.0.0.1)
;; WHEN: Thu Oct 12 16:04:54 PDT 2017
;; MSG SIZE  rcvd: 12

root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
;; Connection to ::1#54(::1) for . failed: connection refused.
;; communications error to 127.0.0.1#54: end of file
root@ubuntu:/hadoop/logs# 
{code}

It looks like it effectively fails when asked about a root zone, which is bad.

It's also kind of interesting in what it does and doesn't log. Probably should 
be configured to rotate logs based on size not date.

The real showstopper though:  RegistryDNS basically eats a core.  It is running 
with 100% cpu utilization with and without jsvc. On my laptop, this is 
triggering my fan.

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch, 
> YARN-7198-yarn-native-services.05.patch, 
> YARN-7198-yarn-native-services.06.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202732#comment-16202732
 ] 

Allen Wittenauer commented on YARN-7198:


I'm still playing with the last patch, but I'm very perplexed.

If I set

{code}
export YARN_REGISTRYDNS_SECURE_USER=yarn
{code}

in hadoop-env.sh/yarn-env.sh and then run:

{code}
yarn --daemon start registrydns
{code}

 the process breaks with 

{code}
java.lang.ClassNotFoundException: 
org.apache.hadoop.registry.server.dns.PrivilegedRegistryDNSStarter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at 
org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:151)
Cannot load daemon
Service exit with a return value of 3
{code}

That's indicative of either the classname being wrong, the wrong jar files 
being used, or the class not being in the jar files at all.  A quick pass 
through the jars I'm using shows it isn't in there.  I'll double-check my build 
to make sure it's the correct one.  It's likely a local build problem, so whatever.

But if I don't set that (and therefore, don't get the jsvc behavior)

It comes up as yarn on port 54... which shouldn't work, since 54 is a 
privileged port and the yarn user shouldn't have access to it.  Very, very 
curious.
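For reference, the jar pass amounts to something like this (paths are 
illustrative and depend on the local build layout):

{code}
# Confirm whether the jsvc starter class made it into the build at all.
for j in share/hadoop/yarn/*.jar share/hadoop/yarn/lib/*.jar; do
  jar tf "$j" 2>/dev/null | grep -q PrivilegedRegistryDNSStarter \
    && echo "found in $j"
done
{code}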

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch, 
> YARN-7198-yarn-native-services.05.patch, 
> YARN-7198-yarn-native-services.06.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7170) Investigate bower dependencies for YARN UI v2

2017-10-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202671#comment-16202671
 ] 

Allen Wittenauer commented on YARN-7170:


Using HADOOP-14945 (which fixes docker -i mode when GPG signing isn't 
required), I ran two builds on ASF Jenkins, each using this command line:

{code}
dev-support/bin/create-release --docker --native --dockercache
{code}

Once with plain trunk+14945 and once with the -02 patch.  My understanding is 
that bower and friends cache in the home directory. By running each build in 
separate Docker containers with their own home dirs and their own maven repo 
caches, nothing should get cached between the two builds.

As a result, the -02 patch cuts build time by ~3 minutes.  Of course, the ASF 
also has a significantly faster network pipe than if you were building at home. 
Additionally, the node I was running on wasn't doing much during the first run 
but got another job scheduled during the second run, so the times here should 
be viewed as conservative.

It'd be great if someone else could confirm that upgrading the frontend plugin 
has a significant impact on the build time.
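For anyone reproducing the comparison, a rough harness (the ref names are 
placeholders; expect times to vary with network and node load):

{code}
for ref in trunk trunk-plus-YARN-7170-02; do
  git checkout "$ref"
  time dev-support/bin/create-release --docker --native --dockercache
done
{code}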

> Investigate bower dependencies for YARN UI v2
> -
>
> Key: YARN-7170
> URL: https://issues.apache.org/jira/browse/YARN-7170
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-7170.001.patch, YARN-7170.002.patch
>
>
> [INFO] bower ember#2.2.0   progress Receiving
> objects:  50% (38449/75444), 722.46 MiB | 3.30 MiB/s
> ...
> [INFO] bower ember#2.2.0   progress Receiving
> objects:  99% (75017/75444), 1.56 GiB | 3.31 MiB/s
> Investigate the dependencies to reduce the download size and speed up 
> compilation.
> cc/ [~Sreenath] and [~akhilpb]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6608) Backport all SLS improvements from trunk to branch-2

2017-10-12 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202485#comment-16202485
 ] 

Allen Wittenauer commented on YARN-6608:


Umm, have you folks actually tried using those shell scripts? 

> Backport all SLS improvements from trunk to branch-2
> 
>
> Key: YARN-6608
> URL: https://issues.apache.org/jira/browse/YARN-6608
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-6608-branch-2.v0.patch, 
> YARN-6608-branch-2.v1.patch, YARN-6608-branch-2.v2.patch, 
> YARN-6608-branch-2.v3.patch, YARN-6608-branch-2.v4.patch, 
> YARN-6608-branch-2.v5.patch, YARN-6608-branch-2.v6.patch, 
> YARN-6608-branch-2.v7.patch
>
>
> The SLS has received lots of attention in trunk, but only some of it made it 
> back to branch-2. This patch is a "raw" fork-lift of the trunk development 
> from hadoop-tools/hadoop-sls.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-11 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201450#comment-16201450
 ] 

Allen Wittenauer commented on YARN-7198:


I'm still fighting to get this running, but a few things already:

a) please link "YARN Registry" in the beginning of the document to the YARN 
registry documentation.
b) let's fix the YARN registry documentation to explicitly say that a separate 
zookeeper instance is required.  (or, if it's not, then something is missing in 
the docs there)
c) the zk quorum info in the registrydns docs contradicts what is in the YARN 
registry documentation.  This clearly needs to get rectified.

I'll play with this more tomorrow, since my calendar cleared up.
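Related to (b): until the docs spell out the ZooKeeper requirement, a quick 
pre-flight check saves a confusing startup failure (assumes the default quorum 
of localhost:2181):

{code}
# 'ruok' is a standard ZooKeeper four-letter command; a healthy server
# answers 'imok'.  No answer means RegistryDNS won't come up either.
echo ruok | nc -w 2 localhost 2181
{code}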

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch, 
> YARN-7198-yarn-native-services.05.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7170) Investigate bower dependencies for YARN UI v2

2017-10-11 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7170:
---
Attachment: YARN-7170.002.patch

-02:
* upgrade frontend-maven-plugin
* also move its version definition to the proper location in the maven repo

> Investigate bower dependencies for YARN UI v2
> -
>
> Key: YARN-7170
> URL: https://issues.apache.org/jira/browse/YARN-7170
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: YARN-7170.001.patch, YARN-7170.002.patch
>
>
> [INFO] bower ember#2.2.0   progress Receiving
> objects:  50% (38449/75444), 722.46 MiB | 3.30 MiB/s
> ...
> [INFO] bower ember#2.2.0   progress Receiving
> objects:  99% (75017/75444), 1.56 GiB | 3.31 MiB/s
> Investigate the dependencies to reduce the download size and speed up 
> compilation.
> cc/ [~Sreenath] and [~akhilpb]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199576#comment-16199576
 ] 

Allen Wittenauer commented on YARN-7198:


bq.  but this is going too slow. 

A month is a relatively short time compared to other patches.

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch, 
> YARN-7198-yarn-native-services.05.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4006) YARN AltKerberos HTTP Authentication doesn't work

2017-10-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4006:
---
Summary: YARN AltKerberos HTTP Authentication doesn't work  (was: YARN 
Alternate Kerberos HTTP Authentication Changes)

> YARN AltKerberos HTTP Authentication doesn't work
> -
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Priority: Blocker
> Attachments: YARN-4006-branch-trunk.patch, 
> YARN-4006-branch2.6.0.patch, sample-ats-alt-auth.patch
>
>
> When attempting to use the Hadoop Alternate Authentication classes, they do 
> not exactly work with what was built in YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4006) YARN Alternate Kerberos HTTP Authentication Changes

2017-10-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199174#comment-16199174
 ] 

Allen Wittenauer commented on YARN-4006:


It's still very much broken for all YARN UIs (not just timeline server).  We 
have opted not to use the YARN UI on secure clusters.  

At one point, we were investigating HADOOP-12082 as a replacement for 
AltKerberos, but the lack of documentation is a major hindrance.


> YARN Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Priority: Blocker
> Attachments: YARN-4006-branch-trunk.patch, 
> YARN-4006-branch2.6.0.patch, sample-ats-alt-auth.patch
>
>
> When attempting to use the Hadoop Alternate Authentication classes, they do 
> not exactly work with what was built in YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4006) YARN Alternate Kerberos HTTP Authentication Changes

2017-10-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4006:
---
Summary: YARN Alternate Kerberos HTTP Authentication Changes  (was: YARN 
ATSv1 Alternate Kerberos HTTP Authentication Changes)

> YARN Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Priority: Blocker
> Attachments: YARN-4006-branch-trunk.patch, 
> YARN-4006-branch2.6.0.patch, sample-ats-alt-auth.patch
>
>
> When attempting to use the Hadoop Alternate Authentication classes, they do 
> not exactly work with what was built in YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4006) YARN Alternate Kerberos HTTP Authentication Changes

2017-10-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-4006:
---
Target Version/s:   (was: 2.9.0)

> YARN Alternate Kerberos HTTP Authentication Changes
> ---
>
> Key: YARN-4006
> URL: https://issues.apache.org/jira/browse/YARN-4006
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, timelineserver
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.5.1, 2.6.1, 2.8.0, 2.7.1, 2.7.2
>Reporter: Greg Senia
>Priority: Blocker
> Attachments: YARN-4006-branch-trunk.patch, 
> YARN-4006-branch2.6.0.patch, sample-ats-alt-auth.patch
>
>
> When attempting to use the Hadoop Alternate Authentication classes, they do 
> not exactly work with what was built in YARN-1935.
> I went ahead and made the following changes to support using a custom 
> AltKerberos DelegationToken class.
> Changes to: TimelineAuthenticationFilterInitializer.class
> {code}
>String authType = filterConfig.get(AuthenticationFilter.AUTH_TYPE);
> LOG.info("AuthType Configured: "+authType);
> if (authType.equals(PseudoAuthenticationHandler.TYPE)) {
>   filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   PseudoDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: PseudoDelegationTokenAuthenticationHandler");
> } else if (authType.equals(KerberosAuthenticationHandler.TYPE) || 
> (UserGroupInformation.isSecurityEnabled() && 
> conf.get("hadoop.security.authentication").equals(KerberosAuthenticationHandler.TYPE)))
>  {
>   if (!(authType.equals(KerberosAuthenticationHandler.TYPE))) {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   authType);
> LOG.info("AuthType: "+authType);
>   } else {
> filterConfig.put(AuthenticationFilter.AUTH_TYPE,
>   KerberosDelegationTokenAuthenticationHandler.class.getName());
> LOG.info("AuthType: KerberosDelegationTokenAuthenticationHandler");
>   } 
>   // Resolve _HOST into bind address
>   String bindAddress = conf.get(HttpServer2.BIND_ADDRESS);
>   String principal =
>   filterConfig.get(KerberosAuthenticationHandler.PRINCIPAL);
>   if (principal != null) {
> try {
>   principal = SecurityUtil.getServerPrincipal(principal, bindAddress);
> } catch (IOException ex) {
>   throw new RuntimeException(
>   "Could not resolve Kerberos principal name: " + ex.toString(), 
> ex);
> }
> filterConfig.put(KerberosAuthenticationHandler.PRINCIPAL,
> principal);
>   }
> }
>  {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190753#comment-16190753
 ] 

Allen Wittenauer edited comment on YARN-7198 at 10/4/17 3:41 AM:
-

I tried to run this based upon the documentation here:

https://github.com/apache/hadoop/blob/63d1084e9781e0fee876916190b69f6242dd00e4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md

Playing with this tonight, one thing has become really obvious:

I'm looking in a directory called 'yarn-service' at documentation.  If this is 
the 'yarn-service', then what is the rest of YARN?  service is really not a 
good word to use.  In fact, service is so overused in the YARN documentation 
(here and elsewhere) as a whole that it's lost all meaning. :(

Anyway

I tried to run this with these configs:

{code}
<property>
  <name>hadoop.registry.dns.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.registry.dns.domain-name</name>
  <value>hadoop.example.com</value>
</property>
<property>
  <name>hadoop.registry.dns.bind-port</name>
  <value>54</value>
</property>
<property>
  <name>hadoop.registry.dns.zone-subnet</name>
  <value>172.16.170.0</value>
</property>
<property>
  <name>hadoop.registry.dns.zone-subnet</name>
  <value>255.255.255.0</value>
</property>
{code}

It's interesting that it failed to start because it couldn't connect to ZK on 
localhost/127.0.0.1:2181. So I'm guessing there is a chunk of documentation 
missing?

That said:

bq. hadoop.registry.dns.zone-subnet

How is this specified?  CIDR notation? 

bq. hadoop.registry.dns.zone-mask

OK, maybe not?

An example really needs to exist in the markdown docs for these two entries 
about what is actually wanted here.  For example, if I want 192.168.100.0/22 
(aka 192.168.100.0 to 192.168.103.255), what does the configuration look like?  
If it supports CIDR, then zone-mask is pointless.  But I'm guessing it only 
supports the classic class A, B, and C?

This is really important because reverse addressing for non-standard network 
blocks is a bit wacky to configure even on standard DNS servers.

Thanks.


was (Author: aw):

I tried to run this based upon the documentation here:

https://github.com/apache/hadoop/blob/63d1084e9781e0fee876916190b69f6242dd00e4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md

Playing with this tonight, one thing has become really obvious:

I'm looking in a directory called 'yarn-service' at documentation.  If this is 
the 'yarn-service', then what is the rest of YARN?  service is really not a 
good word to use.  In fact, service is so overused in the YARN documentation 
(here and elsewhere) as a whole that it's lost all meaning. :(

Anyway

I tried to run this with these configs:

{code}
<property>
  <name>hadoop.registry.dns.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.registry.dns.domain-name</name>
  <value>hadoop.example.com</value>
</property>
<property>
  <name>hadoop.registry.dns.bind-port</name>
  <value>54</value>
</property>
<property>
  <name>hadoop.registry.dns.zone-subnet</name>
  <value>172.16.170.0</value>
</property>
<property>
  <name>hadoop.registry.dns.zone-subnet</name>
  <value>255.255.255.0</value>
</property>
{code}

It's interesting that it failed to start because it couldn't connect to ZK on 
localhost/127.0.0.1:2181. So I'm guessing there is a chunk of documentation 
missing?

That said:

bq. hadoop.registry.dns.zone-subnet

How is this specified?  CIDR notation? 

bq. hadoop.registry.dns.zone-mask

OK, maybe not?

An example really needs to exist in the markdown docs for these two entries 
about what is actually wanted here.  For example, if I want 192.168.100.0/22 
(aka 192.168.100.0 to 192.168.103.255), what does the configuration look like?  
If it supports CIDR, then zone-mask is pointless.  But I'm guessing it only 
supports the classic class A, B, and C?

This is really important because reverse addressing is a bit wacky to configure 
even on standard DNS servers.

Thanks.

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7198) Add jsvc support for RegistryDNS

2017-10-03 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16190753#comment-16190753
 ] 

Allen Wittenauer commented on YARN-7198:



I tried to run this based upon the documentation here:

https://github.com/apache/hadoop/blob/63d1084e9781e0fee876916190b69f6242dd00e4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/yarn-service/RegistryDNS.md

Playing with this tonight, one thing has become really obvious:

I'm looking in a directory called 'yarn-service' at documentation.  If this is 
the 'yarn-service', then what is the rest of YARN?  service is really not a 
good word to use.  In fact, service is so overused in the YARN documentation 
(here and elsewhere) as a whole that it's lost all meaning. :(

Anyway

I tried to run this with these configs:

{code}
<property>
  <name>hadoop.registry.dns.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hadoop.registry.dns.domain-name</name>
  <value>hadoop.example.com</value>
</property>
<property>
  <name>hadoop.registry.dns.bind-port</name>
  <value>54</value>
</property>
<property>
  <name>hadoop.registry.dns.zone-subnet</name>
  <value>172.16.170.0</value>
</property>
<property>
  <name>hadoop.registry.dns.zone-subnet</name>
  <value>255.255.255.0</value>
</property>
{code}

It's interesting that it failed to start because it couldn't connect to ZK on 
localhost/127.0.0.1:2181. So I'm guessing there is a chunk of documentation 
missing?

That said:

bq. hadoop.registry.dns.zone-subnet

How is this specified?  CIDR notation? 

bq. hadoop.registry.dns.zone-mask

OK, maybe not?

An example really needs to exist in the markdown docs for these two entries 
about what is actually wanted here.  For example, if I want 192.168.100.0/22 
(aka 192.168.100.0 to 192.168.103.255), what does the configuration look like?  
If it supports CIDR, then zone-mask is pointless.  But I'm guessing it only 
supports the classic class A, B, and C?

This is really important because reverse addressing is a bit wacky to configure 
even on standard DNS servers.

Thanks.

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch, 
> YARN-7198-yarn-native-services.03.patch, 
> YARN-7198-yarn-native-services.04.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7207) Cache the local host name when getting application list in RM

2017-09-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183572#comment-16183572
 ] 

Allen Wittenauer commented on YARN-7207:


Actually, let me expand on that a bit, because we're running directly into 
"better practices" in a space where many may not understand the details.

A process requests a host resolution of a name/IP that is associated with the 
machine that the process is running on (localhost, whatever hostname() returns, 
etc, etc).  That resolution should be going through the local cache (nscd, 
sssd, lookupd, whatever).  That cache should be configured such that it 
resolves through files (e.g., /etc/hosts) and then through DNS.  /etc/hosts 
SHOULD have all known names and IPs for the local machine, eliminating the need 
for any DNS lookup. 
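
For illustration, here's roughly what that looks like on a sanely configured 
Linux box (the names and addresses below are made up):

{code}
# /etc/nsswitch.conf -- consult local files before DNS
hosts:  files dns

# /etc/hosts -- every name and IP the local machine answers to
127.0.0.1      localhost
172.16.170.5   node1.example.com node1
{code}

With that in place, resolving "node1" or "localhost" never leaves the box.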

A misconfigured machine, either by not having a cache or by having the cache 
misconfigured, will ask DNS or some other naming service first.  This will 
*definitely* impact system performance. But it's also a misconfiguration; this 
won't just impact YARN but pretty much every single process on the box.  Need 
to write to syslog?  Yup, gonna ask DNS...

> Cache the local host name when getting application list in RM
> -
>
> Key: YARN-7207
> URL: https://issues.apache.org/jira/browse/YARN-7207
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: RM
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7207.001.patch, YARN-7207.002.patch
>
>
> {{getLocalHostName()}} is invoked for generating the report for each 
> application, which means it is called 1000 times for each 
> {{getApplications()}} if there are 1000 apps in RM. Some users hit a 
> performance issue when {{getLocalHostName()}} is slow under some network 
> environments.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7207) Cache the local host name when getting application list in RM

2017-09-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183566#comment-16183566
 ] 

Allen Wittenauer commented on YARN-7207:


bq. Single call of getLocalHost is pretty slow due to some DNS issue

DNS calls for localhost shouldn't happen on a properly configured machine.
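
A quick sanity check on any given box (typical output; exact formatting varies 
by distro):

{code}
$ getent hosts localhost
127.0.0.1       localhost
$ grep '^hosts' /etc/nsswitch.conf
hosts:      files dns
{code}

If "files" isn't first on that hosts line, localhost lookups can end up on the 
wire.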

> Cache the local host name when getting application list in RM
> -
>
> Key: YARN-7207
> URL: https://issues.apache.org/jira/browse/YARN-7207
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: RM
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7207.001.patch, YARN-7207.002.patch
>
>
> {{getLocalHostName()}} is invoked for generating the report for each 
> application, which means it is called 1000 times for each 
> {{getApplications()}} if there are 1000 apps in RM. Some users hit a 
> performance issue when {{getLocalHostName()}} is slow under some network 
> environments.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7207) Cache the local host name when getting application list in RM

2017-09-27 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183527#comment-16183527
 ] 

Allen Wittenauer commented on YARN-7207:


If resolving the local hostname is slow, then that's a symptom of a 
misconfigured host.  e.g., putting dns before files in nsswitch.  Are we 
actually helping the user by hiding it?

> Cache the local host name when getting application list in RM
> -
>
> Key: YARN-7207
> URL: https://issues.apache.org/jira/browse/YARN-7207
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: RM
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7207.001.patch, YARN-7207.002.patch
>
>
> {{getLocalHostName()}} is invoked for generating the report for each 
> application, which means it is called 1000 times for each 
> {{getApplications()}} if there are 1000 apps in RM. Some users hit a 
> performance issue when {{getLocalHostName()}} is slow under some network 
> environments.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7198) Add jsvc support for RegistryDNS

2017-09-20 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174006#comment-16174006
 ] 

Allen Wittenauer commented on YARN-7198:


It's on my list.  I'm going to actually play with it rather than just look at 
it, so it's going to take me a bit.

> Add jsvc support for RegistryDNS
> 
>
> Key: YARN-7198
> URL: https://issues.apache.org/jira/browse/YARN-7198
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-7198-yarn-native-services.01.patch, 
> YARN-7198-yarn-native-services.02.patch
>
>
> RegistryDNS should have jsvc support and be managed through the shell 
> scripts, rather than being started manually. See original comments on 
> YARN-7191.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6550) Capture launch_container.sh logs

2017-09-20 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173980#comment-16173980
 ] 

Allen Wittenauer commented on YARN-6550:


FYI:

This:  

{code}
set -o pipefail -e
{code}
... should be at the top.  Additionally, if those sets are being done, then

{code}
hadoop_shell_errorcode=$?
if [[ "$hadoop_shell_errorcode" -ne 0 ]]
then
  exit $hadoop_shell_errorcode
fi
{code}

... should never be getting reached.

Additionally:

{code}
exec ...
{code}

replaces the running app, so that closing if/then clause is pointless.
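
Put together, a corrected skeleton would look something like this (just a 
sketch; the launched command is a placeholder):

{code}
#!/bin/bash
# fail fast for everything below, so this must come first
set -o pipefail -e

# ... create symlinks, export environment variables, etc. ...
# with -e in effect, any failing command above already exits the
# script on its own, so no manual $? bookkeeping is needed

# exec replaces this shell with the container process;
# nothing after this line can ever run
exec /bin/bash -c "$LAUNCH_COMMAND"
{code}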


> Capture launch_container.sh logs
> 
>
> Key: YARN-6550
> URL: https://issues.apache.org/jira/browse/YARN-6550
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-beta1
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
> Attachments: YARN-6550.002.patch, YARN-6550.003.patch, 
> YARN-6550.005.patch, YARN-6550.006.patch, YARN-6550.007.patch, 
> YARN-6550.008.patch, YARN-6550.009.patch, YARN-6550.010.patch, YARN-6550.patch
>
>
> launch_container.sh, which is generated by the NM, will do a bunch of things 
> (like creating links, etc.) while launching a process. No logs are captured 
> until {{exec}} is called. We need to capture all failures of 
> launch_container.sh for easier troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7227) hadoop personality: yarn-ui should be conditional

2017-09-20 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created YARN-7227:
--

 Summary: hadoop personality: yarn-ui should be conditional
 Key: YARN-7227
 URL: https://issues.apache.org/jira/browse/YARN-7227
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer


Given how much stuff -Pyarn-ui downloads, we should make it conditional to cut 
down on testing time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6830) Support quoted strings for environment variables

2017-09-05 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153924#comment-16153924
 ] 

Allen Wittenauer commented on YARN-6830:


Rather than fight with a regex why not redefine the API instead?

bq. When running the MR job, these environment variables are supplied as a 
comma delimited string.
bq. -Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"

-Dmapreduce.map.env.MODE=bar
-Dmapreduce.map.env.IMAGE_NAME=foo
-Dmapreduce.map.env.MOUNTS=/tmp/foo,/tmp/bar

...

e.g., mapreduce.map.env.[foo]=bar  gets turned into foo=bar

This greatly simplifies the input validation needed and makes it clear what is 
actually being defined.
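
As a rough illustration of the idea in plain Java (not the actual MR code; 
{{collectEnv}} and the sample properties are made up for the demo):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class EnvPrefixDemo {
  // Collect every property of the form mapreduce.map.env.FOO=bar into an
  // environment map as FOO=bar.  No comma splitting, so embedded commas
  // in values survive untouched.
  static Map<String, String> collectEnv(Properties props, String prefix) {
    Map<String, String> env = new HashMap<>();
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(prefix)) {
        env.put(key.substring(prefix.length()), props.getProperty(key));
      }
    }
    return env;
  }

  public static void main(String[] args) {
    Properties p = new Properties();
    p.setProperty("mapreduce.map.env.MODE", "bar");
    p.setProperty("mapreduce.map.env.MOUNTS", "/tmp/foo,/tmp/bar");
    // prints {MOUNTS=/tmp/foo,/tmp/bar, MODE=bar} (order may vary)
    System.out.println(collectEnv(p, "mapreduce.map.env."));
  }
}
{code}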

> Support quoted strings for environment variables
> 
>
> Key: YARN-6830
> URL: https://issues.apache.org/jira/browse/YARN-6830
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-6830.001.patch
>
>
> There are cases where it is necessary to allow for quoted string literals 
> within environment variable values when passed via the yarn command line 
> interface.
> For example, consider the follow environment variables for a MR map task.
> {{MODE=bar}}
> {{IMAGE_NAME=foo}}
> {{MOUNTS=/tmp/foo,/tmp/bar}}
> When running the MR job, these environment variables are supplied as a comma 
> delimited string.
> {{-Dmapreduce.map.env="MODE=bar,IMAGE_NAME=foo,MOUNTS=/tmp/foo,/tmp/bar"}}
> In this case, {{MOUNTS}} will be parsed and added to the task environment as 
> {{MOUNTS=/tmp/foo}}. Any attempts to quote the embedded comma separated value 
> results in quote characters becoming part of the value, and parsing still 
> breaks down at the comma.
> This issue is to allow for quoting the comma separated value (escaped double 
> or single quote). This was mentioned on YARN-4595 and will impact YARN-5534 
> as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149944#comment-16149944
 ] 

Allen Wittenauer commented on YARN-6721:


Thanks!

Committed to trunk!

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6721.00.patch, YARN-6721.01.patch, 
> YARN-6721.02.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149922#comment-16149922
 ] 

Allen Wittenauer commented on YARN-6721:


OK, yeah, this one is working on Linux and OS X with both gcc and clang, from 
what I've seen. :)
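
For anyone curious how per-compiler flag probing works, the general shape is 
simple enough to show from a shell (a sketch only, not the actual build logic 
in the patch):

{code}
# does this compiler accept a given stack-checking flag?
probe_flag() {
  echo 'int main(void) { return 0; }' | \
    "${CC:-cc}" "$1" -x c -o /dev/null - 2>/dev/null
}

probe_flag -fstack-check         && CFLAGS="$CFLAGS -fstack-check"
probe_flag -fsanitize=safe-stack && CFLAGS="$CFLAGS -fsanitize=safe-stack"
# a real build would pick exactly one and also pass it at link time
{code}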

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch, YARN-6721.01.patch, 
> YARN-6721.02.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6721:
---
Attachment: YARN-6721.02.patch

-02:
* fix GNU C++ flags

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch, YARN-6721.01.patch, 
> YARN-6721.02.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6721:
---
Attachment: YARN-6721.01.patch

-01:
* fix up some clang issues
* add -pthread in a more appropriate manner
* gtest needs the flags too

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch, YARN-6721.01.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6721) container-executor should have stack checking

2017-08-31 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149461#comment-16149461
 ] 

Allen Wittenauer commented on YARN-6721:


Thanks, but after a bit more testing, clang 4.0 on Linux is blowing up.  Looks 
like a trivial fix though.

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6721) container-executor should have stack checking

2017-08-30 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6721:
---
Priority: Critical  (was: Major)

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Critical
>  Labels: security
> Attachments: YARN-6721.00.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6721) container-executor should have stack checking

2017-08-30 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6721:
---
Attachment: YARN-6721.00.patch

-00:
* support for gcc, clang, and Sun
* expects HADOOP-14670 to be applied first


> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
> Attachments: YARN-6721.00.patch
>
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6721) container-executor should have stack checking

2017-08-29 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6721:
---
Target Version/s: 3.0.0-beta1

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6721) container-executor should have stack checking

2017-08-29 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reassigned YARN-6721:
--

Assignee: Allen Wittenauer  (was: Sunil G)

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>  Labels: security
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6721) container-executor should have stack checking

2017-08-29 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6721:
---
Target Version/s:   (was: 2.7.5)

> container-executor should have stack checking
> -
>
> Key: YARN-6721
> URL: https://issues.apache.org/jira/browse/YARN-6721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, security
>Reporter: Allen Wittenauer
>Assignee: Sunil G
>  Labels: security
>
> As per https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt and 
> given that container-executor is setuid, it should be compiled with stack 
> checking if the compiler supports such features.  (-fstack-check on gcc, 
> -fsanitize=safe-stack on clang, -xcheck=stkovf on "Oracle Solaris Studio", 
> others as we find them, ...)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6798) Fix NM startup failure with old state store due to version mismatch

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6798:
---
Release Note: 


This fixes the LevelDB state store for the NodeManager.  As of this patch, the 
state store versions now correspond to the following table.

* Previous Patch: YARN-5049
  * LevelDB Key: queued
  * Hadoop Versions: 2.9.0, 3.0.0-alpha1
  * Corresponding LevelDB Version: 1.2
* Previous Patch: YARN-6127
  * LevelDB Key: AMRMProxy/NextMasterKey
  * Hadoop Versions: 2.9.0, 3.0.0-alpha4
  * Corresponding LevelDB Version: 1.1

  was:
This fixes the LevelDB state store for the NodeManager.  As of this patch, the 
state store versions now correspond to the following table.

- Previous Patch: YARN-5049
-- LevelDB Key: queued
-- Hadoop Versions: 2.9.0, 3.0.0-alpha1
-- Corresponding LevelDB Version: 1.2
- Previous Patch: YARN-6127
-- LevelDB Key: AMRMProxy/NextMasterKey
-- Hadoop Versions: 2.9.0, 3.0.0-alpha4
-- Corresponding LevelDB Version: 1.1


> Fix NM startup failure with old state store due to version mismatch
> ---
>
> Key: YARN-6798
> URL: https://issues.apache.org/jira/browse/YARN-6798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Botong Huang
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6798.v1.patch, YARN-6798.v2.patch
>
>
> YARN-6703 rolled back the state store version number for the RM from 2.0 to 
> 1.4.
> YARN-6127 bumped the version for the NM to 3.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(3, 0);
> YARN-5049 bumped the version for the NM to 2.0
> private static final Version CURRENT_VERSION_INFO = 
> Version.newInstance(2, 0);
> During an upgrade, all NMs died after upgrading a C6 cluster from alpha2 to 
> alpha4.
> {noformat}
> 2017-07-07 15:48:17,259 FATAL 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting 
> NodeManager
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Incompatible version for NM state: expecting NM state version 3.0, but 
> loading version 2.0
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:246)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:748)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:809)
> Caused by: java.io.IOException: Incompatible version for NM state: expecting 
> NM state version 3.0, but loading version 2.0
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.checkVersion(NMLeveldbStateStoreService.java:1454)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1308)
> at 
> org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:307)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> ... 5 more
> 2017-07-07 15:48:17,277 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NodeManager at xxx.gce.cloudera.com/aa.bb.cc.dd
> /
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6845) Variable scheduler of FSLeafQueue duplicates the one of its parent FSQueue.

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6845:
---
Fix Version/s: (was: 2.9)
   2.9.0

> Variable scheduler of FSLeafQueue duplicates the one of its parent FSQueue.
> ---
>
> Key: YARN-6845
> URL: https://issues.apache.org/jira/browse/YARN-6845
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Trivial
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6845.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6307) Refactor FairShareComparator#compare

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6307:
---
Fix Version/s: (was: 2.9)
   2.9.0

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch, 
> YARN-6307.003.patch
>
>
> The method does three things: compare the min share usage, compare the fair 
> share usage by checking the weight ratio, and break ties by submit time and 
> name. These are mixed together, which makes the method hard to read and 
> maintain. Additionally, there are potential performance issues; for example, 
> there is no need to check the weight ratio if the minShare usage comparison 
> already indicates the order. It is worth improving given the huge number of 
> invocations in the scheduler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6999) Add log about how to solve Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6999:
---
Fix Version/s: (was: 2.9)
   2.9.0

> Add log about how to solve Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> --
>
> Key: YARN-6999
> URL: https://issues.apache.org/jira/browse/YARN-6999
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: documentation, security
>Affects Versions: 3.0.0-beta1
> Environment: All operating systems.
>Reporter: Linlin Zhou
>Assignee: Linlin Zhou
>Priority: Minor
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: yarn-6999.002.patch, yarn-6999.003.patch, yarn-6999.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Even when following Setting up a Single Node Cluster 
> [https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html],
>  we would still fail to run the MapReduce job example. Due to a security 
> fix, YARN uses the user's environment variables to initialize, and the user's 
> environment usually doesn't include MapReduce-related settings. So we need to 
> add the related config in etc/hadoop/mapred-site.xml manually. Currently the 
> log only reports an Error:
> Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster, without any suggestion on how 
> to solve it. I want to add a useful suggestion to the log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6778) In ResourceWeights, weights and setWeights() should be final

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6778:
---
Fix Version/s: (was: 2.9)
   2.9.0

> In ResourceWeights, weights and setWeights() should be final
> 
>
> Key: YARN-6778
> URL: https://issues.apache.org/jira/browse/YARN-6778
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6778.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6959:
---
Target Version/s: 3.0.0-alpha4, 2.7.1, 2.8.0  (was: 2.8.0, 2.7.1, 
3.0.0-alpha4)
Release Note: ResourceManager will now record ResourceRequests from 
different attempts into different objects.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may not be the corresponding attempt of the 
> attemptId.
> // Such as the attempt id is corresponding to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate wrong AM Container for the current attempt, because its 
> ResourceRequests
> // may come from previous attempt which can be any ResourceRequests previous 
> AM asked
> // and there is no matching logic for the original AM Container 
> ResourceRequest and 
> // the returned amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> Because after this Patch, RM will definitely record ResourceRequests from 
> different attempt into different objects of 
> SchedulerApplicationAttempt.AppSchedulingInfo.
> So, even if RM still record ResourceRequests from old attempt at any time, 
> these ResourceRequests will be recorded in old AppSchedulingInfo object which 
> will not impact current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is confusing; we 
> should rename it to getCurrentApplicationAttempt and reconsider whether there 
> are any other bugs related to getApplicationAttempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6709) Root privilege escalation in experimental Docker support

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6709:
---
Release Note: CVE-2017-7669 / YARN's Docker support did not do enough input 
validation.  This allowed a root level escalation from an ordinary user 
account. 

> Root privilege escalation in experimental Docker support
> 
>
> Key: YARN-6709
> URL: https://issues.apache.org/jira/browse/YARN-6709
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, security
>Affects Versions: 2.8.0, 3.0.0-alpha1, 3.0.0-alpha2
>Reporter: Allen Wittenauer
>Assignee: Varun Vasudev
>Priority: Blocker
>  Labels: security
> Fix For: 2.8.1, 3.0.0-alpha3
>
>
> YARN-3853 and friends do not do enough input validation. They allow a user to 
> do escalate privileges at root trivially. See 
> https://effectivemachines.com/2017/06/02/docker-security-in-framework-managed-multi-user-environments/
>  for more information.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5697) Use CliParser to parse options in RMAdminCLI

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-5697:
---
Component/s: resourcemanager

> Use CliParser to parse options in RMAdminCLI
> 
>
> Key: YARN-5697
> URL: https://issues.apache.org/jira/browse/YARN-5697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5697.001.patch, YARN-5697.002.patch, 
> YARN-5697.003.patch, YARN-5697.004.patch, YARN-5697.005-branch-2.8.patch, 
> YARN-5697.005.patch
>
>
> As discussed in YARN-4855, it is better to use CliParser rather than args to 
> parse command line options in RMAdminCli.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5697) Use CliParser to parse options in RMAdminCLI

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reassigned YARN-5697:
--

Assignee: Tao Jie  (was: Jonathan Hung)

> Use CliParser to parse options in RMAdminCLI
> 
>
> Key: YARN-5697
> URL: https://issues.apache.org/jira/browse/YARN-5697
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Tao Jie
>Assignee: Tao Jie
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: YARN-5697.001.patch, YARN-5697.002.patch, 
> YARN-5697.003.patch, YARN-5697.004.patch, YARN-5697.005-branch-2.8.patch, 
> YARN-5697.005.patch
>
>
> As discussed in YARN-4855, it is better to use CliParser rather than args to 
> parse command line options in RMAdminCli.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-7011.

Resolution: Not A Problem

Closing this again since the behavior is actually the same given the same 
configs.

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Trivial
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/. Ex: Give invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh using --config /tmp/Conf
> 4. --config is not respected as the changes made in /tmp/Conf/yarn-site.xml 
> is not taken in while restarting RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-18 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7011:
---
Priority: Trivial  (was: Blocker)

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Trivial
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/. Ex: Give invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh using --config /tmp/Conf
> 4. --config is not respected as the changes made in /tmp/Conf/yarn-site.xml 
> is not taken in while restarting RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130924#comment-16130924
 ] 

Allen Wittenauer commented on YARN-7011:


bq. Is that example different than when --config is used to point to that exact 
same misconfigured directory? Yes. branch-2's code decided that it was going to 
ignore what was in the configuration directory as told to us by the user.


Actually, I'm wrong.  branch-2 does exactly what trunk does:

{code}
hadoop-2.8.1 aw$ grep HADOOP_CONF_DIR etc/hadoop/hadoop-env.sh
export HADOOP_CONF_DIR=/bin
hadoop-2.8.1 aw$ bin/hadoop --config etc/hadoop classpath
/bin:/private/tmp/hadoop-2.8.1/share/hadoop/common/lib/*: 
...
{code}

So the behavior is exactly the same...

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Blocker
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/. Ex: Give invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh using --config /tmp/Conf
> 4. --config is not respected as the changes made in /tmp/Conf/yarn-site.xml 
> is not taken in while restarting RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130744#comment-16130744
 ] 

Allen Wittenauer edited comment on YARN-7011 at 8/17/17 4:13 PM:
-

bq. I am not sure if everyone would agree that it is by design at least I am 
not convinced yet.

As the person who designed and wrote the code in trunk, yes, it is working as 
intended.  

The problem is one of consistency.  There are a ton of different ways that 
HADOOP_CONF_DIR can get set:  external vs. internal, --config vs. no option on 
CLI, HADOOP_CONF_DIR manual vs. auto-discovered (2-3 ways), present in 
hadoop-env.sh vs. not, present but correct vs. present and incorrect, etc ...  
branch-2's behavior when going through this is matrix is wildly unpredictable.  
(and let's ignore's YARN_CONF_DIR because it just makes things worse and gets 
us too far into the weeds)

Here's a fun experiment to show my point.  Unpack a new 2.8.1 tar ball.  Set 
HADOOP_CONF_DIR in hadoop-2.8.1/etc/hadoop/hadoop-env.sh to something incorrect 
like /bin.  Now run hadoop classpath.

You'll find it starts out as "/bin".

Is that an incorrect behavior? 

On the one hand: yes, HADOOP_CONF_DIR is pointed to a wrong location.  But on 
the other:  no, the user explicitly told the system that it wanted /bin as the 
configuration directory.  branch-2 decided that in this instance /bin was 
correct and went forward.

Is that example different than when --config is used to point to that exact 
same misconfigured directory? Yes. branch-2's code decided that it was going to 
ignore what was in the configuration directory as told to us by the user.

Should that behavior be different?

My argument is no:  the system should not suddenly decide the user was 
incorrect in one instance vs. another. It absolutely must act predictably.  The 
user is expressly passing us a configuration directory that tells us our 
configuration is actually in a different directory--very much a 
misconfiguration.  The system accepts that as truth and moves on.

It's also worthwhile pointing out that 3.x adds a powerful wrinkle here:  
.hadoop-env allows the user to specify a different HADOOP_CONF_DIR, overwriting 
the installed one.  Here, redirection is key.

If you don't like the current behavior, there are only two ways out.  (Let's be 
clear: I'll reject any patch that continues branch-2's inconsistent behavior.)

* Option 1: always ignore HADOOP_CONF_DIR when set in hadoop-env.sh

If the code is reading hadoop-env.sh, then HADOOP_CONF_DIR is superfluous. Its 
only use at that point is to give a different set of directories to act as 
replacements for the files present in this one.

* Option 2: throw a warning

After reading hadoop-env.sh, detect if the value of HADOOP_CONF_DIR changed 
(after calculating the full path to deal with symbolic links, etc.).  If it 
did, throw a warning up and then either accept the new value (current behavior) 
or ignore it (see option 1).


was (Author: aw):
bq. I am not sure if everyone would agree that it is by design at least I am 
not convinced yet.

As the person who designed and wrote the code in trunk, yes, it is working as 
intended.  

The problem is one of consistency.  There are a ton of different ways that 
HADOOP_CONF_DIR can get set:  external vs. internal, --config vs. no option on 
CLI, HADOOP_CONF_DIR manual vs. auto-discovered (2-3 ways), present in 
hadoop-env.sh vs. not, present but correct vs. present and incorrect, etc ...  
branch-2's behavior when going through this matrix is wildly unpredictable.  
(and let's ignore YARN_CONF_DIR because it just makes things worse and gets 
us too far into the weeds)

Here's a fun experiment to show my point.  Unpack a new 2.8.1 tar ball.  Set 
HADOOP_CONF_DIR in hadoop-2.8.1/etc/hadoop/hadoop-env.sh to something incorrect 
like /bin.  Now run hadoop classpath.

You'll find it starts out as "/bin".

Is that an incorrect behavior? 

On the one hand: yes, HADOOP_CONF_DIR is pointed to a wrong location.  But on 
the other:  no, the user explicitly told the system that it wanted /bin as the 
configuration directory.  branch-2 decided that in this instance /bin was 
correct and went forward.

Is that example different than when --config is used to point to a 
misconfigured directory? Yes. branch-2's code decided that it was going to 
ignore what was in the configuration directory as told to us by the user.

Should that behavior be different?

My argument is no:  the system should not suddenly decide the user was 
incorrect in one instance vs. another. It absolutely must act predictably.  The 
user is expressly passing us a configuration directory that tells us our 
configuration is actually in a different directory--very much a 
misconfiguration.  The system accepts that as truth and moves on.

It's also worthwhile pointing out that 3.x adds a powerful wrinkle here:  

[jira] [Commented] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130744#comment-16130744
 ] 

Allen Wittenauer commented on YARN-7011:


bq. I am not sure if everyone would agree that it is by design at least I am 
not convinced yet.

As the person who designed and wrote the code in trunk, yes, it is working as 
intended.  

The problem is one of consistency.  There are a ton of different ways that 
HADOOP_CONF_DIR can get set:  external vs. internal, --config vs. no option on 
CLI, HADOOP_CONF_DIR manual vs. auto-discovered (2-3 ways), present in 
hadoop-env.sh vs. not, present but correct vs. present and incorrect, etc ...  
branch-2's behavior when going through this matrix is wildly unpredictable.  
(and let's ignore YARN_CONF_DIR because it just makes things worse and gets 
us too far into the weeds)

Here's a fun experiment to show my point.  Unpack a new 2.8.1 tar ball.  Set 
HADOOP_CONF_DIR in hadoop-2.8.1/etc/hadoop/hadoop-env.sh to something incorrect 
like /bin.  Now run hadoop classpath.

You'll find it starts out as "/bin".

Is that an incorrect behavior? 

On the one hand: yes, HADOOP_CONF_DIR is pointed to a wrong location.  But on 
the other:  no, the user explicitly told the system that it wanted /bin as the 
configuration directory.  branch-2 decided that in this instance /bin was 
correct and went forward.

Is that example different from when --config is used to point to a 
misconfigured directory? Yes: branch-2's code decided that it was going to 
ignore what was in the configuration directory the user pointed us at.

Should that behavior be different?

My argument is no:  the system should not suddenly decide the user was 
incorrect in one instance vs. another. It absolutely must act predictably.  The 
user is expressly passing us a configuration directory that tells us our 
configuration is actually in a different directory--very much a 
misconfiguration.  The system accepts that as truth and moves on.

It's also worthwhile pointing out that 3.x adds a powerful wrinkle here:  
.hadoop-env allows the user to specify a different HADOOP_CONF_DIR, overwriting 
the installed one.  Here, redirection is key.
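
As an illustration only (the home directory and target path are hypothetical), 
such a user-level override might look like:

{code}
# ~/.hadoop-env -- read by the 3.x shell scripts; it can overwrite the
# installed HADOOP_CONF_DIR and redirect to a private conf tree.
export HADOOP_CONF_DIR=/home/alice/hadoop-conf
{code}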

If you don't like the current behavior, there are only two ways out.  (Let's be 
clear: I'll reject any patch that continues branch-2's inconsistent behavior.)

* Option 1: always ignore HADOOP_CONF_DIR when set in hadoop-env.sh

If the code is reading hadoop-env.sh, then HADOOP_CONF_DIR is superfluous. Its 
only use at that point is to give a different set of directories to act as 
replacements for the files present in this one.

* Option 2: throw a warning

After reading hadoop-env.sh, detect whether the value of HADOOP_CONF_DIR changed 
(after calculating the full path to deal with symbolic links, etc.).  If it 
did, throw a warning and then either accept the new value (current behavior) 
or ignore it (see option 1).
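
A minimal bash sketch of option 2 (hypothetical logic, not the actual 
hadoop-functions.sh code; variable names are made up):

{code}
# Resolve the user-supplied conf dir, source its hadoop-env.sh, and warn if
# that file redirected HADOOP_CONF_DIR somewhere else.
conf_before=$(readlink -f -- "${HADOOP_CONF_DIR}")
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
conf_after=$(readlink -f -- "${HADOOP_CONF_DIR}")
if [[ "${conf_before}" != "${conf_after}" ]]; then
  echo "WARNING: hadoop-env.sh changed HADOOP_CONF_DIR to ${conf_after}" >&2
  # accept the new value (current behavior), or uncomment to ignore it (option 1):
  # HADOOP_CONF_DIR="${conf_before}"
fi
{code}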

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Blocker
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/, e.g. give an invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh with --config /tmp/Conf
> 4. --config is not respected, as the changes made in /tmp/Conf/yarn-site.xml 
> are not picked up while restarting the RM
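
As a shell transcript, the steps above might look like the following sketch 
(HADOOP_HOME and the restart sequence are assumptions; edit yarn-site.xml by 
hand, e.g. point yarn.resourcemanager.address at an unreachable host):

{code}
# Copy the conf, break it, and restart the RM against the copy.
cp -r "${HADOOP_HOME}/etc/hadoop" /tmp/Conf
"${EDITOR:-vi}" /tmp/Conf/yarn-site.xml
"${HADOOP_HOME}/sbin/yarn-daemon.sh" --config /tmp/Conf stop resourcemanager
"${HADOOP_HOME}/sbin/yarn-daemon.sh" --config /tmp/Conf start resourcemanager
{code}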



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129796#comment-16129796
 ] 

Allen Wittenauer commented on YARN-7011:


I have, however, opened HADOOP-14781 to flag that we should clarify the 
documentation around this.  (Setting HADOOP_CONF_DIR in hadoop-env.sh 
isn't productive in any version of Hadoop.)

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Trivial
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/, e.g. give an invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh with --config /tmp/Conf
> 4. --config is not respected, as the changes made in /tmp/Conf/yarn-site.xml 
> are not picked up while restarting the RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-16 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-7011.

Resolution: Won't Fix

Closing as not a bug. Working as designed.

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Trivial
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/, e.g. give an invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh with --config /tmp/Conf
> 4. --config is not respected, as the changes made in /tmp/Conf/yarn-site.xml 
> are not picked up while restarting the RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-16 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-7011:
---
Priority: Trivial  (was: Blocker)

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Trivial
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/, e.g. give an invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh with --config /tmp/Conf
> 4. --config is not respected, as the changes made in /tmp/Conf/yarn-site.xml 
> are not picked up while restarting the RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7011) yarn-daemon.sh is not respecting --config option

2017-08-16 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129777#comment-16129777
 ] 

Allen Wittenauer commented on YARN-7011:


It's not a regression since the patch that changed the behavior was 
specifically marked as an incompatible change. 

> yarn-daemon.sh is not respecting --config option
> 
>
> Key: YARN-7011
> URL: https://issues.apache.org/jira/browse/YARN-7011
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Priority: Blocker
>
> Steps to reproduce:
> 1. Copy the conf to a temporary location /tmp/Conf
> 2. Modify anything in yarn-site.xml under /tmp/Conf/, e.g. give an invalid RM 
> address
> 3. Restart the resourcemanager using yarn-daemon.sh with --config /tmp/Conf
> 4. --config is not respected, as the changes made in /tmp/Conf/yarn-site.xml 
> are not picked up while restarting the RM



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


