[jira] [Updated] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-10444:
--
Issue Type: Improvement  (was: Bug)

> use openFile() with sequential IO for localizing files.
> ---
>
> Key: YARN-10444
> URL: https://issues.apache.org/jira/browse/YARN-10444
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar downloads must also be sequential
> * if the FileStatus is passed around, it can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.






[jira] [Commented] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198494#comment-17198494
 ] 

Steve Loughran commented on YARN-10444:
---

Linking to the work in https://github.com/apache/hadoop/pull/2168.



> use openFile() with sequential IO for localizing files.
> ---
>
> Key: YARN-10444
> URL: https://issues.apache.org/jira/browse/YARN-10444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar downloads must also be sequential
> * if the FileStatus is passed around, it can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.






[jira] [Commented] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198492#comment-17198492
 ] 

Steve Loughran commented on YARN-10444:
---

The openFile() API is in Hadoop 3.3.0, but the standard seek option keys/values 
are only now going in. These changes will be part of the main patch; the YARN 
JIRA is here for completeness/awareness.
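
For reference, a minimal sketch of what the FSDownload open could look like 
once the standard keys land. The key name "fs.option.openfile.read.policy" is 
the one proposed in HADOOP-16202 and may still change; {{fs}} and {{status}} 
(a FileStatus collected earlier) are assumed to be in scope.

{code:java}
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FSDataInputStream;

// Sketch only: open a resource for a sequential, single-GET download.
CompletableFuture<FSDataInputStream> future = fs.openFile(status.getPath())
    .opt("fs.option.openfile.read.policy", "sequential")  // provisional key
    .withFileStatus(status)  // reuse the status: skips HEAD checks on open
    .build();
try (FSDataInputStream in = future.get()) {
  // copy to the local filesystem, as FSDownload does today
}
{code}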

> use openFile() with sequential IO for localizing files.
> ---
>
> Key: YARN-10444
> URL: https://issues.apache.org/jira/browse/YARN-10444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar downloads must also be sequential
> * if the FileStatus is passed around, it can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.






[jira] [Created] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)
Steve Loughran created YARN-10444:
-

 Summary: use openFile() with sequential IO for localizing files.
 Key: YARN-10444
 URL: https://issues.apache.org/jira/browse/YARN-10444
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment:  echo set the office to blue scene
Reporter: Steve Loughran
Assignee: Steve Loughran



HADOOP-16202 adds standard options for declaring the read/seek
policy when reading a file. These should be set to sequential IO
when localising resources, so that if the default/cluster settings
for a file system are optimized for random IO, artifact downloads
are still read at the maximum speed possible (one big GET to the EOF).

Most of this happens in hadoop-common, but some tuning of FSDownload
can assist:

* tar/jar downloads must also be sequential
* if the FileStatus is passed around, it can be used
  in the open request to skip checks when loading the file.

Together this can save 3 HEAD requests per resource, with the sequential
IO avoiding any splitting of the big read into separate block GETs.






[jira] [Updated] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-10444:
--
Component/s: nodemanager

> use openFile() with sequential IO for localizing files.
> ---
>
> Key: YARN-10444
> URL: https://issues.apache.org/jira/browse/YARN-10444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.0
> Environment:  echo set the office to blue scene
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar downloads must also be sequential
> * if the FileStatus is passed around, it can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.






[jira] [Updated] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-10444:
--
Priority: Minor  (was: Major)

> use openFile() with sequential IO for localizing files.
> ---
>
> Key: YARN-10444
> URL: https://issues.apache.org/jira/browse/YARN-10444
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
> Environment:  echo set the office to blue scene
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar downloads must also be sequential
> * if the FileStatus is passed around, it can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.






[jira] [Updated] (YARN-10444) use openFile() with sequential IO for localizing files.

2020-09-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-10444:
--
Environment: (was:  echo set the office to blue scene)

> use openFile() with sequential IO for localizing files.
> ---
>
> Key: YARN-10444
> URL: https://issues.apache.org/jira/browse/YARN-10444
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> HADOOP-16202 adds standard options for declaring the read/seek
> policy when reading a file. These should be set to sequential IO
> when localising resources, so that if the default/cluster settings
> for a file system are optimized for random IO, artifact downloads
> are still read at the maximum speed possible (one big GET to the EOF).
> Most of this happens in hadoop-common, but some tuning of FSDownload
> can assist:
> * tar/jar downloads must also be sequential
> * if the FileStatus is passed around, it can be used
>   in the open request to skip checks when loading the file.
>
> Together this can save 3 HEAD requests per resource, with the sequential
> IO avoiding any splitting of the big read into separate block GETs.






[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181268#comment-17181268
 ] 

Steve Loughran commented on YARN-10396:
---

Thanks. If it's just a test case, that's easier to fix.

> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode, setting the maximum number of applications at 
> queue level gets overridden by the system-level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.
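
A sketch of the direction a fix could take, using 
{{CapacitySchedulerConfiguration.getMaximumApplicationsPerQueue()}}, which 
returns a negative value when the per-queue setting is undefined; treat this 
as illustrative rather than the committed change:

{code:java}
// Prefer the queue's own maximum-applications setting; only fall back to
// scaling the system-wide maximum by the queue's absolute capacity.
int maxApplications =
    conf.getMaximumApplicationsPerQueue(childQueue.getQueuePath());
if (maxApplications < 0) {
  maxApplications = (int) (conf.getMaximumSystemApplications()
      * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
}
leafQueue.setMaxApplications(maxApplications);
{code}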






[jira] [Commented] (YARN-10396) Max applications calculation per queue disregards queue level settings in absolute mode

2020-08-20 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181187#comment-17181187
 ] 

Steve Loughran commented on YARN-10396:
---

[~sunilg] This stops branch-3.2 (and presumably 3.1) from building.

Do you want to revert and fix, or shall I just revert for now? 

And can fixes go through a Yetus build/test run next time, or at least a local 
clean compile before the commit? Thanks.



> Max applications calculation per queue disregards queue level settings in 
> absolute mode
> ---
>
> Key: YARN-10396
> URL: https://issues.apache.org/jira/browse/YARN-10396
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-10396.001.patch, YARN-10396.002.patch, 
> YARN-10396.003.patch
>
>
> Looking at the following code in 
> {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.java#L1126}}
> {code:java}
> int maxApplications = (int) (conf.getMaximumSystemApplications()
> * childQueue.getQueueCapacities().getAbsoluteCapacity(label));
> leafQueue.setMaxApplications(maxApplications);{code}
> In Absolute Resources mode, setting the maximum number of applications at 
> queue level gets overridden by the system-level setting scaled down to the 
> available resources. This means that the only way to set the maximum number 
> of applications is to change the queue's resource pool. This line should 
> consider the queue's 
> {{yarn.scheduler.capacity.\{queuepath}.maximum-applications}} setting.






[jira] [Commented] (YARN-10382) Non-secure yarn access secure hdfs

2020-08-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17174253#comment-17174253
 ] 

Steve Loughran commented on YARN-10382:
---

The problem there is that the code wants to know who the YARN principal of the 
resource manager is so that it can send messages to HDFS saying "renew these 
delegation tokens". Your insecure YARN RM doesn't have a Kerberos principal, so 
secure HDFS will not issue delegation tokens to it. You could somehow cheat the 
configs to name some Kerberos principal (yourself?) as the RM principal - no 
idea what happens then.

I would personally like YARN to collect tokens from services even when Kerberos 
is disabled, though not for your use case - I want to be able to collect tokens 
for the object stores. But I've avoided going near the code as (a) I'm scared 
and (b) applications like Spark do their own checks against 
UserGroupInformation.isSecurityEnabled(), which still wouldn't work.
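
To make that concrete, here is the guard pattern in question - a sketch, with 
{{conf}} assumed in scope and the renewer principal purely illustrative. 
Because token collection is gated on the client's security setting rather than 
the target filesystem's, an insecure cluster never even asks secure HDFS for 
tokens:

{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

Credentials creds = new Credentials();
if (UserGroupInformation.isSecurityEnabled()) {  // false on an insecure cluster
  // never reached when YARN security is off, even though HDFS is secure
  FileSystem fs = FileSystem.get(conf);
  fs.addDelegationTokens("rm/_HOST@EXAMPLE.COM", creds);
}
{code}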

> Non-secure yarn access secure hdfs
> --
>
> Key: YARN-10382
> URL: https://issues.apache.org/jira/browse/YARN-10382
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: bianqi
>Priority: Minor
>
> In our production environment, YARN cannot enable Kerberos due to 
> environment problems, but our HDFS does have Kerberos enabled, and we now 
> need non-secure YARN to access secure HDFS.
> It is known that YARN and HDFS are both secure once security is turned on.
> I hope that after enabling HDFS security, it is possible to use non-secure 
> YARN to access secure HDFS, or secure YARN to access non-secure HDFS.






[jira] [Resolved] (YARN-10289) spark on yarn exception

2020-06-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-10289.
---
Resolution: Invalid

> spark on yarn exception 
> 
>
> Key: YARN-10289
> URL: https://issues.apache.org/jira/browse/YARN-10289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.3
> Environment: hadoop 3.0.0
>Reporter: huang xin
>Priority: Major
>
> I execute Spark on YARN and get an issue like this (from the aggregated 
> container logs):
> prelaunch.out:
> Setting up env variables
> Setting up job resources
> Launching container
> stderr: Error: Could not find or load main class 
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> container-localizer-syslog: 2020-05-24 15:39:20,867 INFO [main] 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer:
>  Disk Validator: yarn.nodemanager.disk-validator is loaded.
> stderr: ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show 
> Log4j2 internal initialization logging.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties






[jira] [Commented] (YARN-10289) spark on yarn exception

2020-06-01 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17121009#comment-17121009
 ] 

Steve Loughran commented on YARN-10289:
---

# Looks more like a Spark error.
# And a config one, so not a bug in their code. Check your classpath.

Take it up on the Spark mailing lists.

> spark on yarn exception 
> 
>
> Key: YARN-10289
> URL: https://issues.apache.org/jira/browse/YARN-10289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.3
> Environment: hadoop 3.0.0
>Reporter: huang xin
>Priority: Major
>
> I execute Spark on YARN and get an issue like this (from the aggregated 
> container logs):
> prelaunch.out:
> Setting up env variables
> Setting up job resources
> Launching container
> stderr: Error: Could not find or load main class 
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> container-localizer-syslog: 2020-05-24 15:39:20,867 INFO [main] 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer:
>  Disk Validator: yarn.nodemanager.disk-validator is loaded.
> stderr: ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show 
> Log4j2 internal initialization logging.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties






[jira] [Commented] (YARN-5277) when localizers fail due to resource timestamps being out, provide more diagnostics

2020-04-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077232#comment-17077232
 ] 

Steve Loughran commented on YARN-5277:
--

Can you submit this as a GitHub PR so I can review it there?

> when localizers fail due to resource timestamps being out, provide more 
> diagnostics
> ---
>
> Key: YARN-5277
> URL: https://issues.apache.org/jira/browse/YARN-5277
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Siddharth Ahuja
>Priority: Major
> Attachments: YARN-5277.001.patch
>
>
> When an NM fails a resource download because the timestamps are wrong, 
> there's not much info, just two long values. 
> It would be good to also include the local time values, *and the current wall 
> time*. These are the things people need to know when trying to work out what 
> went wrong.
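
A sketch of the kind of diagnostics the description asks for; the variable 
names ({{resource}}, {{expected}}, {{actual}}) are illustrative, not the 
actual fields in the localizer:

{code:java}
import java.io.IOException;
import java.util.Date;

// Include the raw longs, their human-readable forms, and the current
// wall time, so clock skew and stale status become obvious at a glance.
long now = System.currentTimeMillis();
throw new IOException(String.format(
    "Resource %s changed on src filesystem - expected timestamp %d (%s),"
        + " actual %d (%s); current wall time %d (%s)",
    resource, expected, new Date(expected),
    actual, new Date(actual), now, new Date(now)));
{code}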






[jira] [Updated] (YARN-9696) unused import in Configuration class

2020-02-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-9696:
-
Summary: unused import in Configuration class  (was: one more import in 
org.apache.hadoop.conf.Configuration class)

> unused import in Configuration class
> 
>
> Key: YARN-9696
> URL: https://issues.apache.org/jira/browse/YARN-9696
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: runzhou wu
>Assignee: Jan Hentschel
>Priority: Trivial
>
> LinkedList is not used.
> It is imported at line 54 ("import java.util.LinkedList;"); I think it 
> can be deleted.
>  






[jira] [Commented] (YARN-9552) FairScheduler: NODE_UPDATE can cause NoSuchElementException

2019-10-09 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947857#comment-16947857
 ] 

Steve Loughran commented on YARN-9552:
--

trunk and branch-3.2 don't build; this looks suspiciously like this patch is at 
fault. Reopening - can someone look at this?

> FairScheduler: NODE_UPDATE can cause NoSuchElementException
> ---
>
> Key: YARN-9552
> URL: https://issues.apache.org/jira/browse/YARN-9552
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9552-001.patch, YARN-9552-002.patch, 
> YARN-9552-003.patch, YARN-9552-004.patch, YARN-9552-branch-3.1.001.patch, 
> YARN-9552-branch-3.1.002.patch, YARN-9552-branch-3.2.001.patch, 
> YARN-9552-branch-3.2.002.patch, YARN-9552-branch-3.2.003.patch
>
>
> We observed a race condition inside YARN with the following stack trace:
> {noformat}
> 18/11/07 06:45:09.559 SchedulerEventDispatcher:Event Processor ERROR 
> EventDispatcher: Error in handling event type NODE_UPDATE to the Event 
> Dispatcher
> java.util.NoSuchElementException
> at 
> java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:2036)
> at 
> java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.getNextPendingAsk(AppSchedulingInfo.java:373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.isOverAMShareLimit(FSAppAttempt.java:941)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:1373)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:353)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1094)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:961)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1183)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:132)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> This is basically the same as the one described in YARN-7382, but the root 
> cause is different.
> When we create an application attempt, we create an {{FSAppAttempt}} object. 
> This contains an {{AppSchedulingInfo}} which contains a set of 
> {{SchedulerRequestKey}}. Initially, this set is empty and only initialized a 
> bit later on a separate thread during a state transition:
> {noformat}
> 2019-05-07 15:58:02,659 INFO  [RM StateStore dispatcher] 
> recovery.RMStateStore (RMStateStore.java:transition(239)) - Storing info for 
> app: application_1557237478804_0001
> 2019-05-07 15:58:02,684 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
> 2019-05-07 15:58:02,690 INFO  [SchedulerEventDispatcher:Event Processor] 
> fair.FairScheduler (FairScheduler.java:addApplication(490)) - Accepted 
> application application_1557237478804_0001 from user: bacskop, in queue: 
> root.bacskop, currently num of applications: 1
> 2019-05-07 15:58:02,698 INFO  [RM Event dispatcher] rmapp.RMAppImpl 
> (RMAppImpl.java:handle(903)) - application_1557237478804_0001 State change 
> from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
> 2019-05-07 15:58:02,731 INFO  [RM Event dispatcher] 
> resourcemanager.ApplicationMasterService 
> (ApplicationMasterService.java:registerAppAttempt(434)) - Registering app 
> attempt : appattempt_1557237478804_0001_01
> 2019-05-07 15:58:02,732 INFO  [RM Event dispatcher] attempt.RMAppAttemptImpl 
> (RMAppAttemptImpl.java:handle(920)) - appattempt_1557237478804_0001_01 
> State change from NEW to SUBMITTED on event = START
> 2019-05-07 15:58:02,746 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:<init>(207)) - *** In the constructor of 
> SchedulerApplicationAttempt
> 2019-05-07 15:58:02,747 INFO  [SchedulerEventDispatcher:Event Processor] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:<init>(230)) 

[jira] [Commented] (YARN-9839) NodeManager java.lang.OutOfMemoryError unable to create new native thread

2019-09-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933262#comment-16933262
 ] 

Steve Loughran commented on YARN-9839:
--

FYI, I'm adding some tests in HADOOP-16570 which verify that one of the FS 
clients doesn't leak threads - they cache the set of threads at the start and 
compare it with the set at the end, after filtering out some daemon threads 
which never go away. The same trick might work here.
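
A sketch of that trick, should anyone want to try it against the localizer; 
the daemon-thread filter and the workload call are placeholders:

{code:java}
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import static org.junit.Assert.assertTrue;

// Snapshot the names of live threads, filtering out long-lived daemon
// threads which never go away between snapshots.
private static Set<String> liveThreads() {
  return Thread.getAllStackTraces().keySet().stream()
      .map(Thread::getName)
      .filter(n -> !n.startsWith("Finalizer"))  // extend with known daemons
      .collect(Collectors.toCollection(TreeSet::new));
}

// in the test body:
Set<String> before = liveThreads();
runLocalizationWorkload();  // hypothetical: the code under test
Set<String> after = liveThreads();
after.removeAll(before);
assertTrue("Leaked threads: " + after, after.isEmpty());
{code}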

> NodeManager java.lang.OutOfMemoryError unable to create new native thread
> -
>
> Key: YARN-9839
> URL: https://issues.apache.org/jira/browse/YARN-9839
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> NM fails with the below error even though the ulimit for NM is large.
> {code}
> 2019-09-12 10:27:46,348 ERROR org.apache.hadoop.util.Shell: Caught 
> java.lang.OutOfMemoryError: unable to create new native thread. One possible 
> reason is that ulimit setting of 'max user processes' is too low. If so, do 
> 'ulimit -u ' and try again.
> 2019-09-12 10:27:46,348 FATAL 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LocalizerRunner for 
> container_e95_1568242982456_152026_01_000132,5,main] threw an Error.  
> Shutting down now...
> java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:717)
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:562)
> at org.apache.hadoop.util.Shell.run(Shell.java:482)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
> at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1441)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1405)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$800(ResourceLocalizationService.java:140)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
> {code}
> For each container localization request, there is a {{LocalizerRunner}} 
> thread created, and each {{LocalizerRunner}} creates another thread to get 
> file permission info, which is where this failure comes from. It is in 
> Shell.java -> {{runCommand()}}
> {code}
> Thread errThread = new Thread() {
>   @Override
>   public void run() {
> try {
>   String line = errReader.readLine();
>   while((line != null) && !isInterrupted()) {
> errMsg.append(line);
> errMsg.append(System.getProperty("line.separator"));
> line = errReader.readLine();
>   }
> } catch(IOException ioe) {
>   LOG.warn("Error reading the error stream", ioe);
> }
>   }
> };
> {code}
> {{LocalizerRunner}}s are threads which are cached in 
> {{ResourceLocalizationService}}. Looking into the possibility that they are 
> not getting removed from the cache.






[jira] [Commented] (YARN-9783) Remove low-level zookeeper test to be able to build Hadoop against zookeeper 3.5.5

2019-08-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915812#comment-16915812
 ] 

Steve Loughran commented on YARN-9783:
--

cut it

> Remove low-level zookeeper test to be able to build Hadoop against zookeeper 
> 3.5.5
> --
>
> Key: YARN-9783
> URL: https://issues.apache.org/jira/browse/YARN-9783
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szalay-Beko Mate
>Priority: Major
>
> ZooKeeper 3.5.5 release is the latest stable one. It contains many new 
> features (including SSL related improvements which are very important for 
> production use; see [the release 
> notes|https://zookeeper.apache.org/doc/r3.5.5/releasenotes.html]). Yet there 
> should be no backward incompatible changes on the API, so the applications 
> using ZooKeeper clients should be built against the new zookeeper without any 
> problem and the new ZooKeeper client should work with the older (3.4) servers 
> without any issue, at least until someone starts to use new functionality.
> The aim of this ticket is not to change the ZooKeeper version used by Hadoop 
> YARN yet, but to enable people to rebuild and test Hadoop with the new 
> ZooKeeper version.
> Currently the Hadoop build (with ZooKeeper 3.5.5) fails because of a YARN 
> test case: 
> [TestSecureRegistry.testLowlevelZKSaslLogin()|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/test/java/org/apache/hadoop/registry/secure/TestSecureRegistry.java#L64].
>  This test case seems to use low-level ZooKeeper internal code, which changed 
> in the new ZooKeeper version. Although I am not sure what was the original 
> reasoning of the inclusion of this test in the YARN code, I propose to remove 
> it, and if there is still any missing test case in ZooKeeper, then let's 
> issue a ZooKeeper ticket to test this scenario there.






[jira] [Commented] (YARN-9724) ERROR SparkContext: Error initializing SparkContext.

2019-08-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901360#comment-16901360
 ] 

Steve Loughran commented on YARN-9724:
--

Looking at the Spark side of things, it's coming from the line
{code}
  logInfo("Requesting a new application from cluster with %d NodeManagers"
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
{code}

That is, it's not actually doing much and could probably be reworked to be less 
brittle. 

For now, set the logger 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend to log at WARN, 
not INFO, and that line should be skipped.
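
A log4j.properties fragment for that; the second line is a fallback, since the 
stack trace in the description shows the message is actually emitted via 
org.apache.spark.deploy.yarn.Client:

{code}
log4j.logger.org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend=WARN
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN
{code}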

> ERROR SparkContext: Error initializing SparkContext.
> 
>
> Key: YARN-9724
> URL: https://issues.apache.org/jira/browse/YARN-9724
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation, router, yarn
>Affects Versions: 3.0.0, 3.1.0
> Environment: Hadoop:3.1.0
> Spark:2.3.3
>Reporter: panlijie
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: spark.log
>
>
> We have some problems with hadoop-yarn-federation when we use Spark on 
> yarn-federation.
> The following error was found:
> org.apache.commons.lang.NotImplementedException: Code is not implemented
> at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.getClusterMetrics(FederationClientInterceptor.java:573)
>  at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.getClusterMetrics(RouterClientRMService.java:230)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getClusterMetrics(ApplicationClientProtocolPBServiceImpl.java:248)
>  at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:569)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>  at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
>  at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:209)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>  at com.sun.proxy.$Proxy16.getClusterMetrics(Unknown Source)
>  at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
>  at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
>  at 
> org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
>  at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
>  at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
>  at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154)
>  at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
>  at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:164)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
>  at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
>  at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934)
>  at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925)
>  at scala.Option.getOrElse(Option.scala:121)
>  at 
> 

[jira] [Commented] (YARN-9646) Yarn miniYarn cluster tests failed to bind to a local host name

2019-07-10 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881899#comment-16881899
 ] 

Steve Loughran commented on YARN-9646:
--

I don't have any concerns; I'm just not going to give a vote of approval myself 
as I haven't been near this code in a long time. All I know is that the 
minicluster is fussy and refuses to work with file:// as the cluster fs when 
you turn Kerberos on, that being the biggest blocker to some of my uses. 

> Yarn miniYarn cluster tests failed to bind to a local host name
> ---
>
> Key: YARN-9646
> URL: https://issues.apache.org/jira/browse/YARN-9646
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.4
>Reporter: Ray Yang
>Assignee: Ray Yang
>Priority: Major
>
> When running the integration test 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell#testDSShellWithoutDomain
> at home
> The following error happened:
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [ruyang-mn3.linkedin.biz:0] 
> java.net.BindException: Can't assign requested address; For more details see: 
>  [http://wiki.apache.org/hadoop/BindException]
>  
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:327)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:99)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:447)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:278)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setupInternal(TestDistributedShell.java:91)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setup(TestDistributedShell.java:71)
> …
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [ruyang-mn3.linkedin.biz:0] 
> java.net.BindException: Can't assign requested address; For more details see: 
>  [http://wiki.apache.org/hadoop/BindException]
> at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
> at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
> at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.*ResourceTrackerService.serviceStart*(ResourceTrackerService.java:163)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:976)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1013)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1013)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1053)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:319)
> ... 31 more
> Caused by: java.net.BindException: Problem binding to 
> [ruyang-mn3.linkedin.biz:0]java.net.BindException: Can't assign requested 
> address; For more details see:  [http://wiki.apache.org/hadoop/BindException]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 

[jira] [Commented] (YARN-9646) Yarn miniYarn cluster tests failed to bind to a local host name

2019-06-25 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872218#comment-16872218
 ] 

Steve Loughran commented on YARN-9646:
--

I'd try that. If you look at the edit history of the wiki page, it's roughly a 
list of the ways I've misconfigured my laptop in the past.

> Yarn miniYarn cluster tests failed to bind to a local host name
> ---
>
> Key: YARN-9646
> URL: https://issues.apache.org/jira/browse/YARN-9646
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Yang
>Assignee: Ray Yang
>Priority: Major
>
> When running the integration test 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell#testDSShellWithoutDomain
> at home
> The following error happened:
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [ruyang-mn3.linkedin.biz:0] 
> java.net.BindException: Can't assign requested address; For more details see: 
>  [http://wiki.apache.org/hadoop/BindException]
>  
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:327)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:99)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:447)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:278)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setupInternal(TestDistributedShell.java:91)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setup(TestDistributedShell.java:71)
> …
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [ruyang-mn3.linkedin.biz:0] 
> java.net.BindException: Can't assign requested address; For more details see: 
>  [http://wiki.apache.org/hadoop/BindException]
> at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
> at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
> at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.*ResourceTrackerService.serviceStart*(ResourceTrackerService.java:163)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:976)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1013)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1013)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1053)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:319)
> ... 31 more
> Caused by: java.net.BindException: Problem binding to 
> [ruyang-mn3.linkedin.biz:0]java.net.BindException: Can't assign requested 
> address; For more details see:  [http://wiki.apache.org/hadoop/BindException]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
> at org.apache.hadoop.ipc.Server.bind(Server.java:494)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:715)
> at 

[jira] [Commented] (YARN-9646) Yarn miniYarn cluster tests failed to bind to a local host name

2019-06-24 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16871824#comment-16871824
 ] 

Steve Loughran commented on YARN-9646:
--

what happens if you add a 127.0.0.1 entry for that FQDN in /etc/hosts?
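
For example (a sketch; the FQDN is the one from the stack trace below):

{code}
# /etc/hosts
127.0.0.1   localhost ruyang-mn3.linkedin.biz
{code}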

> Yarn miniYarn cluster tests failed to bind to a local host name
> ---
>
> Key: YARN-9646
> URL: https://issues.apache.org/jira/browse/YARN-9646
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ray Yang
>Assignee: Ray Yang
>Priority: Major
>
> When running the integration test 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell#testDSShellWithoutDomain
> at home
> The following error happened:
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [ruyang-mn3.linkedin.biz:0] 
> java.net.BindException: Can't assign requested address; For more details see: 
>  [http://wiki.apache.org/hadoop/BindException]
>  
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:327)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.access$400(MiniYARNCluster.java:99)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:447)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:278)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setupInternal(TestDistributedShell.java:91)
> at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.setup(TestDistributedShell.java:71)
> …
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.net.BindException: Problem binding to [ruyang-mn3.linkedin.biz:0] 
> java.net.BindException: Can't assign requested address; For more details see: 
>  [http://wiki.apache.org/hadoop/BindException]
> at 
> org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
> at 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
> at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.*ResourceTrackerService.serviceStart*(ResourceTrackerService.java:163)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:588)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:976)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1017)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1013)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1013)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1053)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:319)
> ... 31 more
> Caused by: java.net.BindException: Problem binding to 
> [ruyang-mn3.linkedin.biz:0]java.net.BindException: Can't assign requested 
> address; For more details see:  [http://wiki.apache.org/hadoop/BindException]
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721)
> at org.apache.hadoop.ipc.Server.bind(Server.java:494)
> at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:715)
> at org.apache.hadoop.ipc.Server.<init>(Server.java:2464)
> at 

[jira] [Commented] (YARN-9607) Auto-configuring rollover-size of IFile format for non-appendable filesystems

2019-06-19 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16867857#comment-16867857
 ] 

Steve Loughran commented on YARN-9607:
--

HADOOP-15691 proposes adding a new method, getPathCapabilities, to let you 
check exactly that. Sadly, it's received approximately zero reviews, and it is 
only something I'm working on in my spare time.

I would like to see it in, and given that YARN ships in sync with everything 
else, now is a good time for other people to offer to get involved in this.

If I offer to update that patch, who in this JIRA discussion promises to review 
it with a goal of getting it committed? Once it is in, the check would be as 
simple as:

{code}
if (fs.hasPathCapability("fs.feature.append", path)) {
  ...
} else {
  // fallback
}
{code}

That's it: no need to second-guess things downstream or write code which only 
works on some stores, as determined by trial and error and support calls.

But if nobody puts their hand up to say "I need this and will help get it in" 
- you aren't going to get it.

> Auto-configuring rollover-size of IFile format for non-appendable filesystems
> -
>
> Key: YARN-9607
> URL: https://issues.apache.org/jira/browse/YARN-9607
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-9607.001.patch, YARN-9607.002.patch
>
>
> In YARN-9525, we made the IFile format compatible with remote folders with 
> the s3a scheme. In rolling-fashion log aggregation, IFile still fails with the 
> "append is not supported" error message, which is a known limitation of the 
> format by design. 
> There is a workaround though: by setting the rollover size in the configuration 
> of the IFile format, a new aggregated log file is created in each rolling 
> cycle, thus eliminating the append from the process. Setting this config 
> globally would cause performance problems in regular log aggregation, so 
> I'm suggesting enforcing this config to zero if the scheme of the URI is 
> s3a (or any other non-appendable filesystem).
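
A sketch of the proposed auto-configuration; the roll-over property name here 
is hypothetical, and the scheme check could eventually be replaced by the 
hasPathCapability() probe discussed in the comment above:

{code:java}
import java.net.URI;

// Force per-cycle rollover (size 0) when the remote log dir cannot append;
// remoteRootLogDir and conf are assumed to be in scope.
URI remoteLogUri = remoteRootLogDir.toUri();
if ("s3a".equals(remoteLogUri.getScheme())) {
  // hypothetical key, standing in for the IFile roll-over size setting
  conf.setLong("log-aggregation.ifile.roll-over.size", 0L);
}
{code}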



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-06-12 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862331#comment-16862331
 ] 

Steve Loughran commented on YARN-9545:
--

thanks

> Create healthcheck REST endpoint for ATSv2
> --
>
> Key: YARN-9545
> URL: https://issues.apache.org/jira/browse/YARN-9545
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9545.001.patch, YARN-9545.002.patch, 
> YARN-9545.003.patch, YARN-9545.004.patch, YARN-9545.branch-3.2.001.patch, 
> YARN-9545.branch-3.2.002.patch
>
>
> RM UI2 and CM need a health check URL for the ATSv2 service.
> Create a /health REST endpoint
>  * must respond with 200 \{health: ok} if all ok
>  * must respond with non 200 if any problem occurs
>  * could check reader/writer connection
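
A minimal JAX-RS sketch of such an endpoint, for illustration only (this is not 
the committed patch, and the reader probe is a hypothetical method):
{code}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/health")
public class TimelineHealthResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response health() {
    // checkReaderConnection() is a hypothetical probe of the timeline reader
    if (checkReaderConnection()) {
      return Response.ok("{\"health\": \"ok\"}").build();
    }
    return Response.status(Response.Status.SERVICE_UNAVAILABLE)
        .entity("{\"health\": \"timeline reader unreachable\"}").build();
  }

  private boolean checkReaderConnection() {
    return true; // stub: wire up a real reader/writer check here
  }
}
{code}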



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-06-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858826#comment-16858826
 ] 

Steve Loughran commented on YARN-9545:
--

I'm seeing ASF license errors on branch-3.2 runs from the yarn.lock file 
committed here

bq. !? 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/yarn.lock

Whatever is needed to tell the licence checker to skip that file needs to be 
pulled back from trunk. 

> Create healthcheck REST endpoint for ATSv2
> --
>
> Key: YARN-9545
> URL: https://issues.apache.org/jira/browse/YARN-9545
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9545.001.patch, YARN-9545.002.patch, 
> YARN-9545.003.patch, YARN-9545.004.patch, YARN-9545.branch-3.2.001.patch, 
> YARN-9545.branch-3.2.002.patch
>
>
> RM UI2 and CM need a health check URL for the ATSv2 service.
> Create a /health REST endpoint
>  * must respond with 200 \{health: ok} if all ok
>  * must respond with non 200 if any problem occurs
>  * could check reader/writer connection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-06-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858565#comment-16858565
 ] 

Steve Loughran commented on YARN-9525:
--

Thomas, HADOOP-13327 tries to nail down hflush/hsync; I've just pushed a rebase of 
it up for people to play with. As with all the other FS contract stuff, it's 
invariably spare-time work, but I'd love to see it in. That adds the output tests 
into the existing create contract test, so anyone who implements that suite 
gets the new tests automatically.

getPos() does seem a better strategy here. Adam: what do you think? 

FWIW, I think I'd better add getPos() probes to the sync/flush tests. I suspect 
we're not updating it reliably enough.
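
A minimal sketch of the getPos()-based approach, assuming the writer holds on to 
the {{FSDataOutputStream}} it created (variable names are illustrative):
{code}
// Track the offset from the stream itself instead of calling
// getFileStatus() on a file which may not be visible until close().
FSDataOutputStream out = fileContext.create(remoteLogFile,
    EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE));
out.write(uuidBytes);               // the UUID header written on initialize
out.flush();
long currentOffset = out.getPos();  // bytes written so far; no HEAD request
{code}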

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch, 
> YARN-9525.002.patch, YARN-9525.003.patch
>
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failure happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that, so we can use the IFile format against an s3a target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-06-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858004#comment-16858004
 ] 

Steve Loughran commented on YARN-9525:
--

bq. Have we tested the patch when rolling aggregation is enabled, and the file 
system is appendable? Just want to make sure the append-rolling scenario is not 
broken by it.

Good point. Would it be possible to pull this out into something you 
could make a standalone test against a filesystem? Then we could check 
with the various stores. As an example, {{AbstractContractDistCpTest}} is used 
to verify distcp for s3 and abfs, and Google GCS can pick it up too. 

I'm sure [~tmarquardt] would be happy knowing that there were built-in tests 
validating abfs support, running whenever someone ran the Azure test suite with 
the right credentials.
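
A rough sketch of what such a suite could look like, following the 
{{AbstractContractDistCpTest}} pattern (the class and test names here are 
hypothetical):
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.contract.AbstractFSContractTestBase;
import org.junit.Test;

// Each store's test module would subclass this and supply its own
// contract, so s3a/abfs/gcs all run the same checks automatically.
public abstract class AbstractContractLogAggregationTest
    extends AbstractFSContractTestBase {

  @Test
  public void testIFileWriteThenRollOver() throws Throwable {
    Path logDir = path("logs");
    // drive the IFile writer against the target filesystem here,
    // covering both the append and the rollover code paths
  }
}
{code}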


> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch, 
> YARN-9525.002.patch, YARN-9525.003.patch
>
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failure happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that, so we can use the IFile format against an s3a target.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-05-31 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16853254#comment-16853254
 ] 

Steve Loughran commented on YARN-9525:
--

I wonder if the -4 thing is due to the logic "if createdNew == true then offset 
= 0", when really, as a UUID has just been written, the offset is probably 4.

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch
>
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failure happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that, so we can use the IFile format against an s3a target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-21 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844711#comment-16844711
 ] 

Steve Loughran commented on YARN-9568:
--

bq. For MiniYarnCluster we could configure the node attribute store path to a 
unique folder inside targetWorkDir. This could solve the issue, right?

That's what I was thinking; something like:
{code}
// to ensure that any FileSystemNodeAttributeStore started by the RM always
// uses a unique path, if unset, force it under the test dir.
if (conf.get(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR) == null) {
  File nodeAttrDir = new File(getTestWorkDir(), "nodeattributes");
  conf.set(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR,
  nodeAttrDir.getCanonicalPath());
}
{code}

The patch as submitted doesn't work, as 
{{NodeAttributeTestUtils.getRandomDirConf}} creates a new configuration object; 
it is the RM's shared configuration which needs to be patched.

> NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover
> --
>
> Key: YARN-9568
> URL: https://issues.apache.org/jira/browse/YARN-9568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: YARN-9568.001.patch, npe.log
>
>
> This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
> surfacing in the S3A committer tests which try to create 
> MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
> come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}
> I'm not sure why node labels are needed on test clusters; the default implies 
> they should be off anyway.
> At the same time, I can't seem to find one specific change in the git log to 
> say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844227#comment-16844227
 ] 

Steve Loughran commented on YARN-9568:
--

Looking more at this, it looks like a bad assumption in the whole recovery logic: 
that the data in the files can always be recovered. Any error in loading should 
be treated as no data to recover.
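
Roughly (a hypothetical sketch; the real change would live around 
{{AbstractFSNodeStore.recoverFromStore()}}):
{code}
// Treat any failure to read the mirror/editlog as "no data to recover",
// rather than letting the exception kill RM startup.
try {
  nodeAttributeStore.recover();
} catch (IOException | RuntimeException e) {
  LOG.warn("Node attribute store unreadable; starting with empty state", e);
  // fall through: behave as if there were no files to load
}
{code}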

> NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover
> --
>
> Key: YARN-9568
> URL: https://issues.apache.org/jira/browse/YARN-9568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: npe.log
>
>
> This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
> surfacing in the S3A committer tests which try to create 
> MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
> come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}
> I'm not sure why node labels are needed on test clusters; the default implies 
> they should be off anyway.
> At the same time, I can't seem to find one specific change in the git log to 
> say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7988) Refactor FSNodeLabelStore code for Node Attributes store support

2019-05-20 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844226#comment-16844226
 ] 

Steve Loughran commented on YARN-7988:
--

Just traced the cause of YARN-9568 down to this patch. Could we have some 
resilience to, and recovery from, bad data on the FS? Thanks.

> Refactor FSNodeLabelStore code for Node Attributes store support
> 
>
> Key: YARN-7988
> URL: https://issues.apache.org/jira/browse/YARN-7988
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7988-YARN-3409.002.patch, 
> YARN-7988-YARN-3409.003.patch, YARN-7988-YARN-3409.004.patch, 
> YARN-7988-YARN-3409.005.patch, YARN-7988-YARN-3409.006.patch, 
> YARN-7988-YARN-3409.007.patch, YARN-7988.001.patch
>
>
> # Abstract out the FileSystemStore operations
> # Define the EditLog operations and the mirror operation
> # Support compatibility with the old node label store



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844223#comment-16844223
 ] 

Steve Loughran commented on YARN-9568:
--

I can make this "go away" by rm'ing everything in /tmp/hadoop-yarn-stevel/*

That is: the state of a single shared path can break all unit tests running 
locally. And presumably, in production, it causes RM startup to fail with not 
very meaningful error text.


Proposed:
* init code handles unreadable files somehow
* for the minicluster, don't use a fixed location for the files: with 
parallel test runs it's inevitable that eventually they will end up in a 
corrupted state

> NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover
> --
>
> Key: YARN-9568
> URL: https://issues.apache.org/jira/browse/YARN-9568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Priority: Major
> Attachments: npe.log
>
>
> This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
> surfacing in the S3A committer tests which try to create 
> MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
> come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}
> I'm not sure why node labels are needed on test clusters; the default implies 
> they should be off anyway.
> At the same time, I can't seem to find one specific change in the git log to 
> say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-9568:
-
Priority: Minor  (was: Major)

> NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover
> --
>
> Key: YARN-9568
> URL: https://issues.apache.org/jira/browse/YARN-9568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: npe.log
>
>
> This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
> surfacing in the S3A committer tests which try to create 
> MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
> come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}
> I'm not sure why node labels are needed on test clusters; the default implies 
> they should be off anyway.
> At the same time, I can't seem to find one specific change in the git log to 
> say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844201#comment-16844201
 ] 

Steve Loughran commented on YARN-9568:
--

attached full log

> NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover
> --
>
> Key: YARN-9568
> URL: https://issues.apache.org/jira/browse/YARN-9568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Priority: Major
> Attachments: npe.log
>
>
> This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
> surfacing in the S3A committer tests which try to create 
> MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
> come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}
> I'm not sure why node labels are needed on test clusters; the default implies 
> they should be off anyway.
> At the same time, I can't seem to find one specific change in the git log to 
> say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-9568:
-
Attachment: npe.log

> NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover
> --
>
> Key: YARN-9568
> URL: https://issues.apache.org/jira/browse/YARN-9568
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, test
>Affects Versions: 3.3.0
> Environment: macos
>Reporter: Steve Loughran
>Priority: Major
> Attachments: npe.log
>
>
> This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
> surfacing in the S3A committer tests which try to create 
> MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
> come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}
> I'm not sure why node labels are needed on test clusters; the default implies 
> they should be off anyway.
> At the same time, I can't seem to find one specific change in the git log to 
> say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844138#comment-16844138
 ] 

Steve Loughran commented on YARN-9568:
--

{code}

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.NullPointerException

at 
org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:373)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:128)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:503)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at 
org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:322)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at 
org.apache.hadoop.fs.s3a.yarn.ITestS3AMiniYarnCluster.setup(ITestS3AMiniYarnCluster.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodesToAttributesMappingRequestPBImpl.initNodeAttributesMapping(NodesToAttributesMappingRequestPBImpl.java:102)
at 
org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodesToAttributesMappingRequestPBImpl.getNodesToAttributes(NodesToAttributesMappingRequestPBImpl.java:117)
at 
org.apache.hadoop.yarn.nodelabels.store.op.FSNodeStoreLogOp.getNodeToAttributesMap(FSNodeStoreLogOp.java:46)
at 
org.apache.hadoop.yarn.nodelabels.store.op.NodeAttributeMirrorOp.recover(NodeAttributeMirrorOp.java:57)
at 
org.apache.hadoop.yarn.nodelabels.store.op.NodeAttributeMirrorOp.recover(NodeAttributeMirrorOp.java:35)
at 
org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.loadFromMirror(AbstractFSNodeStore.java:121)
at 
org.apache.hadoop.yarn.nodelabels.store.AbstractFSNodeStore.recoverFromStore(AbstractFSNodeStore.java:150)
at 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.FileSystemNodeAttributeStore.recover(FileSystemNodeAttributeStore.java:95)
at 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl.initNodeAttributeStore(NodeAttributesManagerImpl.java:140)
at 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.NodeAttributesManagerImpl.serviceStart(NodeAttributesManagerImpl.java:123)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:918)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1285)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1326)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1322)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1373)
at 

[jira] [Created] (YARN-9568) NPE in MiniYarnCluster during FileSystemNodeAttributeStore.recover

2019-05-20 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-9568:


 Summary: NPE in MiniYarnCluster during 
FileSystemNodeAttributeStore.recover
 Key: YARN-9568
 URL: https://issues.apache.org/jira/browse/YARN-9568
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, test
Affects Versions: 3.3.0
 Environment: macos
Reporter: Steve Loughran


This seems new in trunk, as in "wasn't happening a couple of weeks ago". It's 
surfacing in the S3A committer tests which try to create 
MiniYarnClusters: all such tests are failing, as the mini YARN cluster won't 
come up, hitting an NPE in {{FileSystemNodeAttributeStore.recover}}

I'm not sure why node labels are needed on test clusters; the default implies 
they should be off anyway.

At the same time, I can't seem to find one specific change in the git log to 
say "this is causing the problem".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9534) TimelineV2ClientImpl.pollTimelineServiceAddress() throws a YarnException when it is interrupted

2019-05-07 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-9534.
--
Resolution: Won't Fix

YarnException is an RTE and is not declared across the entire hierarchy; 
applications expect it.

We cannot change the failure semantics by switching to a completely new checked 
exception hierarchy.

> TimelineV2ClientImpl.pollTimelineServiceAddress() throws a YarnException when 
> it is interrupted
> ---
>
> Key: YARN-9534
> URL: https://issues.apache.org/jira/browse/YARN-9534
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: eBugs
>Priority: Minor
>
> Dear YARN developers, we are developing a tool to detect exception-related 
> bugs in Java. Our prototype has spotted the following throw statement whose 
> exception class and error message indicate different error conditions. 
>   
> Version: Hadoop-3.1.2 
> File: 
> HADOOP-ROOT/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineV2ClientImpl.java
> Line: 354
> {code:java}
> try {
>   Thread.sleep(this.serviceRetryInterval);
> } catch (InterruptedException e) {
>   Thread.currentThread().interrupt();
>   throw new YarnException("Interrupted while trying to connect ATS");
> }{code}
>  
> The exception is triggered when {{pollTimelineServiceAddress()}} is 
> interrupted. However, throwing a {{YarnException}} is too general and makes 
> accurate exception handling more difficult. If throwing the 
> {{InterruptedException}} is not preferred, throwing an 
> {{InterruptedIOException}} or wrapping the {{InterruptedException}} could be 
> more accurate here.
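
For reference, the reporter's suggestion would look something like this (a 
sketch, not committed code):
{code}
try {
  Thread.sleep(this.serviceRetryInterval);
} catch (InterruptedException e) {
  Thread.currentThread().interrupt();
  // surface the interruption accurately instead of a generic YarnException
  InterruptedIOException ioe =
      new InterruptedIOException("Interrupted while trying to connect to ATS");
  ioe.initCause(e);
  throw ioe;
}
{code}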



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9525) TFile format is not working against s3a remote folder

2019-05-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833749#comment-16833749
 ] 

Steve Loughran commented on YARN-9525:
--

This isn't eventual consistency: it is that the file doesn't exist until 
close() is called.

It's probably broken against HDFS too, as HDFS only updates the fileStatus when 
a write crosses a block boundary.

This all looks like some attempt to track the length of the output stream. A 
better way to do this would be to track the bytes streamed through some wrapper 
of the output stream, or some other tracking mechanism.
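
A minimal sketch of that wrapper idea (a hypothetical helper class, not 
existing code):
{code}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Counts the bytes streamed so the writer never needs getFileStatus()
// on a file that is not yet visible.
class CountingOutputStream extends FilterOutputStream {
  private long written;

  CountingOutputStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(int b) throws IOException {
    out.write(b);
    written++;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);
    written += len;
  }

  long getBytesWritten() {
    return written;
  }
}
{code}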

> TFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failure happens: 
> the file is not there yet due to eventual consistency.
> Maybe we can get rid of that, so we can use the IFile format against an s3a target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9470) Fix order of actual and expected expression in assert statements

2019-04-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817542#comment-16817542
 ] 

Steve Loughran commented on YARN-9470:
--

AssertJ is already on the classpath for hdfs-test; there is no reason not to use 
it in YARN.
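
To illustrate (the metric accessor here is just an example): JUnit's signature 
is assertEquals(message, expected, actual), so swapping the last two arguments 
produces the misleading failure text quoted in the description, while AssertJ 
makes the roles explicit:
{code}
// Correct JUnit argument order: expected value first, actual value second.
assertEquals("Shutdown nodes should be 0 now", 0, metrics.getNumShutdownNMs());

// AssertJ names the actual value up front, so the mistake is harder to make:
assertThat(metrics.getNumShutdownNMs())
    .as("Shutdown nodes should be 0 now")
    .isEqualTo(0);
{code}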

> Fix order of actual and expected expression in assert statements
> 
>
> Key: YARN-9470
> URL: https://issues.apache.org/jira/browse/YARN-9470
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9470-001.patch, assertEquals
>
>
> Fix the order of the actual and expected expressions in assert statements, 
> which gives a misleading message when a test case fails. The attached file has 
> some of the places where the order is wrong. 
> {code}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> {code}
> For the long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be 
> used for new test cases, which avoids such mistakes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-04-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813745#comment-16813745
 ] 

Steve Loughran commented on YARN-999:
-

no worries, gone back one commit locally

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-291.000.patch, YARN-999.001.patch, 
> YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, 
> YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, 
> YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch
>
>
> In the current design and implementation, when we decrease the resources on a 
> node to less than the resource consumption of the currently running tasks, 
> those tasks can still run to the end; no new task gets assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is good for most cases, but for a 
> long-running task it could be too slow for the resource setting to actually 
> take effect, so preemption could be used here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-04-09 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened YARN-999:
-

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-291.000.patch, YARN-999.001.patch, 
> YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, 
> YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, 
> YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch
>
>
> In the current design and implementation, when we decrease the resources on a 
> node to less than the resource consumption of the currently running tasks, 
> those tasks can still run to the end; no new task gets assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is good for most cases, but for a 
> long-running task it could be too slow for the resource setting to actually 
> take effect, so preemption could be used here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-04-09 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813733#comment-16813733
 ] 

Steve Loughran commented on YARN-999:
-

I think this has broken the build
{code}
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hadoop-sls: Compilation failure: Compilation failure: 
[ERROR] 
/Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NodeInfo.java:[60,18]
 org.apache.hadoop.yarn.sls.nodemanager.NodeInfo.FakeRMNodeImpl is not abstract 
and does not override abstract method resetUpdatedCapability() in 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode
[ERROR] 
/Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/RMNodeWrapper.java:[47,8]
 org.apache.hadoop.yarn.sls.scheduler.RMNodeWrapper is not abstract and does 
not override abstract method resetUpdatedCapability() in 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode
{code}

Can you fix this ASAP so we don't have to roll things back, or change the 
modified interface to have some default methods. 
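
The default-method option would look roughly like this (a sketch against the 
{{RMNode}} interface; the body is illustrative):
{code}
public interface RMNode {
  // ...existing methods...

  // Giving the new method a default no-op body keeps implementations such
  // as NodeInfo.FakeRMNodeImpl and RMNodeWrapper compiling unchanged.
  default void resetUpdatedCapability() {
    // no-op by default; nodes tracking capability updates override this
  }
}
{code}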


> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Íñigo Goiri
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-291.000.patch, YARN-999.001.patch, 
> YARN-999.002.patch, YARN-999.003.patch, YARN-999.004.patch, 
> YARN-999.005.patch, YARN-999.006.patch, YARN-999.007.patch, 
> YARN-999.008.patch, YARN-999.009.patch, YARN-999.010.patch
>
>
> In the current design and implementation, when we decrease the resources on a 
> node to less than the resource consumption of the currently running tasks, 
> those tasks can still run to the end; no new task gets assigned to this node 
> (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is good for most cases, but for a 
> long-running task it could be too slow for the resource setting to actually 
> take effect, so preemption could be used here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Deleted] (YARN-9422) Simon poortman

2019-03-28 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran deleted YARN-9422:
-


> Simon poortman
> --
>
> Key: YARN-9422
> URL: https://issues.apache.org/jira/browse/YARN-9422
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Simon Poortman
>Priority: Major
>  Labels: Beste
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-03-25 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800851#comment-16800851
 ] 

Steve Loughran commented on YARN-7129:
--

If the font has been moved to an artifact download, I'm happy. No opinions on 
the other bits; I'll leave those to others.

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch, 
> YARN-7129.026.patch, YARN-7129.027.patch, YARN-7129.028.patch, 
> YARN-7129.029.patch, YARN-7129.030.patch, YARN-7129.031.patch, 
> YARN-7129.032.patch, YARN-7129.033.patch, YARN-7129.034.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of Docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN 
> for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-03-21 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798582#comment-16798582
 ] 

Steve Loughran commented on YARN-7129:
--

* versions of artifacts in the webapp pom should be taken from the hadoop 
project uber-pom, so they are maintained in sync. mockito is in that central 
pom already, for example
* same for all the maven plugin versions. If they are new plugins, add the 
property to the hadoop-project pom and then reference it.
* Not reviewing the code; trusting you all there.

Is there a way to have some example which doesn't add large amounts of binary 
data? Because it's going to make our repo even bigger, make switching across 
branches slower, etc. -stuff I do regularly. Git isn't a place to keep 
binaries.

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch, 
> YARN-7129.026.patch, YARN-7129.027.patch, YARN-7129.028.patch, 
> YARN-7129.029.patch, YARN-7129.030.patch, YARN-7129.031.patch, 
> YARN-7129.032.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of Docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN 
> for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2019-03-20 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4435:
-
Attachment: YARN-4435-003.patch

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, security, yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4435-003.patch, YARN-4435-003.patch, 
> YARN-4435.00.patch.txt, YARN-4435.01.patch, YARN-4435.02.patch, 
> proposed_solution
>
>
> Add a class to yarn project that implements the DtFetcher interface to return 
> a RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Moved] (YARN-9377) docker builds having problems with main/native/container-executor/test/

2019-03-11 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HADOOP-16123 to YARN-9377:
---

Affects Version/s: (was: 3.3.0)
   3.3.0
  Component/s: (was: test)
   (was: build)
   test
   build
  Key: YARN-9377  (was: HADOOP-16123)
  Project: Hadoop YARN  (was: Hadoop Common)

> docker builds having problems with main/native/container-executor/test/
> ---
>
> Key: YARN-9377
> URL: https://issues.apache.org/jira/browse/YARN-9377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, test
>Affects Versions: 3.3.0
> Environment: docker
>Reporter: lqjacklee
>Priority: Minor
> Attachments: HADOOP-16123-001.patch, HADOOP-16123-002.patch
>
>
> When building the source code, the steps are: 
>  
> 1. run the docker daemon 
> 2. ./start-build-env.sh
> 3. sudo mvn clean install -DskipTests -Pnative 
> The build fails with: 
> [ERROR] Failed to execute goal 
> org.apache.hadoop:hadoop-maven-plugins:3.3.0-SNAPSHOT:protoc (compile-protoc) 
> on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
> 'protoc --version' did not return a version -> 
> [Help 1]
> However, when executing the command: whereis protoc 
> liu@a65d187055f9:~/hadoop$ whereis protoc
> protoc: /opt/protobuf/bin/protoc
>  
> The PATH value: 
> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/cmake/bin:/opt/protobuf/bin
>  
> liu@a65d187055f9:~/hadoop$ protoc --version
> libprotoc 2.5.0
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9348) Build issues on hadoop-yarn-application-catalog-webapp

2019-03-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786042#comment-16786042
 ] 

Steve Loughran commented on YARN-9348:
--

bq. Building on mac will trigger access to osx keychain to attempt to login to 
Dockerhub.

funny

> Build issues on hadoop-yarn-application-catalog-webapp
> --
>
> Key: YARN-9348
> URL: https://issues.apache.org/jira/browse/YARN-9348
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9348.001.patch, YARN-9348.002.patch, 
> YARN-9348.003.patch
>
>
> A couple of reports show that Jenkins precommit builds are failing due to 
> integration problems between the nodejs libraries and Yetus.  The problems are:
> # Nodejs third-party libraries are scanned by the whitespace check, which 
> generates many errors.  One possible solution is to move the nodejs libraries 
> from the project top-level directory to the target directory so they no longer 
> trip the whitespace checks.
> # maven clean fails because the clean plugin tries to remove the target directory 
> and files inside target/generated-sources concurrently, causing race conditions.
> # Building on mac will trigger access to osx keychain to attempt to login to 
> Dockerhub.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-03-06 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785470#comment-16785470
 ] 

Steve Loughran commented on YARN-7129:
--

this is doing odd things to builds, e.g. the latest HADOOP-15625 yetus run 
couldn't clean it up {code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean) on 
project hadoop-yarn-applications-catalog-webapp: Failed to clean project: 
Failed to delete 
/testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-webapp/target/generated-sources/vendor/ecstatic/test/public
{code}

I found that a copy of the tree was hanging around locally when I switched 
branches. I thought that was a one-off, but if yetus is getting confused too...


> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch, 
> YARN-7129.026.patch, YARN-7129.027.patch, YARN-7129.028.patch
>
>
> YARN native services provides web services API to improve usability of 
> application deployment on Hadoop using collection of docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves usability of YARN for 
> manage the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Moved] (YARN-9241) Remove if else block from RmController.java

2019-01-27 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HADOOP-16070 to YARN-9241:
---

Affects Version/s: (was: 3.1.1)
   (was: 3.0.1)
   3.2.0
  Component/s: (was: common)
   resourcemanager
  Key: YARN-9241  (was: HADOOP-16070)
  Project: Hadoop YARN  (was: Hadoop Common)

> Remove if else block from RmController.java
> ---
>
> Key: YARN-9241
> URL: https://issues.apache.org/jira/browse/YARN-9241
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Anuj
>Priority: Minor
>
> RmController contains a hardcoded if/else block on the type of scheduler, 
> which decides which page to use for which scheduler.
> This if/else block makes it hard to introduce a new scheduler and a 
> corresponding webpage without modifying the existing RMController class.
> It would be great to make this extensible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9241) Remove if else block from RmController.java

2019-01-27 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753481#comment-16753481
 ] 

Steve Loughran commented on YARN-9241:
--

Moved to YARN project; recommend changing title to be more specific

> Remove if else block from RmController.java
> ---
>
> Key: YARN-9241
> URL: https://issues.apache.org/jira/browse/YARN-9241
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Anuj
>Priority: Minor
>
> RmController contains a hardcoded if/else block on the type of scheduler, 
> which decides which page to use for which scheduler.
> This if/else block makes it hard to introduce a new scheduler and a 
> corresponding webpage without modifying the existing RMController class.
> It would be great to make this extensible.
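
One hypothetical direction, with an invented config key and class names, would 
be to resolve the page from configuration:
{code}
// Illustrative only: the key name and the default page class are invented.
Class<? extends View> page = conf.getClass(
    "yarn.resourcemanager.webapp.scheduler-page",
    DefaultSchedulerPage.class,
    View.class);
render(page);
{code}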



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9148) AggregatedLogDeletion doesnt work with S3

2019-01-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742057#comment-16742057
 ] 

Steve Loughran commented on YARN-9148:
--

If there's an existing one for real filesystems, stick with that. There's no 
alternative on S3 but to scan all the files, I'm afraid.

> AggregatedLogDeletion doesnt work with S3
> -
>
> Key: YARN-9148
> URL: https://issues.apache.org/jira/browse/YARN-9148
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9148.001.patch
>
>
> Aggregated log deletion works based on the modification time of the 
> application directory.
> S3AFileStatus gives the current time in the case of a directory.
> {code}
> if (appDir.isDirectory() &&
> appDir.getModificationTime() < cutoffMillis) {
>   ApplicationId appId = ApplicationId.fromString(
> {code}
> S3AFileStatus#getModificationTime
> {code}
>   @Override
>   public long getModificationTime(){
> if(isDirectory()){
>   return System.currentTimeMillis();
> } else {
>   return super.getModificationTime();
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9148) AggregatedLogDeletion doesnt work with S3

2019-01-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741951#comment-16741951
 ] 

Steve Loughran commented on YARN-9148:
--

General

* newline between the import java. and org.apache. references
* fields which never get changed after instantiation -> final
* Use SLF4J for logging, with its LOG.info("item {}", value) methods. 
* recommend listFiles(path, false) over listStatus as it is more incremental in 
collecting results. Even for HDFS there are benefits on large directories, as the 
NN needs to lock the dir for the whole listing and marshall it back in one go. 
It's good to get into the habit.
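
A minimal sketch of the incremental pattern ({{process()}} is a placeholder 
for the per-entry work):
{code}
// Sketch only: iterate the remote listing instead of materialising an array.
RemoteIterator<LocatedFileStatus> entries = fs.listFiles(logDir, false);
while (entries.hasNext()) {
  process(entries.next());   // placeholder for whatever is done per entry
}
{code}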

There aren't any explicit tests for this. I will leave it for the YARN team to 
decide how important that is. It seems to me that if the various scan 
operations could be isolated then this would be possible, with the test setup 
creating directories with the different characteristics.

If such a test suite were able to run without needing any RM, then we could 
probably pull in the test module and do store specific subclasses to test their 
behaviour. Without that we can't be confident that this will do the right thing 
against the stores.



h3. ApplicationLogCleanerTask

One thing to consider with this design is that for a "real" FS, the time to scan 
the dir will potentially become O(files) rather than O(1). If the number of logs is 
low, this isn't going to be an issue, but it may be if certain conditions are met:


* there are many hundreds of child entries
* they are all older than the cutoff time, hence the loop doesn't break.

This means the cost of scanning will increase. Is that going to matter? 


h3. CachedApplicationLogCleanerTask

* Again, split up the imports
* and the {{CacheKey}} fields should be declared final.

h3. CachedApplicationLogCleanerTask.loadAppModificationTime

you can skip the FS.exists(path) check: just call listFiles or listStatus and 
catch the FileNotFoundException (FNFE) raised. Against S3 that can save 3-4 HTTP 
requests. 
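
Something like this ({{updateModificationTime()}} is a placeholder for whatever 
the cache loader does):
{code}
// Sketch: no exists() probe; let the listing raise FileNotFoundException.
try {
  RemoteIterator<LocatedFileStatus> entries = fs.listFiles(appDir, false);
  while (entries.hasNext()) {
    updateModificationTime(entries.next());   // placeholder helper
  }
} catch (FileNotFoundException e) {
  // the app dir has already been removed; nothing to load
}
{code}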


h3. CachedApplicationLogCleanerTask.deleteAggregatedLogs

This is going to be very inefficient on object stores, as the treewalk will 
encounter the very inefficient simulation of directory trees in the stores.

better to use listFiles(path, true), *if there is an obvious way to do it*. If 
there isn't, don't worry about it.
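
For illustration, one possible shape of that, if the scan can be restructured:
{code}
// Sketch: one flat recursive listing, deleting files as they are returned,
// then a final recursive delete to clear any remaining directory markers.
RemoteIterator<LocatedFileStatus> entries = fs.listFiles(appDir, true);
while (entries.hasNext()) {
  fs.delete(entries.next().getPath(), false);
}
fs.delete(appDir, true);
{code}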




> AggregatedLogDeletion doesnt work with S3
> -
>
> Key: YARN-9148
> URL: https://issues.apache.org/jira/browse/YARN-9148
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9148.001.patch
>
>
> Aggregated log deletion works based on the modification time of the 
> application directory.
> S3AFileStatus gives the current time in the case of a directory.
> {code}
> if (appDir.isDirectory() &&
> appDir.getModificationTime() < cutoffMillis) {
>   ApplicationId appId = ApplicationId.fromString(
> {code}
> S3AFileStatus#getModificationTime
> {code}
>   @Override
>   public long getModificationTime(){
> if(isDirectory()){
>   return System.currentTimeMillis();
> } else {
>   return super.getModificationTime();
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9148) AggregatedLogDeletion doesnt work with S3

2019-01-11 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740617#comment-16740617
 ] 

Steve Loughran commented on YARN-9148:
--

Will this dir ever have subdirectories?

Probably better to use listFiles(path, true), as that way if there's a tree of 
files they'll all get listed and deleted.

> AggregatedLogDeletion doesnt work with S3
> -
>
> Key: YARN-9148
> URL: https://issues.apache.org/jira/browse/YARN-9148
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9148.001.patch
>
>
> Aggregated log deletion works based on the modification time of the 
> application directory.
> S3AFileStatus gives the current time in the case of a directory.
> {code}
> if (appDir.isDirectory() &&
> appDir.getModificationTime() < cutoffMillis) {
>   ApplicationId appId = ApplicationId.fromString(
> {code}
> S3AFileStatus#getModificationTime
> {code}
>   @Override
>   public long getModificationTime(){
> if(isDirectory()){
>   return System.currentTimeMillis();
> } else {
>   return super.getModificationTime();
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9148) AggregatedLogDeletion doesnt work with S3

2019-01-02 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732362#comment-16732362
 ] 

Steve Loughran commented on YARN-9148:
--

that's because directories don't actually exist: something just gets made up, 
I'm afraid.

> AggregatedLogDeletion doesnt work with S3
> -
>
> Key: YARN-9148
> URL: https://issues.apache.org/jira/browse/YARN-9148
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
>
> Aggregated log deletion works based on the modification time of the 
> application directory.
> S3AFileStatus gives the current time in the case of a directory.
> {code}
> if (appDir.isDirectory() &&
> appDir.getModificationTime() < cutoffMillis) {
>   ApplicationId appId = ApplicationId.fromString(
> {code}
> S3AFileStatus#getModificationTime
> {code}
>   @Override
>   public long getModificationTime(){
> if(isDirectory()){
>   return System.currentTimeMillis();
> } else {
>   return super.getModificationTime();
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9109) DelegationTokenRenewer not resilient to token classload problems

2018-12-11 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-9109:


 Summary: DelegationTokenRenewer not resilient to token classload 
problems
 Key: YARN-9109
 URL: https://issues.apache.org/jira/browse/YARN-9109
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.1.2
Reporter: Steve Loughran


The DelegationTokenRenewer can't handle the situation where the token implementation 
class cannot be instantiated.

This is because {{Token.decodeIdentifier()}} throws an RTE on failure to 
instantiate, and these aren't caught.

This appears to surface in the client with an error about "Timer Already 
Cancelled". In the logs, you see this
{code}
Exception: java.lang.NoClassDefFoundError thrown from the 
UncaughtExceptionHandler in thread "Timer-5"
{code}
At least, I'm assuming this is the cause. Relates to HADOOP-14556.
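
A sketch of the kind of defensive handling the renewer could adopt 
(illustrative only, not the actual fix):
{code}
try {
  // catch the RTE as well as the IOE so one bad token class
  // cannot kill the renewer thread
  TokenIdentifier identifier = token.decodeIdentifier();
  // ... schedule renewal as normal ...
} catch (IOException | RuntimeException e) {
  LOG.warn("Cannot decode identifier for token kind {}", token.getKind(), e);
  // skip this token rather than propagating into the Timer thread
}
{code}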



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9109) DelegationTokenRenewer not resilient to token classload problems

2018-12-11 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718016#comment-16718016
 ] 

Steve Loughran commented on YARN-9109:
--

Server
{code}
2018-12-11 16:32:17,429 [main] ERROR tools.DistCp (DistCp.java:run(167)) - 
Exception encountered 
java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to 
submit application_1543889597027_0032 to YARN : Timer already cancelled.
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:345)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:251)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1576)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1573)
at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:206)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:182)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:432)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
application_1543889597027_0032 to YARN : Timer already cancelled.
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:322)
at 
org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:299)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:330)
{code}

> DelegationTokenRenewer not resilient to token classload problems
> 
>
> Key: YARN-9109
> URL: https://issues.apache.org/jira/browse/YARN-9109
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.1.2
>Reporter: Steve Loughran
>Priority: Minor
>
> The DelegationTokenRenewer can't handle the situation where the token 
> implementation class cannot be instantiated.
> This is because {{Token.decodeIdentifier()}} throws an RTE on failure to 
> instantiate, and these aren't caught.
> This appears to surface in the client with an error about "Timer Already 
> Cancelled". In the logs, you see this
> {code}
> Exception: java.lang.NoClassDefFoundError thrown from the 
> UncaughtExceptionHandler in thread "Timer-5"
> {code}
> At least, I'm assuming this is the cause. Relates to HADOOP-14556.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9057) CSI jar file should not bundle third party dependencies

2018-12-04 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708496#comment-16708496
 ] 

Steve Loughran commented on YARN-9057:
--

bq. it it seems to create more problems than the ones it fixed. 

Afraid so. 

General practice in hadoop-*: unshaded in all our cross-references, moving to 
shaded for public artifacts (which we still need to do for the object stores). 
And we dream of a java9-only world...

> CSI jar file should not bundle third party dependencies
> ---
>
> Key: YARN-9057
> URL: https://issues.apache.org/jira/browse/YARN-9057
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Assignee: Weiwei Yang
>Priority: Blocker
>
> hadoop-yarn-csi-3.3.0-SNAPSHOT.jar bundles all third party classes like a 
> shaded jar instead of CSI only classes.  This is generating error messages 
> for YARN cli:
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-csi-3.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9057) CSI jar file should not bundle third party dependencies

2018-12-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-9057:
-
Priority: Blocker  (was: Major)

> CSI jar file should not bundle third party dependencies
> ---
>
> Key: YARN-9057
> URL: https://issues.apache.org/jira/browse/YARN-9057
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Priority: Blocker
>
> hadoop-yarn-csi-3.3.0-SNAPSHOT.jar bundles all third party classes like a 
> shaded jar instead of CSI only classes.  This is generating error messages 
> for YARN cli:
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-csi-3.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9057) CSI jar file should not bundle third party dependencies

2018-12-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707513#comment-16707513
 ] 

Steve Loughran commented on YARN-9057:
--

FWIW I'm not rebasing any of my ongoing work onto trunk until shading is 
optional. I'm building multiple times an hour, either the full binary 
distribution or a set of JARs for use downstream. This adds minutes to the 
round-trip time. Uprating to blocker.

> CSI jar file should not bundle third party dependencies
> ---
>
> Key: YARN-9057
> URL: https://issues.apache.org/jira/browse/YARN-9057
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Priority: Major
>
> hadoop-yarn-csi-3.3.0-SNAPSHOT.jar bundles all third party classes like a 
> shaded jar instead of CSI only classes.  This is generating error messages 
> for YARN cli:
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-csi-3.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9057) CSI jar file should not bundle third party dependencies

2018-12-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-9057:
-
Affects Version/s: 3.3.0

> CSI jar file should not bundle third party dependencies
> ---
>
> Key: YARN-9057
> URL: https://issues.apache.org/jira/browse/YARN-9057
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Priority: Major
>
> hadoop-yarn-csi-3.3.0-SNAPSHOT.jar bundles all third party classes like a 
> shaded jar instead of CSI only classes.  This is generating error messages 
> for YARN cli:
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-csi-3.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9057) CSI jar file should not bundle third party dependencies

2018-12-03 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-9057:
-
Component/s: build

> CSI jar file should not bundle third party dependencies
> ---
>
> Key: YARN-9057
> URL: https://issues.apache.org/jira/browse/YARN-9057
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Priority: Major
>
> hadoop-yarn-csi-3.3.0-SNAPSHOT.jar bundles all third party classes like a 
> shaded jar instead of CSI only classes.  This is generating error messages 
> for YARN cli:
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-csi-3.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9057) CSI jar file should not bundle third party dependencies

2018-12-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707508#comment-16707508
 ] 

Steve Loughran commented on YARN-9057:
--

I agree. The shaded JAR must be a secondary -shaded artifact which is not 
created when the -DskipShade option is passed to the build

> CSI jar file should not bundle third party dependencies
> ---
>
> Key: YARN-9057
> URL: https://issues.apache.org/jira/browse/YARN-9057
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.3.0
>Reporter: Eric Yang
>Priority: Major
>
> hadoop-yarn-csi-3.3.0-SNAPSHOT.jar bundles all third party classes like a 
> shaded jar instead of CSI only classes.  This is generating error messages 
> for YARN cli:
> {code}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/local/hadoop-3.3.0-SNAPSHOT/share/hadoop/yarn/hadoop-yarn-csi-3.3.0-SNAPSHOT.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8953) [CSI] CSI driver adaptor module support in NodeManager

2018-12-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707506#comment-16707506
 ] 

Steve Loughran commented on YARN-8953:
--

This is killing my latest trunk builds because the -DskipShade profile option 
is not respected.

I'm going to supply a patch to disable the shaded JAR and would be grateful if 
it could be reviewed. Thanks.

> [CSI] CSI driver adaptor module support in NodeManager
> --
>
> Key: YARN-8953
> URL: https://issues.apache.org/jira/browse/YARN-8953
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8953.001.patch, YARN-8953.002.patch, 
> YARN-8953.003.patch, YARN-8953.004.patch, YARN-8953.005.patch, 
> YARN-8953.006.patch, csi_adaptor_workflow.png
>
>
> The CSI adaptor is a layer between YARN and a CSI driver; it transforms 
> YARN-internal concepts and boxes them according to the CSI protocol, then 
> forwards the call to the CSI driver. The adaptor should support the 
> controller/node/identity services.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2018-11-27 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700670#comment-16700670
 ] 

Steve Loughran commented on YARN-4435:
--

Now that I'm working with DTs more, I'm in a better place to review this.

What do we have to do to get this in?

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, security, yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4435-003.patch, YARN-4435.00.patch.txt, 
> YARN-4435.01.patch, YARN-4435.02.patch, proposed_solution
>
>
> Add a class to yarn project that implements the DtFetcher interface to return 
> a RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-4721) RM to try to auth with HDFS on startup, retry with max diagnostics on failure

2018-11-27 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-4721.
--
Resolution: Won't Fix

Given up on this for now. If someone wants to take over, go for it.

> RM to try to auth with HDFS on startup, retry with max diagnostics on failure
> -
>
> Key: YARN-4721
> URL: https://issues.apache.org/jira/browse/YARN-4721
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, security
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: oct16-medium
> Attachments: HADOOP-12289-002.patch, HADOOP-12289-003.patch, 
> HADOOP-12889-001.patch
>
>
> If the RM can't auth with HDFS, this can first surface during job submission, 
> which can cause confusion about what's wrong and whose credentials are 
> playing up.
> Instead, the RM could try to talk to HDFS on launch; {{ls /}} should suffice. 
> If it can't auth, it can then tell UGI to log more and retry.
> I don't know what the policy should be if the RM can't auth to HDFS at this 
> point. Certainly it can't currently accept work. But should it fail fast or 
> keep going in the hope that the problem is in the KDC or NN and will fix 
> itself without an RM restart?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-2657) MiniYARNCluster to (optionally) add MicroZookeeper service

2018-11-27 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-2657.
--
Resolution: Won't Fix

It's been so long since I submitted this patch that I'd forgotten I'd done it, 
which means there's no real demand for it. Closing as a wontfix.

> MiniYARNCluster to (optionally) add MicroZookeeper service
> --
>
> Key: YARN-2657
> URL: https://issues.apache.org/jira/browse/YARN-2657
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: test
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: YARN-2567-001.patch, YARN-2657-002.patch
>
>
> This is needed for testing things like YARN-2646: add an option for the 
> {{MiniYarnCluster}} to start a {{MicroZookeeperService}}.
> This is just another YARN service to create and track the lifecycle. The 
> {{MicroZookeeperService}} publishes its binding information for direct takeup 
> by the registry services...this can address in-VM race conditions.
> The default setting for this service is "off"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2646) distributed shell & tests to use registry

2018-11-27 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned YARN-2646:


Assignee: (was: Steve Loughran)

> distributed shell & tests to use registry
> -
>
> Key: YARN-2646
> URL: https://issues.apache.org/jira/browse/YARN-2646
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Steve Loughran
>Priority: Major
> Attachments: YARN-2646-001.patch, YARN-2646-003.patch
>
>
> for testing and for an example, the Distributed Shell should create a record 
> for itself in the service registry ... the tests can look for this. This will 
> act as a test for the RM integration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7747) YARN UI is broken in the minicluster

2018-10-08 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641818#comment-16641818
 ] 

Steve Loughran commented on YARN-7747:
--

bq. Steve Loughran we definitely need tests to prevent this kind of regression 
in the future. We could make sure that all web/http address keys are properly 
reflected in MiniYARNCLuster#getConfig implementation and then probe all of 
them through easy-to-validate REST api. RM URI should respond to the 
RM-specific REST, and so on and so forth

please. I've just been trying to do this with the RM ports in a 
kerberized cluster, and things didn't work out right w.r.t. port bindings, 
kerberos principals and localhost vs hostname registration.


Maybe the probes should go in the MiniYARNCluster itself, some 
miniYarnCluster.get("/") call, so you could do things in tests like 
eventually() calls waiting for it to come up, etc.
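
Something along these lines, purely illustrative, since no such accessor 
exists today:
{code}
// Hypothetical probe: poll the RM web UI until it answers or a deadline passes.
// rmWebAddress is an assumed accessor on the minicluster.
static void awaitWebUi(String rmWebAddress) throws Exception {
  long deadline = System.currentTimeMillis() + 30_000;
  while (true) {
    try (InputStream in = new URL(rmWebAddress + "/").openStream()) {
      return;   // got an answer; the UI is up
    } catch (IOException e) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("web UI never came up", e);
      }
      Thread.sleep(500);
    }
  }
}
{code}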


> YARN UI is broken in the minicluster 
> -
>
> Key: YARN-7747
> URL: https://issues.apache.org/jira/browse/YARN-7747
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Major
> Attachments: YARN-7747.001.patch, YARN-7747.002.patch
>
>
> YARN web apps use non-injected instances of GuiceFilter, i.e. instances 
> created by Jetty as opposed to by Guice itself. This triggers the [call 
> path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251]
>  where the static field {{pipeline}} is used instead of the instance field 
> {{injectedPipeline}}. However, besides the GuiceFilter instances created by 
> Jetty, each Guice module generates them as well. On the injection call path 
> this static variable is updated by each instance. Thus, if there are multiple 
> modules, as happens to be the case in the minicluster, the one loaded last 
> ends up defining the filter pipeline for all Jetty instances. In the 
> minicluster case this is the nodemanager UI.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7747) YARN UI is broken in the minicluster

2018-10-03 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637567#comment-16637567
 ] 

Steve Loughran commented on YARN-7747:
--

* is there a way to get that UI URL back? If it's not there already?
* are there tests which look @ this, e.g. issue a GET or two?

> YARN UI is broken in the minicluster 
> -
>
> Key: YARN-7747
> URL: https://issues.apache.org/jira/browse/YARN-7747
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Major
> Attachments: YARN-7747.001.patch, YARN-7747.002.patch
>
>
> YARN web apps use non-injected instances of GuiceFilter, i.e. instances 
> created by Jetty as opposed to by Guice itself. This triggers the [call 
> path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251]
>  where the static field {{pipeline}} is used instead of the instance field 
> {{injectedPipeline}}. However, besides the GuiceFilter instances created by 
> Jetty, each Guice module generates them as well. On the injection call path 
> this static variable is updated by each instance. Thus, if there are multiple 
> modules, as happens to be the case in the minicluster, the one loaded last 
> ends up defining the filter pipeline for all Jetty instances. In the 
> minicluster case this is the nodemanager UI.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-06-15 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514105#comment-16514105
 ] 

Steve Loughran commented on YARN-8338:
--

BTW, maven shading on the hadoop minicluster is now complaining about .class 
conflict between objenesis and a mockito JAR, at least on a run of 
HADOOP-15407. 
{code}
[INFO] No artifact matching filter org.rocksdb:rocksdbjni
[WARNING] objenesis-1.0.jar, mockito-all-1.8.5.jar define 30 overlapping 
classes: 
[WARNING]   - org.objenesis.ObjenesisBase
[WARNING]   - org.objenesis.instantiator.gcj.GCJInstantiator
[WARNING]   - org.objenesis.ObjenesisHelper
[WARNING]   - org.objenesis.instantiator.jrockit.JRockitLegacyInstantiator
[WARNING]   - org.objenesis.instantiator.sun.SunReflectionFactoryInstantiator
[WARNING]   - org.objenesis.instantiator.ObjectInstantiator
[WARNING]   - org.objenesis.instantiator.gcj.GCJInstantiatorBase$DummyStream
[WARNING]   - org.objenesis.instantiator.basic.ObjectStreamClassInstantiator
[WARNING]   - org.objenesis.ObjenesisException
[WARNING]   - org.objenesis.Objenesis
[WARNING]   - 20 more...
[WARNING] maven-shade-plugin has detected that some class files are
[WARNING] present in two or more JARs. When this happens, only one
[WARNING] single version of the class is copied to the uber jar.
[WARNING] Usually this is not harmful and you can skip these warnings,
[WARNING] otherwise try to manually exclude artifacts based on
[WARNING] mvn dependency:tree -Ddetail=true and the above output.
[WARNING] See http://maven.apache.org/plugins/maven-shade-plugin/
{code}

I've done a scan for any mockito refs other than in test & can't see them, so 
I'm not sure what is happening. Something, somehow, is pulling mockito into 
the hadoop-minicluster, except I can't see it. Anyway, not sure how much 
it actually matters.

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491657#comment-16491657
 ] 

Steve Loughran commented on YARN-8338:
--

no, let's declare the version that's been in use up until now, and we can cull 
the local dynamo tests for 3.2, which would match well with an objenesis update

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406

2018-05-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488742#comment-16488742
 ] 

Steve Loughran commented on YARN-8338:
--

{code}
+-org.apache.hadoop:hadoop-aws:3.1.1-SNAPSHOT
  +-com.amazonaws:DynamoDBLocal:1.11.86
+-org.mockito:mockito-core:1.10.19
  +-org.objenesis:objenesis:2.1
{code}
you shouldn't be getting that at all, as it's a test-only dependency. We also 
have a patch to remove that lib and its tests completely, as while it was 
useful in the past, we've outgrown it: HADOOP-14918.

local DDB shouldn't be the reason for holding back on objenesis.

> TimelineService V1.5 doesn't come up after HADOOP-15406
> ---
>
> Key: YARN-8338
> URL: https://issues.apache.org/jira/browse/YARN-8338
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: YARN-8338.txt
>
>
> TimelineService V1.5 fails with the following:
> {code}
> java.lang.NoClassDefFoundError: org/objenesis/Objenesis
>   at 
> org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-5306) Yarn should detect and fail fast on duplicate resources in container request

2018-04-16 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned YARN-5306:


Assignee: (was: Junping Du)

> Yarn should detect and fail fast on duplicate resources in container request
> 
>
> Key: YARN-5306
> URL: https://issues.apache.org/jira/browse/YARN-5306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Yesha Vora
>Priority: Critical
>
> In some cases, Yarn gets duplicate copies of resources in the resource list. 
> You then end up with a resource list which contains two copies of a 
> resource JAR, with the timestamps of the two separate uploads, only one of 
> which (the later one) is correct. At download time, the NM goes through the 
> list and fails the download when it gets to the one with the older timestamp.
> We need a utility class to do a scan & check, which could be used by the NM at 
> download time (so it fails with meaningful errors); the yarn client could 
> perhaps do the check before launch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8053) Add hadoop-distcp in exclusion in hbase-server dependencies for timelineservice-hbase packages.

2018-03-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412001#comment-16412001
 ] 

Steve Loughran commented on YARN-8053:
--

we have a loop: Hadoop depends on HBase, HBase depends on Hadoop. So HBase is 
being built against an older version of Hadoop & its transitive dependencies 
(guava). It's less visible if you build everything yourself in one go, but the 
problem exists.

w.r.t. unwinding, does the hbase connector actually need to be built into 
hadoop? Is there a way to create an extra project which pulls in both?

> Add hadoop-distcp in exclusion in hbase-server dependencies for 
> timelineservice-hbase packages.
> ---
>
> Key: YARN-8053
> URL: https://issues.apache.org/jira/browse/YARN-8053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 3.1.0, yarn-7055, 3.2.0
>
> Attachments: YARN-8053-YARN-7055.01.patch, YARN-8053.01.patch
>
>
> It is observed that changing the version number of hadoop leads to build 
> failures because of dependency resolution conflicts in the HBase-2 compilation. 
> We see the error below, which tells us that hbase-server has a dependency on 
> hadoop-distcp. We also need to add hadoop-distcp to the exclusion list. 
> {code}
> 07:42:36 2018/03/19 14:42:36 INFO: [ERROR] Failed to execute goal on 
> project hadoop-yarn-server-timelineservice-hbase-client: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-yarn-server-timelineservice-hbase-client:jar:3.3.0-SNAPSHOT:
>  Could not find artifact org.apache.hadoop:hadoop-distcp:jar:3.3.0-SNAPSHOT 
> in public 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8053) Add hadoop-distcp in exclusion in hbase-server dependencies for timelineservice-hbase packages.

2018-03-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411374#comment-16411374
 ] 

Steve Loughran commented on YARN-8053:
--

we seem to be fighting a losing battle by attempting to pull in a version of 
hbase built with an older version of hadoop. As well as hadoop artifacts 
getting in, we can't safely upgrade things. Is there any way to unwind this 
dependency by having {{hadoop-yarn-server-timelineservice-hbase}} be something 
which is downstream of both, since loops shouldn't be found in DAGs?

> Add hadoop-distcp in exclusion in hbase-server dependencies for 
> timelineservice-hbase packages.
> ---
>
> Key: YARN-8053
> URL: https://issues.apache.org/jira/browse/YARN-8053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 3.1.0, yarn-7055, 3.2.0
>
> Attachments: YARN-8053-YARN-7055.01.patch, YARN-8053.01.patch
>
>
> It is observed that changing the version number of hadoop leads to build 
> failures because of dependency resolution conflicts in the HBase-2 compilation. 
> We see the error below, which tells us that hbase-server has a dependency on 
> hadoop-distcp. We also need to add hadoop-distcp to the exclusion list. 
> {code}
> 07:42:36 2018/03/19 14:42:36 INFO: [ERROR] Failed to execute goal on 
> project hadoop-yarn-server-timelineservice-hbase-client: Could not resolve 
> dependencies for project 
> org.apache.hadoop:hadoop-yarn-server-timelineservice-hbase-client:jar:3.3.0-SNAPSHOT:
>  Could not find artifact org.apache.hadoop:hadoop-distcp:jar:3.3.0-SNAPSHOT 
> in public 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6136) YARN registry service should avoid scanning whole ZK tree for every container/application finish

2018-03-07 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389632#comment-16389632
 ] 

Steve Loughran commented on YARN-6136:
--

It's just trying to do a cleanup at the end, no matter how things exit.

This could trivially be made optional.
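
For example, roughly (the config key here is invented):
{code}
// Sketch: gate the purge behind a (hypothetical) configuration flag so
// clusters with very large ZK trees can opt out of the root scan.
public void onContainerFinished(ContainerId id) throws IOException {
  if (!conf.getBoolean("yarn.registry.purge-on-completion", true)) {
    return;
  }
  LOG.info("Container {} finished, purging container-level records", id);
  purgeRecordsAsync("/", id.toString(), PersistencePolicies.CONTAINER);
}
{code}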

> YARN registry service should avoid scanning whole ZK tree for every 
> container/application finish
> 
>
> Key: YARN-6136
> URL: https://issues.apache.org/jira/browse/YARN-6136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
>
> In the existing registry service implementation, the purge operation is 
> triggered by the container finish event:
> {code}
>   public void onContainerFinished(ContainerId id) throws IOException {
> LOG.info("Container {} finished, purging container-level records",
> id);
> purgeRecordsAsync("/",
> id.toString(),
> PersistencePolicies.CONTAINER);
>   }
> {code} 
> Since this happens on every container finish, it essentially scans all (or 
> almost all) ZK nodes from the root. 
> We have a cluster which has hundreds of ZK nodes for the service registry, and 
> 20K+ ZK nodes for other purposes. The existing implementation can generate 
> massive numbers of ZK operations and internal Java objects (RegistryPathStatus) 
> as well. The RM becomes very unstable when there are batches of container finish 
> events, because of full GC pauses and ZK connection failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7346) Add a profile to compile ATSv2 with HBase-2.0

2018-03-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381885#comment-16381885
 ] 

Steve Loughran commented on YARN-7346:
--

I am really excited about this if it lets us move off Guava 11; I'd created a 
new "try to upgrade guava again" JIRA only yesterday: HADOOP-15272. 

* I'd like to see everything under control before it goes into trunk though; 
that is, get javadocs and findbugs to shut up
* and something in /BUILDING

Presumably once this is in, setting the hbase-2 profile will allow the rest of 
Hadoop to build with guava.version=21.0? Has anyone tested this?


> Add a profile to compile ATSv2 with HBase-2.0
> -
>
> Key: YARN-7346
> URL: https://issues.apache.org/jira/browse/YARN-7346
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Ted Yu
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-7346.00.patch, YARN-7346.01.patch, 
> YARN-7346.02.patch, YARN-7346.03-incremental.patch, YARN-7346.03.patch, 
> YARN-7346.04-incremental.patch, YARN-7346.04.patch, YARN-7346.05.patch, 
> YARN-7346.06.patch, YARN-7346.07.patch, YARN-7346.08.patch, 
> YARN-7346.prelim1.patch, YARN-7346.prelim2.patch, YARN-7581.prelim.patch
>
>
> When compiling hadoop-yarn-server-timelineservice-hbase against 2.0.0-alpha3, 
> I got the following errors:
> https://pastebin.com/Ms4jYEVB
> This issue is to fix the compilation errors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2127) Move YarnUncaughtExceptionHandler into Hadoop common

2017-08-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130221#comment-16130221
 ] 

Steve Loughran commented on YARN-2127:
--

This is an old patch. Yes, the Yarn one has been effectively superseded; the 
launcher one is a copy, paste & iteration of the YARN one, I seem to remember.

> Move YarnUncaughtExceptionHandler into Hadoop common
> 
>
> Key: YARN-2127
> URL: https://issues.apache.org/jira/browse/YARN-2127
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Create a superclass of {{YarnUncaughtExceptionHandler}}  in the hadoop-common 
> code (retaining the original for compatibility).
> This would be available for any hadoop application to use, and the YARN-679 
> launcher could automatically set up the handler.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2571) RM to support YARN registry

2017-07-12 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2571:
-
Attachment: YARN-2571-017.patch

Patch 017, sync up with trunk

Can I observe that this patch is approaching its third birthday. I would really 
like to see it in, so that you can use the registry in a secure cluster. I'm 
happy to accept feedback and rework it, but we need to get this in.

I need reviewers here. [~djp], are you able to look at it?

> RM to support YARN registry 
> 
>
> Key: YARN-2571
> URL: https://issues.apache.org/jira/browse/YARN-2571
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: oct16-hard
> Attachments: YARN-2571-001.patch, YARN-2571-002.patch, 
> YARN-2571-003.patch, YARN-2571-005.patch, YARN-2571-007.patch, 
> YARN-2571-008.patch, YARN-2571-009.patch, YARN-2571-010.patch, 
> YARN-2571-012.patch, YARN-2571-013.patch, YARN-2571-015.patch, 
> YARN-2571-016.patch, YARN-2571-017.patch
>
>
> The RM needs to (optionally) integrate with the YARN registry:
> # startup: create the /services and /users paths with system ACLs (yarn, hdfs 
> principals)
> # app-launch: create the user directory /users/$username with the relevant 
> permissions (CRD) for them to create subnodes.
> # attempt, container, app completion: remove service records with the 
> matching persistence and ID



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service

2017-04-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981332#comment-15981332
 ] 

Steve Loughran commented on YARN-679:
-

[~templedf]

bq. Signalled" is misspelled several times. Should be "signaled"

No, "Signalled" is spelt correctly. However, to avoid confusing people in the 
EN_US locale who hold invalid assumptions about which locale of the English 
language SHALL be considered normative, I shall change it.




> add an entry point that can start any Yarn service
> --
>
> Key: YARN-679
> URL: https://issues.apache.org/jira/browse/YARN-679
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf, 
> YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, 
> YARN-679-003.patch, YARN-679-004.patch, YARN-679-005.patch, 
> YARN-679-006.patch, YARN-679-007.patch, YARN-679-008.patch, 
> YARN-679-009.patch, YARN-679-010.patch, YARN-679-011.patch, YARN-679-013.patch
>
>  Time Spent: 72h
>  Remaining Estimate: 0h
>
> There's no need to write separate .main classes for every Yarn service, given 
> that the startup mechanism should be identical: create, init, start, wait for 
> stopped -with an interrupt handler to trigger a clean shutdown on a control-c 
> interrupt.
> Provide one that takes any classname, and a list of config files/options
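
A minimal sketch of the idea (editor's illustration, not the actual patch; {{SimpleServiceMain}} is a hypothetical name, and the config files/options handling is elided):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.Service;

public final class SimpleServiceMain {
  public static void main(String[] args) throws Exception {
    // reflectively instantiate the named Service implementation
    Service service = (Service) Class.forName(args[0])
        .getDeclaredConstructor().newInstance();
    // interrupt handler: trigger a clean shutdown on control-c
    Runtime.getRuntime().addShutdownHook(
        new Thread(service::stop, "service-shutdown"));
    service.init(new Configuration());
    service.start();
    service.waitForServiceToStop(Long.MAX_VALUE);  // wait for stopped
  }
}
{code}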



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service

2017-04-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979867#comment-15979867
 ] 

Steve Loughran commented on YARN-679:
-

You know, if people want to see this at work, I could back this up with a 
launcher for the YARN-913 registry DNS server.

> add an entry point that can start any Yarn service
> --
>
> Key: YARN-679
> URL: https://issues.apache.org/jira/browse/YARN-679
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf, 
> YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, 
> YARN-679-003.patch, YARN-679-004.patch, YARN-679-005.patch, 
> YARN-679-006.patch, YARN-679-007.patch, YARN-679-008.patch, 
> YARN-679-009.patch, YARN-679-010.patch, YARN-679-011.patch, YARN-679-013.patch
>
>  Time Spent: 72h
>  Remaining Estimate: 0h
>
> There's no need to write separate .main classes for every Yarn service, given 
> that the startup mechanism should be identical: create, init, start, wait for 
> stopped -with an interrupt handler to trigger a clean shutdown on a control-c 
> interrupt.
> Provide one that takes any classname, and a list of config files/options



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-679) add an entry point that can start any Yarn service

2017-04-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979865#comment-15979865
 ] 

Steve Loughran commented on YARN-679:
-

The javac warnings are all about the use of Signal in IrqHandler; this is to be 
expected. By moving signal setup into one file, we can point the code's other 
uses at it, providing one place for the complaints and leaving us prepared 
should Oracle ever change the API.
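
For reference, the shape of that centralised setup, using the unsupported {{sun.misc}} API which triggers the warnings (editor's sketch; the real IrqHandler differs):

{code}
import sun.misc.Signal;
import sun.misc.SignalHandler;

// All sun.misc.Signal use lives in this one class, so the "unsupported
// API" warnings -- and any future breakage -- stay confined to it.
public class IrqHandlerSketch implements SignalHandler {
  public IrqHandlerSketch(String signalName) {
    Signal.handle(new Signal(signalName), this);  // e.g. "INT", "TERM"
  }

  @Override
  public void handle(Signal signal) {
    // a real handler would trigger a clean service shutdown here
    System.err.println("Received signal " + signal.getName());
  }
}
{code}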

> add an entry point that can start any Yarn service
> --
>
> Key: YARN-679
> URL: https://issues.apache.org/jira/browse/YARN-679
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf, 
> YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, 
> YARN-679-003.patch, YARN-679-004.patch, YARN-679-005.patch, 
> YARN-679-006.patch, YARN-679-007.patch, YARN-679-008.patch, 
> YARN-679-009.patch, YARN-679-010.patch, YARN-679-011.patch, YARN-679-013.patch
>
>  Time Spent: 72h
>  Remaining Estimate: 0h
>
> There's no need to write separate .main classes for every Yarn service, given 
> that the startup mechanism should be identical: create, init, start, wait for 
> stopped -with an interrupt handler to trigger a clean shutdown on a control-c 
> interrupt.
> Provide one that takes any classname, and a list of config files/options



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-679) add an entry point that can start any Yarn service

2017-04-21 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-679:

Attachment: YARN-679-013.patch

Finally had time to sit down and go through all the review. Thanks, Dan! I have 
taken on *all* the javadoc suggestions, even the one about Signalled being 
misspelt, when that was clearly just an EN_US locale error. Easier to switch 
than to argue the details.

Code changes done, plus some other tweaks:
* moved some public static methods to protected.
* fixed a test which was failing with the different exception text from the new 
Stax config parser.
* hit the "optimise imports" button in the IDE.

This patch only really makes sense when you implement a service which uses it. 
This is essentially what we've been using in Slider; once it is in I will use 
it for some of the S3A work as well as making the registry standalone. This 
patch makes it trivial to run any Service either as a service or as a 
standalone entry point.

> add an entry point that can start any Yarn service
> --
>
> Key: YARN-679
> URL: https://issues.apache.org/jira/browse/YARN-679
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: org.apache.hadoop.servic...mon 3.0.0-SNAPSHOT API).pdf, 
> YARN-679-001.patch, YARN-679-002.patch, YARN-679-002.patch, 
> YARN-679-003.patch, YARN-679-004.patch, YARN-679-005.patch, 
> YARN-679-006.patch, YARN-679-007.patch, YARN-679-008.patch, 
> YARN-679-009.patch, YARN-679-010.patch, YARN-679-011.patch, YARN-679-013.patch
>
>  Time Spent: 72h
>  Remaining Estimate: 0h
>
> There's no need to write separate .main classes for every Yarn service, given 
> that the startup mechanism should be identical: create, init, start, wait for 
> stopped -with an interrupt handler to trigger a clean shutdown on a control-c 
> interrupt.
> Provide one that takes any classname, and a list of config files/options



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6414) ATSv2 tests fail due to guava version upgrade

2017-04-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952196#comment-15952196
 ] 

Steve Loughran commented on YARN-6414:
--

* Test-wise, it shows that the JAR update process needs more rigorous testing: 
the patch should go in for all the subprojects so Yetus tests them all. I'll 
keep an eye on future changes there.
* Circular dependencies are generally considered "bad form" and shouldn't have 
happened. More specifically, the use of the acronym "DAG" for the Maven 
dependency graph declares that the graph is "acyclic". Pulling in HBase as a 
dependency of Hadoop is trying to do what Spark/Hive have achieved: creating a 
cycle. Is there any way to unwind the cycle so that there is an ATSv2 server 
module, independent of the others, which pulls in both? That way, it can take 
in any shaded Guava libs from Hadoop and whatever HBase needs, while also 
allowing Hadoop to make progress on the plan to drop Jersey 1 and so avoid 
version issues there (HADOOP-13332).

BTW, Hadoop may export Guava 11, but it's coded not to use classes cut out of 
later versions (e.g. Guava dropping Stopwatch, HADOOP-11032). Even so, upping 
the base Guava version is a serious need, not just for downstream projects, 
but also because things like Curator, which we pull in, depend on later 
versions. Every upgrade of Curator has had problems related to Guava 
(HADOOP-11102, HADOOP-11612).

Side issue: in an ideal world, Guava would be backwards compatible. It isn't, 
and we get to deal with the pain. Whatever we do, something breaks. Oh, and 
then there's protobuf.



> ATSv2 tests fail due to guava version upgrade
> -
>
> Key: YARN-6414
> URL: https://issues.apache.org/jira/browse/YARN-6414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 3.0.0-alpha3
> Environment: Ubuntu 14.04 
> x86, ppc64le
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sonia Garudi
>Assignee: Haibo Chen
>  Labels: ppc64le, x86_64
> Attachments: YARN-6414.00.patch, YARN-6414.01.patch
>
>
> Test failures seen in Hadoop YARN Timeline Service HBase tests project with 
> following error :
> {code}
> java.lang.NoClassDefFoundError: com/google/common/io/LimitInputStream
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:223)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:913)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:722)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:660)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:279)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:955)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:700)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:529)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:585)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:751)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:735)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1407)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:998)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:869)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:704)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:642)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:590)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:987)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:868)
> at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:862)
> at 
> 

[jira] [Updated] (YARN-6330) DefaultContainerExecutor container launch fails if script dest dir has a space in the path

2017-03-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-6330:
-
Environment: osx; no setsid command

> DefaultContainerExecutor container launch fails if script dest dir has a 
> space in the path
> --
>
> Key: YARN-6330
> URL: https://issues.apache.org/jira/browse/YARN-6330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0
> Environment: osx; no setsid command
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: YARN-6330-001.patch
>
>
> I know there's a workaround "don't do that"; this surfaced in IDE-executed 
> test runs.
> If you have a filesystem where the temp or final launcher script has a space 
> in the path, set up (echo, mv) fails. If you fix those command setups with 
> the relevant quotes, it fails in the setsid script



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6330) DefaultContainerExecutor container launch fails if script dest dir has a space in the path

2017-03-13 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-6330:
-
Attachment: YARN-6330-001.patch

Patch 001. Doesn't fix it completely: it stops the two patched commands from 
failing, but the existing (quoted) exec command still fails
{code}
2017-03-13 15:03:36,586 [main] INFO  mapreduce.Job 
(Job.java:monitorAndPrintJob(1434)) - Job job_1489417410136_0001 failed with 
state FAILED due to: Application application_1489417410136_0001 failed 2 times 
due to AM Container for appattempt_1489417410136_0001_02 exited with  
exitCode: 1
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1489417410136_0001_02_01
Exit code: 1
Exception message: /bin/bash: /Users/stevel/Projects/IDE: Is a directory

Stack trace: ExitCodeException exitCode=1: /bin/bash: 
/Users/stevel/Projects/IDE: Is a directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:994)
at org.apache.hadoop.util.Shell.run(Shell.java:887)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:293)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:421)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
{code}

Given it's already quoted there, I'm not sure what else can be done. This is a 
Mac, BTW; setsid isn't present, so the command will be of the form {{exec bash 
"temp.sh"}}.

> DefaultContainerExecutor container launch fails if script dest dir has a 
> space in the path
> --
>
> Key: YARN-6330
> URL: https://issues.apache.org/jira/browse/YARN-6330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Priority: Minor
> Attachments: YARN-6330-001.patch
>
>
> I know there's a workaround "don't do that"; this surfaced in IDE-executed 
> test runs.
> If you have a filesystem where the temp or final launcher script has a space 
> in the path, set up (echo, mv) fails. If you fix those command setups with 
> the relevant quotes, it fails in the setsid script



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6330) DefaultContainerExecutor container launch fails if script dest dir has a space in the path

2017-03-13 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-6330:


 Summary: DefaultContainerExecutor container launch fails if script 
dest dir has a space in the path
 Key: YARN-6330
 URL: https://issues.apache.org/jira/browse/YARN-6330
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.8.0
Reporter: Steve Loughran
Priority: Minor


I know there's a workaround "don't do that"; this surfaced in IDE-executed test 
runs.

If you have a filesystem where the temp or final launcher script has a space in 
the path, set up (echo, mv) fails. If you fix those command setups with the 
relevant quotes, it fails in the setsid script



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2017-03-10 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-2031:
-
Attachment: YARN-2031-005.patch

Patch 005; patch 004 in sync with trunk, some minor review of imports.

As far as I am concerned, this code is now ready for review.

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031-004.patch, YARN-2031-005.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy to forward the verbs, 
> or the AM filter should not filter the REST part of the web site
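
For illustration, the filter side of the first option (editor's sketch, not the actual AmIpFilter): a 307 obliges clients to replay the same verb and body, where a 302 lets them downgrade PUT/POST/DELETE to GET.

{code}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public class VerbPreservingRedirect {
  /** Redirect to the proxy while requiring the HTTP verb be preserved. */
  public static void redirect(HttpServletResponse response, String proxyUrl)
      throws IOException {
    response.setStatus(307);                  // Temporary Redirect
    response.setHeader("Location", proxyUrl);
  }
}
{code}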



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2017-03-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904974#comment-15904974
 ] 

Steve Loughran commented on YARN-2031:
--

Well, yes, it's still an issue: YARN doesn't proxy anything other than GET.

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031-004.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy to forward the verbs, 
> or the AM filter should not filter the REST part of the web site



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4330) MiniYARNCluster is showing multiple Failed to instantiate default resource calculator warning messages.

2016-12-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746043#comment-15746043
 ] 

Steve Loughran commented on YARN-4330:
--

Let's just go for 2.9; it's an irritant in 2.8, but not much worse than all the 
other log messages.

> MiniYARNCluster is showing multiple  Failed to instantiate default resource 
> calculator warning messages.
> 
>
> Key: YARN-4330
> URL: https://issues.apache.org/jira/browse/YARN-4330
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 2.8.0
> Environment: OSX, JUnit
>Reporter: Steve Loughran
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: oct16-hard
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-4330.002.patch, YARN-4330.003.patch, 
> YARN-4330.004.patch, YARN-4330.01.patch
>
>
> Whenever I try to start a MiniYARNCluster on Branch-2 (commit #0b61cca), I 
> see multiple stack traces warning me that a resource calculator plugin could 
> not be created
> {code}
> (ResourceCalculatorPlugin.java:getResourceCalculatorPlugin(184)) - 
> java.lang.UnsupportedOperationException: Could not determine OS: Failed to 
> instantiate default resource calculator.
> java.lang.UnsupportedOperationException: Could not determine OS
> {code}
> This is a minicluster. It doesn't need resource calculation. It certainly 
> doesn't need test logs being cluttered with even more stack traces which will 
> only generate false alarms about tests failing. 
> There needs to be a way to turn this off, and the minicluster should have it 
> that way by default.
> Being ruthless and marking as a blocker, because its a fairly major 
> regression for anyone testing with the minicluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5989) hadoop build to allow hadoop version property to be explicitly set (YARN test)

2016-12-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-5989:
-
Attachment: HADOOP-13582-003.patch

> hadoop build to allow hadoop version property to be explicitly set (YARN test)
> --
>
> Key: YARN-5989
> URL: https://issues.apache.org/jira/browse/YARN-5989
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: HADOOP-13582-003.patch
>
>
> Test the patch for HADOOP-13852 against YARN, as it has effects there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5989) hadoop build to allow hadoop version property to be explicitly set (YARN test)

2016-12-09 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-5989:
-
Priority: Minor  (was: Major)

> hadoop build to allow hadoop version property to be explicitly set (YARN test)
> --
>
> Key: YARN-5989
> URL: https://issues.apache.org/jira/browse/YARN-5989
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-13582-003.patch
>
>
> Test the patch for HADOOP-13852 against YARN, as it has effects there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5989) hadoop build to allow hadoop version property to be explicitly set (YARN test)

2016-12-09 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-5989:


 Summary: hadoop build to allow hadoop version property to be 
explicitly set (YARN test)
 Key: YARN-5989
 URL: https://issues.apache.org/jira/browse/YARN-5989
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: build
Affects Versions: 3.0.0-alpha2
Reporter: Steve Loughran
Assignee: Steve Loughran


Test the patch for HADOOP-13852 against YARN, as it has effects there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5934) Fix TestTimelineWebServices.testPrimaryFilterNumericString

2016-12-03 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717993#comment-15717993
 ] 

Steve Loughran commented on YARN-5934:
--

Is this just the test changing, or the actual behaviour of the web API itself?

> Fix TestTimelineWebServices.testPrimaryFilterNumericString
> --
>
> Key: YARN-5934
> URL: https://issues.apache.org/jira/browse/YARN-5934
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: YARN-5934.01.patch
>
>
> {noformat}
> Running org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices
> Tests run: 28, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 43.297 sec 
> <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices
> testPrimaryFilterNumericString(org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices)
>   Time elapsed: 1.209 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<3>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.timeline.webapp.TestTimelineWebServices.testPrimaryFilterNumericString(TestTimelineWebServices.java:348)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3477) TimelineClientImpl swallows exceptions

2016-12-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15714879#comment-15714879
 ] 

Steve Loughran commented on YARN-3477:
--

Sorry, missed this.

I've closed the PR and resubmitted the .patch. If Yetus has decided that it's 
staying in GitHub mode, the workaround is to create a new PR with the latest 
patch. Alternatively, go through all the JIRA comments and remove refs to 
GitHub.

> TimelineClientImpl swallows exceptions
> --
>
> Key: YARN-3477
> URL: https://issues.apache.org/jira/browse/YARN-3477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: oct16-easy
> Attachments: YARN-3477-001.patch, YARN-3477-002.patch, 
> YARN-3477-trunk.003.patch, YARN-3477-trunk.004.patch, 
> YARN-3477-trunk.004.patch
>
>
> If timeline client fails more than the retry count, the original exception is 
> not thrown. Instead some runtime exception is raised saying "retries run out"
> # the failing exception should be rethrown, ideally via 
> NetUtils.wrapException to include the URL of the failing endpoint
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause
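
For point 1, a sketch of the desired rethrow (editor's illustration; the endpoint URI and the surrounding retry loop are hypothetical stand-ins):

{code}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.net.NetUtils;

public class RethrowSketch {
  /** Wrap the last failure with the endpoint details before rethrowing. */
  public static IOException wrap(URI endpoint, IOException lastFailure) {
    // NetUtils.wrapException folds the dest/local host:port into the
    // message and keeps lastFailure as the cause
    return NetUtils.wrapException(
        endpoint.getHost(), endpoint.getPort(),
        "localhost", 0,   // local side, unknown in this sketch
        lastFailure);
  }
}
{code}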



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3477) TimelineClientImpl swallows exceptions

2016-12-02 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3477:
-
Attachment: YARN-3477-trunk.004.patch

repost patch 4 for YARN to kick off (maybe)

> TimelineClientImpl swallows exceptions
> --
>
> Key: YARN-3477
> URL: https://issues.apache.org/jira/browse/YARN-3477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.6.0, 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>  Labels: oct16-easy
> Attachments: YARN-3477-001.patch, YARN-3477-002.patch, 
> YARN-3477-trunk.003.patch, YARN-3477-trunk.004.patch, 
> YARN-3477-trunk.004.patch
>
>
> If timeline client fails more than the retry count, the original exception is 
> not thrown. Instead some runtime exception is raised saying "retries run out"
> # the failing exception should be rethrown, ideally via 
> NetUtils.wrapException to include the URL of the failing endpoint
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2016-11-24 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-4435:
-
Attachment: YARN-4435-003.patch

Patch 003; finishing up this patch with my suggested changes:

* move to SLF4J
* include the URL in exceptions
* choose whether to use http/https from config

...plus some minor Java 7-isms

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, security, yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
>  Labels: oct16-medium
> Attachments: YARN-4435-003.patch, YARN-4435.00.patch.txt, 
> YARN-4435.01.patch, YARN-4435.02.patch, proposed_solution
>
>
> Add a class to yarn project that implements the DtFetcher interface to return 
> a RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563
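
For context, the core of such a fetcher is a call like the following (editor's sketch using {{YarnClient}}; the real patch wraps logic like this behind the HADOOP-12563 {{DtFetcher}} interface):

{code}
import org.apache.hadoop.io.Text;
import org.apache.hadoop.yarn.api.records.Token;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmTokenSketch {
  public static Token fetch(String renewer) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // ask the RM for a delegation token on behalf of the renewer
      return client.getRMDelegationToken(new Text(renewer));
    } finally {
      client.stop();
    }
  }
}
{code}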



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5926) clean up registry code for java 7/8

2016-11-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15687370#comment-15687370
 ] 

Steve Loughran commented on YARN-5926:
--

[~ajisakaa] this is part of the registry cleanup: javadoc changes, plus stuff 
in the registry to go with the RM changes, which I'll keep purely to YARN-2571. 
Tested locally.

> clean up registry code for java 7/8
> ---
>
> Key: YARN-5926
> URL: https://issues.apache.org/jira/browse/YARN-5926
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-5926-001.patch
>
>
> Clean up the registry code to stop the java 7/8 warnings



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5926) clean up registry code for java 7/8

2016-11-22 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-5926:
-
Attachment: YARN-5926-001.patch

Patch 001; cleanup. This is the yarn-registry code from patch 016 of YARN-2571; 
isolating it to make the patch there smaller.

> clean up registry code for java 7/8
> ---
>
> Key: YARN-5926
> URL: https://issues.apache.org/jira/browse/YARN-5926
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 2.7.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-5926-001.patch
>
>
> Clean up the registry code to stop the java 7/8 warnings



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


