[jira] [Created] (YARN-9136) getNMResourceInfo NodeManager REST API method is not documented

2018-12-15 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9136:


 Summary: getNMResourceInfo NodeManager REST API method is not 
documented
 Key: YARN-9136
 URL: https://issues.apache.org/jira/browse/YARN-9136
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Alex Bodo


I cannot find documentation for the resources endpoint in NMWebServices: 
/ws/v1/node/resources/{resourcename}
I looked for it in NodeManagerRest.md but did not find any.
The omission is presumably unintentional.
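
For reference, a request against this endpoint might be built as sketched below. This is only an illustration and is not taken from NodeManagerRest.md: the class and method names are hypothetical, the host is a placeholder, 8042 is assumed to be the NodeManager web port, and percent-encoding the slash in a resource name such as yarn.io/gpu is an assumption about how the path parameter should be passed.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: builds the URI of the
// NodeManager resources endpoint for a given resource name.
final class NmResourceInfoUri {
  static URI build(String nmHost, int nmWebPort, String resourceName) {
    // Assumption: the name is percent-encoded so that "yarn.io/gpu"
    // stays a single path segment.
    String encoded = URLEncoder.encode(resourceName, StandardCharsets.UTF_8);
    return URI.create("http://" + nmHost + ":" + nmWebPort
        + "/ws/v1/node/resources/" + encoded);
  }

  public static void main(String[] args) {
    System.out.println(build("localhost", 8042, "yarn.io/gpu"));
  }
}
```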



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-9134) No test coverage for redefining FPGA / GPU resource types in TestResourceUtils

2018-12-15 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9134:


 Summary: No test coverage for redefining FPGA / GPU resource types 
in TestResourceUtils
 Key: YARN-9134
 URL: https://issues.apache.org/jira/browse/YARN-9134
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The patch also includes some trivial code cleanup.
Also, setupResourceTypes has been deprecated as it is dangerous to use, see the 
javadoc for details.






[jira] [Created] (YARN-9135) NM State store ResourceMappings serialization are tested with Strings instead of real Device objects

2018-12-15 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9135:


 Summary: NM State store ResourceMappings serialization are tested 
with Strings instead of real Device objects
 Key: YARN-9135
 URL: https://issues.apache.org/jira/browse/YARN-9135
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-9133) Make tests more easy to comprehend in TestGpuResourceHandler

2018-12-15 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9133:


 Summary: Make tests more easy to comprehend in 
TestGpuResourceHandler
 Key: YARN-9133
 URL: https://issues.apache.org/jira/browse/YARN-9133
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The tests are hard to read. Suggested improvements: 
- Add more helper methods to improve readability.
- Eliminate the boolean flag that controls whether Docker is used.







[jira] [Created] (YARN-9127) Create more tests to verify GpuDeviceInformationParser

2018-12-14 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9127:


 Summary: Create more tests to verify GpuDeviceInformationParser
 Key: YARN-9127
 URL: https://issues.apache.org/jira/browse/YARN-9127
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-9124) Resolve contradiction in ResourceUtils: addMandatoryResources / checkMandatoryResources work differently

2018-12-14 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9124:


 Summary: Resolve contradiction in ResourceUtils: 
addMandatoryResources / checkMandatoryResources work differently
 Key: YARN-9124
 URL: https://issues.apache.org/jira/browse/YARN-9124
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth


{{ResourceUtils#addMandatoryResources}}: Adds only memory and vcores as 
mandatory resources.

{{ResourceUtils#checkMandatoryResources}}: YARN-6620 added some code to this. 
This method checks not only memory and vcores, but all the resources referred 
to in ResourceInformation#MANDATORY_RESOURCES.

I think it would be good to rename {{MANDATORY_RESOURCES}} to 
{{PREDEFINED_RESOURCES}} or something like that and use a similar name for 
{{checkMandatoryResources}}.






[jira] [Created] (YARN-9123) Clean up and split testcases in TestNMWebServices for GPU support

2018-12-14 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9123:


 Summary: Clean up and split testcases in TestNMWebServices for GPU 
support
 Key: YARN-9123
 URL: https://issues.apache.org/jira/browse/YARN-9123
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The following testcases can be cleaned up a bit: 
TestNMWebServices#testGetNMResourceInfo - Can be split into 3 separate cases
TestNMWebServices#testGetYarnGpuResourceInfo






[jira] [Created] (YARN-9121) Users of GpuDiscoverer.getInstance() are not possible to test as instance is a static field

2018-12-13 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9121:


 Summary: Users of GpuDiscoverer.getInstance() are not possible to 
test as instance is a static field
 Key: YARN-9121
 URL: https://issues.apache.org/jira/browse/YARN-9121
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9120:


 Summary: Need to have a way to turn off GPU auto-discovery in 
GpuDiscoverer
 Key: YARN-9120
 URL: https://issues.apache.org/jira/browse/YARN-9120
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


GpuDiscoverer.getGpusUsableByYarn reads the property 
yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices, which must either 
list the user-defined GPU devices or have the value 'auto'.
In some circumstances, users want to exclude a node from scheduling, so 
they should have an option to turn off auto-discovery.
This is already possible by removing the GPU resource plugin from YARN's 
config along with the GPU-related config in container-executor.cfg, but a 
dedicated value for yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices 
is a more lightweight approach.






[jira] [Created] (YARN-9119) Clean up testcase TestGpuDiscoverer.testLinuxGpuResourceDiscoverPluginConfig

2018-12-13 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9119:


 Summary: Clean up testcase 
TestGpuDiscoverer.testLinuxGpuResourceDiscoverPluginConfig
 Key: YARN-9119
 URL: https://issues.apache.org/jira/browse/YARN-9119
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth


This testcase should be split into 3 separate testcases, as the comment in 
the code says.






[jira] [Created] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2018-12-13 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9118:


 Summary: Handle issues with parsing user defined GPU devices in 
GpuDiscoverer
 Key: YARN-9118
 URL: https://issues.apache.org/jira/browse/YARN-9118
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


getGpusUsableByYarn has the following issues: 
- Duplicate GPU device definitions are not rejected: This seems to be the 
biggest issue, as it could inflate the number of devices on the node if a 
device ID is defined 2 or more times.
- An empty string is accepted. It behaves as if the user did not want to use 
auto-discovery and had not defined any GPU devices: This results in an empty 
device list, but the code never checks for the empty string explicitly, so 
this behavior is just coincidental.
- GPU device IDs (separated by commas) are not validated as numbers.

Many testcases were added, as the coverage was very low.
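
The three issues above could be addressed by a stricter parser along these lines. This is a minimal sketch and not the GpuDiscoverer code: the class name is hypothetical, the index:minor_number format of each entry (for example 0:0,1:1) is an assumption, and rejecting the empty string (rather than returning an empty list) is just one possible decision.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: strict parsing of a user-defined GPU device list.
final class AllowedGpuDevicesParser {
  static List<int[]> parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      // Make the empty-string case an explicit decision.
      throw new IllegalArgumentException("GPU device list is empty");
    }
    List<int[]> devices = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String entry : value.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected index:minor_number, got: " + entry);
      }
      int index;
      int minor;
      try {
        index = Integer.parseInt(parts[0]);
        minor = Integer.parseInt(parts[1]);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException(
            "Non-numeric GPU device ID: " + entry, e);
      }
      // Reject duplicates so a repeated definition cannot inflate
      // the number of devices on the node.
      if (!seen.add(index + ":" + minor)) {
        throw new IllegalArgumentException("Duplicate GPU device: " + entry);
      }
      devices.add(new int[] {index, minor});
    }
    return devices;
  }

  public static void main(String[] args) {
    System.out.println(parse("0:0,1:1").size() + " devices parsed");
  }
}
```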






[jira] [Created] (YARN-9100) Add tests for GpuResourceAllocator and do minor code cleanup

2018-12-09 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9100:


 Summary: Add tests for GpuResourceAllocator and do minor code 
cleanup
 Key: YARN-9100
 URL: https://issues.apache.org/jira/browse/YARN-9100
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Add tests for GpuResourceAllocator and do minor code cleanup

- Improved log and exception messages
- Added some new debug logs
- Some methods are named like *Copy; these return copies of internal data 
structures. The word "copy" is just noise in their names, so they have been 
renamed. Additionally, the copied data structures were made immutable.
- The waiting loop in method assignGpus was extracted into a new class, 
RetryCommand. 

Some more words about the new class RetryCommand: 
There are some similar waiting loops in the code in: AMRMClient, 
AMRMClientAsync and even in GenericTestUtils (see waitFor method). RetryCommand 
could be a future replacement for this duplicated code, as it solves the 
waiting loop problem in a generic way.
The only downside of using RetryCommand in GpuResourceAllocator 
(startGpuAssignmentLoop) is the ugly exception handling part, but that's solely 
because of how Java deals with checked exceptions vs. lambdas. If there's a 
cleaner way to solve the exception handling, I'm open to suggestions.
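
To illustrate the idea of a generic waiting loop, here is a minimal sketch. It is not the RetryCommand class from the patch, whose API may look quite different; the class name, method name and parameters are hypothetical. Accepting a Callable is one possible way to soften the checked-exception problem, since Callable#call declares "throws Exception".

```java
import java.util.Optional;
import java.util.concurrent.Callable;

// Hypothetical sketch of a generic waiting loop.
final class RetrySketch {
  // Calls the attempt until it yields a value or maxAttempts is exhausted.
  // The lambda passed in may throw checked exceptions without wrapping
  // them, because Callable#call declares "throws Exception".
  static <T> Optional<T> retry(Callable<Optional<T>> attempt, int maxAttempts,
      long sleepMillis) throws Exception {
    for (int i = 1; i <= maxAttempts; i++) {
      Optional<T> result = attempt.call();
      if (result.isPresent()) {
        return result;
      }
      if (i < maxAttempts) {
        Thread.sleep(sleepMillis);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // The attempt starts to succeed on its third invocation.
    Optional<String> gpu = retry(
        () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("GPU0"),
        5, 10L);
    System.out.println(gpu.orElse("none") + " after " + calls[0] + " attempts");
  }
}
```

The trade-off of this design is that callers have to handle the broad Exception type, so it moves the awkwardness rather than removing it.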






[jira] [Created] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-09 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9099:


 Summary: GpuResourceAllocator.getReleasingGpus calculates number 
of GPUs in a wrong way
 Key: YARN-9099
 URL: https://issues.apache.org/jira/browse/YARN-9099
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


getReleasingGpus plays an important role in the calculation that happens when 
GpuResourceAllocator assigns GPUs to a container, see: 
GpuResourceAllocator#internalAssignGpus.

If multiple GPUs are assigned to the same container, getReleasingGpus will 
return an invalid number.
The method iterates over the mappings of (GPU device, container ID), so it 
retrieves each container by its ID as many times as the container ID is 
mapped to a device.
For every container retrieved, the value of its GPU resource is added to a 
running sum.
Consequently, if a container is mapped to 2 or more devices, the container's 
GPU resource value is added to the running sum as many times as the number of 
GPU devices the container has.

Example: 
Let's suppose {{usedDevices}} contains these mappings: 
- (GPU1, container1)
- (GPU2, container1)
- (GPU3, container2)

GPU resource value is 2 for container1 and 
GPU resource value is 1 for container2.
Then, if container1 is in a running state, getReleasingGpus will return 4 
instead of 2.
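
The example can be reproduced with a simplified model. This is a sketch and not the GpuResourceAllocator code: devices and containers are plain strings, gpuValue holds each container's GPU resource value, and countedContainers stands for the containers whose state makes getReleasingGpus count them (container1 in the example). Deduplicating the container IDs is one possible fix, not necessarily the one in the patch.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the double counting described above.
final class ReleasingGpusSketch {
  // Mirrors the reported behavior: a container's value is added once
  // per device that is mapped to it.
  static long perMapping(Map<String, String> usedDevices,
      Map<String, Long> gpuValue, Set<String> countedContainers) {
    long sum = 0;
    for (String containerId : usedDevices.values()) {
      if (countedContainers.contains(containerId)) {
        sum += gpuValue.get(containerId);
      }
    }
    return sum;
  }

  // Possible fix: visit each distinct container ID only once.
  static long perContainer(Map<String, String> usedDevices,
      Map<String, Long> gpuValue, Set<String> countedContainers) {
    long sum = 0;
    for (String containerId : new HashSet<>(usedDevices.values())) {
      if (countedContainers.contains(containerId)) {
        sum += gpuValue.get(containerId);
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<String, String> usedDevices = new LinkedHashMap<>();
    usedDevices.put("GPU1", "container1");
    usedDevices.put("GPU2", "container1");
    usedDevices.put("GPU3", "container2");
    Map<String, Long> gpuValue = new LinkedHashMap<>();
    gpuValue.put("container1", 2L);
    gpuValue.put("container2", 1L);
    Set<String> counted = new HashSet<>();
    counted.add("container1");
    System.out.println(perMapping(usedDevices, gpuValue, counted));   // 4
    System.out.println(perContainer(usedDevices, gpuValue, counted)); // 2
  }
}
```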






[jira] [Created] (YARN-9098) Separate mtab file reader code and cgroups file system hierarchy parser code from CGroupsHandlerImpl and ResourceHandlerModule

2018-12-08 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9098:


 Summary: Separate mtab file reader code and cgroups file system 
hierarchy parser code from CGroupsHandlerImpl and ResourceHandlerModule
 Key: YARN-9098
 URL: https://issues.apache.org/jira/browse/YARN-9098
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Separate mtab file reader code and cgroups file system hierarchy parser code 
from CGroupsHandlerImpl and ResourceHandlerModule

CGroupsHandlerImpl has a method parseMtab that parses an mtab file and stores 
cgroups data.
CgroupsLCEResourcesHandler also has a method with the same name, with identical 
code.
The parser code should be extracted from these places and moved to a new 
class, as this is a separate responsibility.
As the output of the file parser is a Map<String, Set<String>>, it's better to 
encapsulate it in a domain object, named 'CGroupsMountConfig' for instance.


ResourceHandlerModule has a method named parseConfiguredCGroupPath that is 
responsible for producing the same kind of result (Map<String, Set<String>>) 
to store cgroups data. It does not operate on an mtab file but inspects the 
filesystem for cgroup settings. As the output is the same, CGroupsMountConfig 
should be used here, too.
Again, this code should not be part of ResourceHandlerModule, as it is a 
different responsibility.

One more thing that is strongly related to the methods above is 
CGroupsHandlerImpl.initializeFromMountConfig: This method processes the result 
of a parsed mtab file or of parsed cgroups filesystem data and stores the 
filesystem paths for all available controllers. This method invokes 
findControllerPathInMountConfig, which is duplicated in CGroupsHandlerImpl 
and CgroupsLCEResourcesHandler, so it should be moved to a single place. To 
store the filesystem path and controller mappings, a new domain object could 
be introduced.
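
A domain object of the kind proposed above might look roughly like this. It is a simplified sketch, not Hadoop code: the method names are hypothetical, the wrapped type (mount path mapped to a set of mount options) is an assumption about the parser output, and the mtab line handling is reduced to a whitespace split.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the proposed CGroupsMountConfig domain object.
final class CGroupsMountConfig {
  private final Map<String, Set<String>> optionsByMountPath;

  private CGroupsMountConfig(Map<String, Set<String>> optionsByMountPath) {
    this.optionsByMountPath = Collections.unmodifiableMap(optionsByMountPath);
  }

  // Parses mtab-style lines: "<device> <path> <type> <options> <dump> <pass>".
  // Only entries whose type is "cgroup" are kept.
  static CGroupsMountConfig fromMtabLines(Iterable<String> lines) {
    Map<String, Set<String>> result = new HashMap<>();
    for (String line : lines) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length >= 4 && "cgroup".equals(fields[2])) {
        result.put(fields[1],
            new HashSet<>(Arrays.asList(fields[3].split(","))));
      }
    }
    return new CGroupsMountConfig(result);
  }

  // A single home for the controller lookup that is duplicated today.
  Optional<String> findControllerPath(String controller) {
    return optionsByMountPath.entrySet().stream()
        .filter(e -> e.getValue().contains(controller))
        .map(Map.Entry::getKey)
        .findFirst();
  }

  public static void main(String[] args) {
    CGroupsMountConfig config = fromMtabLines(Arrays.asList(
        "cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,cpu,cpuacct 0 0",
        "proc /proc proc rw,nosuid 0 0"));
    System.out.println(config.findControllerPath("cpu").orElse("not mounted"));
  }
}
```

The filesystem-based parser could return the same object, so that callers no longer need to know which source the data came from.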






[jira] [Created] (YARN-9097) Investigate why GpuDiscoverer methods are synchronized

2018-12-08 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9097:


 Summary: Investigate why GpuDiscoverer methods are synchronized
 Key: YARN-9097
 URL: https://issues.apache.org/jira/browse/YARN-9097
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Gergely Pollak


GpuDiscoverer.initialize surely shouldn't have been synchronized.
Please also investigate why getGpuDeviceInformation / getGpusUsableByYarn are 
synchronized.






[jira] [Created] (YARN-9096) Some GpuResourcePlugin and ResourcePluginManager methods are synchronized unnecessarily

2018-12-08 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9096:


 Summary: Some GpuResourcePlugin and ResourcePluginManager methods 
are synchronized unnecessarily
 Key: YARN-9096
 URL: https://issues.apache.org/jira/browse/YARN-9096
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


These methods are not used concurrently: they are part of the NM 
initialization code, which runs on a single thread.

This is the list of the call hierarchies: 

1. GpuResourcePlugin.initialize + ResourcePluginManager.initialize

 
{code:java}
GpuResourcePlugin.initialize(Context) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu) 
ResourcePluginManager.initialize(Context) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin) 
NodeManager.serviceInit(Configuration) 
(org.apache.hadoop.yarn.server.nodemanager){code}
 

 

2. GpuResourcePlugin.createResourceHandler: 

 
{code:java}
GpuResourcePlugin.createResourceHandler(Context, CGroupsHandler, 
PrivilegedOperationExecutor) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu) 
ResourceHandlerModule.addHandlersFromConfiguredResourcePlugins(List, 
Configuration, Context) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
ResourceHandlerModule.initializeConfiguredResourceHandlerChain(Configuration, 
Context) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
ResourceHandlerModule.getConfiguredResourceHandlerChain(Configuration, Context) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
ContainerScheduler.serviceInit(Configuration) 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler) 
LinuxContainerExecutor.init(Context) (org.apache.hadoop.yarn.server.nodemanager)
{code}
 

3. GpuResourcePlugin.getNodeResourceHandlerInstance: 

 
{code:java}
GpuResourcePlugin.getNodeResourceHandlerInstance() 
(org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(Resource)(2 usages) 
(org.apache.hadoop.yarn.server.nodemanager)
NodeStatusUpdaterImpl.serviceInit(Configuration) 
(org.apache.hadoop.yarn.server.nodemanager)
{code}






[jira] [Created] (YARN-9095) Removed Unused field from Resource: NUM_MANDATORY_RESOURCES

2018-12-07 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9095:


 Summary: Removed Unused field from Resource: 
NUM_MANDATORY_RESOURCES
 Key: YARN-9095
 URL: https://issues.apache.org/jira/browse/YARN-9095
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth


I suppose this constant remained in the code for historical reasons, but it 
is not used anymore, so it could be removed.

This field is especially confusing for new readers, as ResourceInformation now 
has a field named MANDATORY_RESOURCES and this map contains not only memory and 
vcores but GPU and FPGA as well.






[jira] [Created] (YARN-9094) Remove unused interface method: NodeResourceUpdaterPlugin#handleUpdatedResourceFromRM

2018-12-07 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9094:


 Summary: Remove unused interface method: 
NodeResourceUpdaterPlugin#handleUpdatedResourceFromRM
 Key: YARN-9094
 URL: https://issues.apache.org/jira/browse/YARN-9094
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth


Additionally, there's a typo that can be fixed in the javadoc of 
NodeResourceUpdaterPlugin#updateConfiguredResource: look for "mododule"






[jira] [Created] (YARN-9093) Remove commented code block from the beginning of TestDefaultContainerExecutor

2018-12-07 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9093:


 Summary: Remove commented code block from the beginning of 
TestDefaultContainerExecutor
 Key: YARN-9093
 URL: https://issues.apache.org/jira/browse/YARN-9093
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth









[jira] [Created] (YARN-9092) Create an object for cgroups mount enable and cgroups mount path as they belong together

2018-12-07 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9092:


 Summary: Create an object for cgroups mount enable and cgroups 
mount path as they belong together
 Key: YARN-9092
 URL: https://issues.apache.org/jira/browse/YARN-9092
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_MOUNT and 
YarnConfiguration.NM_LINUX_CONTAINER_CGROUPS_MOUNT_PATH are used together in 
many places in the code, so for the sake of readability and simplicity, it is 
better to wrap the values of these configs in an object and use it instead of 
having 2 fields in CGroupsHandlerImpl and in CgroupsLCEResourcesHandler as 
well.
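
Such a wrapper could be as small as the following value object. This is a sketch under assumptions, not Hadoop code: the class name is hypothetical, a plain Map stands in for the YARN Configuration object, and the two property names are assumed to be the values behind the YarnConfiguration constants.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value object holding the two cgroups mount settings together.
final class CGroupsMountSettings {
  // Assumed property names behind the two YarnConfiguration constants.
  static final String MOUNT_KEY =
      "yarn.nodemanager.linux-container-executor.cgroups.mount";
  static final String MOUNT_PATH_KEY =
      "yarn.nodemanager.linux-container-executor.cgroups.mount-path";

  private final boolean mountEnabled;
  private final String mountPath;

  CGroupsMountSettings(boolean mountEnabled, String mountPath) {
    this.mountEnabled = mountEnabled;
    this.mountPath = mountPath;
  }

  // Reads both values in one place; a missing mount flag counts as false.
  static CGroupsMountSettings from(Map<String, String> conf) {
    return new CGroupsMountSettings(
        Boolean.parseBoolean(conf.getOrDefault(MOUNT_KEY, "false")),
        conf.get(MOUNT_PATH_KEY));
  }

  boolean isMountEnabled() {
    return mountEnabled;
  }

  String getMountPath() {
    return mountPath;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(MOUNT_KEY, "true");
    conf.put(MOUNT_PATH_KEY, "/sys/fs/cgroup");
    CGroupsMountSettings settings = from(conf);
    System.out.println(settings.isMountEnabled() + " " + settings.getMountPath());
  }
}
```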






[jira] [Created] (YARN-9087) Better logging for initialization of Resource plugins

2018-12-06 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9087:


 Summary: Better logging for initialization of Resource plugins
 Key: YARN-9087
 URL: https://issues.apache.org/jira/browse/YARN-9087
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The patch includes the following enhancements for logging: 
- Logging initializer code of resource handlers in 
{{LinuxContainerExecutor#init}}
- Logging initializer code of resource plugins in 
{{ResourcePluginManager#initialize}}
- Added toString to {{ResourceHandlerChain}}
- Added toString to all implementations of {{ResourcePlugin}} as 
they are printed in {{ResourcePluginManager#initialize}}
- Added toString to all implementations of {{ResourceHandler}} as 
they are printed as a field in {{LinuxContainerExecutor#init}}






[jira] [Created] (YARN-9051) Integrate multiple CustomResourceTypesConfigurationProvider implementations into one

2018-11-23 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9051:


 Summary: Integrate multiple 
CustomResourceTypesConfigurationProvider implementations into one
 Key: YARN-9051
 URL: https://issues.apache.org/jira/browse/YARN-9051
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


CustomResourceTypesConfigurationProvider (extends LocalConfigurationProvider) 
currently has 5 implementations on trunk.
These could be integrated into 1 common class.
Also, 
{{org.apache.hadoop.yarn.util.resource.TestResourceUtils#addNewTypesToResources}}
has similar functionality, so it should be considered as well.






[jira] [Created] (YARN-9035) Allow better troubleshooting of FS container assignments and lack of container assignments

2018-11-19 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9035:


 Summary: Allow better troubleshooting of FS container assignments 
and lack of container assignments
 Key: YARN-9035
 URL: https://issues.apache.org/jira/browse/YARN-9035
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The call chain that starts at {{FairScheduler.attemptScheduling}}, goes 
through {{FSQueue}} (parent / leaf).assignContainer and ends in 
{{FSAppAttempt#assignContainer}} has many calls and many conditions under 
which {{Resources.none()}} can be returned, meaning that no container is 
allocated.
Many of these empty assignments do not come with a debug log statement, so 
it's very hard to tell what condition led the {{FairScheduler}} to a decision 
not to allocate containers.
On top of that, in many places it's also difficult to tell why a container 
was allocated to an app attempt.

The goal is to have a common place (i.e. a class) that does all the logging, 
so users can conveniently control all the logs if they are curious why (and why 
not) container assignments happened.
Also, it would be handy if readers of the log could easily tell which 
{{AppAttempt}} a log record was created for. In other words: every log record 
should include the ID of the application / app attempt, if possible.

Details of implementation: 
As most of the existing debug messages were guarded by a condition 
that checks whether the debug level is enabled on the logger, I followed a 
similar pattern. All the relevant log messages are created with the class 
{{ResourceAssignment}}. 
This class is a wrapper for the assigned {{Resource}} object and has a single 
logger, so clients should use its helper methods to create log records. There 
is a helper method called {{shouldLogReservationActivity}} that checks if 
DEBUG or TRACE level is activated on the logger. 
See the javadoc on this class for further information.

{{ResourceAssignment}} is also responsible for adding the app / app attempt ID 
to every log record (with some exceptions).
A couple of check classes are introduced: They are responsible for running 
the checks that a successful container allocation depends on and for storing 
their results.
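
The logging pattern described above can be sketched as follows. This is not the ResourceAssignment class from the patch: the class and method names are hypothetical, java.util.logging stands in for the real logger, and its FINE level plays the role of DEBUG.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: one class owns the logger and prefixes every
// record with the app attempt ID.
final class AssignmentLogSketch {
  private static final Logger LOG =
      Logger.getLogger(AssignmentLogSketch.class.getName());

  // Guard used before any message is built (FINE stands in for DEBUG).
  static boolean shouldLog() {
    return LOG.isLoggable(Level.FINE);
  }

  static String format(String appAttemptId, String reason) {
    return "[" + appAttemptId + "] " + reason;
  }

  // Records why no container was assigned; the message string is only
  // built when the level is enabled.
  static void logNotAssigned(String appAttemptId, String reason) {
    if (shouldLog()) {
      LOG.fine(format(appAttemptId, reason));
    }
  }

  public static void main(String[] args) {
    logNotAssigned("appattempt_1_0001_000001", "queue is over its max share");
    System.out.println(
        format("appattempt_1_0001_000001", "queue is over its max share"));
  }
}
```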






[jira] [Created] (YARN-9025) Make TestFairScheduler#testChildMaxResources more reliable, as it is flaky now

2018-11-15 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9025:


 Summary: Make TestFairScheduler#testChildMaxResources more 
reliable, as it is flaky now
 Key: YARN-9025
 URL: https://issues.apache.org/jira/browse/YARN-9025
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


While preparing the patch for YARN-8059, I came across a flaky test, see 
this link: 
https://builds.apache.org/job/PreCommit-YARN-Build/22412/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt

This is the error message: 
{code:java}
[ERROR] Tests run: 108, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.37 
s <<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
[ERROR] 
testChildMaxResources(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
 Time elapsed: 0.164 s <<< FAILURE!
java.lang.AssertionError: App 1 is not running with the correct number of 
containers expected:<2> but was:<0>
 at org.junit.Assert.fail(Assert.java:88){code}
The problem is that even with 8 node updates, because of the way the events 
are handled, it can happen that no container is allocated for the 
application.






[jira] [Created] (YARN-9024) ClusterNodeTracker maximum allocation does not respect resource units

2018-11-15 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9024:


 Summary: ClusterNodeTracker maximum allocation does not respect 
resource units
 Key: YARN-9024
 URL: https://issues.apache.org/jira/browse/YARN-9024
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


If a custom resource is defined with a default unit value (base unit) and a 
node reports its total capability in a different unit (e.g. M) then 
{{ClusterNodeTracker.getMaxAllowedAllocation}} returns the max allocation 
resource in the base unit, so the reported resource unit is not respected.

The issue is that when the {{updateMaxResources}} method is called (i.e. when 
an NM node is registered), the unit of the node's resources is not checked. In 
this method, we need to convert the reported value to the unit defined by the 
RM for the individual resource types.

I also wanted to add a testcase where memory has G as its unit, but that was 
not easily possible without hacky code, so I only added a testcase that 
verifies custom resource values.
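
The conversion step can be illustrated with a small self-contained sketch. It is not the ClusterNodeTracker code: the class name is hypothetical and only the base unit and the prefixes k, M and G are modeled, as decimal multipliers.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: converts a reported value into the unit defined on
// the RM side before it is compared or stored.
final class UnitConversionSketch {
  private static final Map<String, Long> FACTOR = new HashMap<>();
  static {
    FACTOR.put("", 1L); // base unit
    FACTOR.put("k", 1_000L);
    FACTOR.put("M", 1_000_000L);
    FACTOR.put("G", 1_000_000_000L);
  }

  static long convert(String fromUnit, String toUnit, long value) {
    Long from = FACTOR.get(fromUnit);
    Long to = FACTOR.get(toUnit);
    if (from == null || to == null) {
      throw new IllegalArgumentException(
          "Unknown unit: " + fromUnit + " or " + toUnit);
    }
    // Multiply first, then divide. multiplyExact throws on overflow;
    // converting to a larger unit truncates the remainder.
    return Math.multiplyExact(value, from) / to;
  }

  public static void main(String[] args) {
    // A node reports 1 M of a custom resource that the RM defines in
    // the base unit.
    System.out.println(convert("M", "", 1L));
  }
}
```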






[jira] [Created] (YARN-9019) Ratio of ResourceCalculator implementations could return NaN

2018-11-13 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9019:


 Summary: Ratio of ResourceCalculator implementations could return 
NaN
 Key: YARN-9019
 URL: https://issues.apache.org/jira/browse/YARN-9019
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-8951) Defining default queue placement rule in allocations file with create="false" throws an NPE

2018-10-29 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8951:


 Summary: Defining default queue placement rule in allocations file 
with create="false" throws an NPE
 Key: YARN-8951
 URL: https://issues.apache.org/jira/browse/YARN-8951
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
 Attachments: default-placement-rule-with-create-false.patch

If the default queue placement rule is defined with {{create="false"}} and a 
scheduling request is created for queue {{"root.default"}}, then 
{{FairScheduler#assignToQueue}} throws an NPE while trying to construct an 
error message in the catch block of {{IllegalStateException}}: the code 
assumes that {{rmApp}} is not null, but it is.

Example of such a config file:
{code:java}



1024mb,0vcores





{code}
This is suspicious, as there are some null checks for {{rmApp}} in the same 
method.
Not sure if this is a special case for the tests or if it is reproducible in a 
cluster; this needs further investigation.

In any case, it's not good that we try to dereference {{rmApp}} when it is 
null.

On the other hand, I'm not sure if the default queue placement rule with 
{{create="false"}} makes sense at all. Looking at the documentation 
(https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html):
{quote}default: the app is placed into the queue specified in the ‘queue’ 
attribute of the default rule. *If ‘queue’ attribute is not specified, the app 
is placed into ‘root.default’ queue.*

A queuePlacementPolicy element: which contains a list of rule elements that 
tell the scheduler how to place incoming apps into queues. Rules are applied in 
the order that they are listed. Rules may take arguments. *All rules accept the 
“create” argument, which indicates whether the rule can create a new queue. 
“Create” defaults to true; if set to false and the rule would place the app in 
a queue that is not configured in the allocations file, we continue on to the 
next rule.* The last rule must be one that can never issue a continue.
{quote}
In this case, the rule omits the queue attribute, so the apps should be 
placed in the {{root.default}} queue (which is an undefined queue according to 
the config file), and create is false, meaning that the queue {{root.default}} 
cannot be created at all.
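
The reading of the documentation above can be sketched as follows. This is an illustration of the documented semantics as I understand them, not the actual placement rule implementation; the class name, method name and parameters are hypothetical.

```java
import java.util.Optional;
import java.util.Set;

final class DefaultRuleSketch {
    private DefaultRuleSketch() { }

    /**
     * Returns the target queue, or an empty Optional when the rule would have to
     * "continue" to the next rule.
     *
     * @param queueAttr        value of the rule's 'queue' attribute, or null if omitted
     * @param create           value of the rule's 'create' argument
     * @param configuredQueues queues defined in the allocations file
     */
    static Optional<String> place(String queueAttr, boolean create,
                                  Set<String> configuredQueues) {
        // Documented fallback: without a 'queue' attribute the app goes to root.default.
        String target = (queueAttr == null) ? "root.default" : queueAttr;
        if (configuredQueues.contains(target) || create) {
            return Optional.of(target);
        }
        // create=false and the queue is not configured: nothing to place into.
        return Optional.empty();
    }
}
```

With no queue attribute, create set to false and no {{root.default}} in the file, the sketch returns an empty result. Since the default rule is normally the last rule, there is no next rule to continue to.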

*This seems to me to be a case of an invalid queue configuration file.*

[~jlowe], [~leftnoteasy]: What is your take on this?

 






[jira] [Created] (YARN-8842) Update QueueMetrics with custom resource values

2018-10-03 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8842:


 Summary: Update QueueMetrics with custom resource values 
 Key: YARN-8842
 URL: https://issues.apache.org/jira/browse/YARN-8842
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


This is the 2nd dependent jira of YARN-8059.
As updating the metrics is an independent step from handling preemption, this 
jira only deals with the queue metrics update of custom resources.
The following metrics should be updated: 
* allocated resources
* available resources
* pending resources
* reserved resources
* aggregate seconds preempted
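
One possible shape for tracking these values per custom resource, as a sketch only. The class, the enum and its constants are hypothetical and do not reflect the existing QueueMetrics API.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

final class CustomResourceMetricsSketch {
    /** One entry per metric listed above. */
    enum Kind { ALLOCATED, AVAILABLE, PENDING, RESERVED, AGGREGATE_PREEMPTED_SECONDS }

    // Metric kind -> (custom resource name -> current value).
    private final Map<Kind, Map<String, Long>> values = new EnumMap<>(Kind.class);

    /** Adds delta (which may be negative) to the given metric of the given resource. */
    void add(Kind kind, String resourceName, long delta) {
        values.computeIfAbsent(kind, k -> new HashMap<>())
              .merge(resourceName, delta, Long::sum);
    }

    /** Returns the current value, or 0 if nothing was recorded yet. */
    long get(Kind kind, String resourceName) {
        return values.getOrDefault(kind, Collections.emptyMap())
                     .getOrDefault(resourceName, 0L);
    }
}
```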






[jira] [Created] (YARN-8841) Analyze if ApplicationMasterService schedule tests can be applied to all scheduler types

2018-10-02 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8841:


 Summary: Analyze if ApplicationMasterService schedule tests can be 
applied to all scheduler types
 Key: YARN-8841
 URL: https://issues.apache.org/jira/browse/YARN-8841
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: test
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


This is a follow-up jira of YARN-8732.

1. testResourceTypes() checks all three schedulers, fifo, capacity scheduler 
and fair scheduler. How about we split them into three classes respectively, 
even though it might mean some code duplication?

2. testUpdateTrackingUrl() is now run for capacity scheduler only. I think we 
should run it with all three schedulers. The same applies to 
testInvalidIncreaseDecreaseRequest() in theory. (If the other two schedulers do 
not support increase/decrease requests, let's keep it with capacity scheduler 
only.)

3. All the unit tests in ApplicationMasterServiceTestBase are applicable to all 
three schedulers, but we are just running them with Fifo scheduler. We should 
probably enable them for capacity and fair scheduler too.
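
A rough sketch of how a shared base class with one subclass per scheduler could look. It is plain Java without JUnit, all class and method names are hypothetical, and the scheduler names are just labels; a real test would configure the ResourceManager with the chosen scheduler.

```java
/** Shared test logic; subclasses only choose the scheduler. */
abstract class AmsSchedulerTestSketch {
    /** Label of the scheduler under test. */
    abstract String schedulerName();

    /** Stand-in for a shared test method such as testUpdateTrackingUrl(). */
    final String describe(String testName) {
        return testName + " [" + schedulerName() + "]";
    }
}

final class FifoAmsTestSketch extends AmsSchedulerTestSketch {
    @Override
    String schedulerName() {
        return "fifo";
    }
}

final class CapacityAmsTestSketch extends AmsSchedulerTestSketch {
    @Override
    String schedulerName() {
        return "capacity";
    }
}

final class FairAmsTestSketch extends AmsSchedulerTestSketch {
    @Override
    String schedulerName() {
        return "fair";
    }
}
```

A test that only makes sense for one scheduler would then live in the corresponding subclass instead of the base class.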






[jira] [Created] (YARN-8782) Fix exception message in Resource.throwExceptionWhenArrayOutOfBound

2018-09-17 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8782:


 Summary: Fix exception message in 
Resource.throwExceptionWhenArrayOutOfBound 
 Key: YARN-8782
 URL: https://issues.apache.org/jira/browse/YARN-8782
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth


The exception message contains "please check double check".
This needs to be fixed.






[jira] [Created] (YARN-8750) Refactor TestQueueMetrics

2018-09-06 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8750:


 Summary: Refactor TestQueueMetrics
 Key: YARN-8750
 URL: https://issues.apache.org/jira/browse/YARN-8750
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


{{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 
and 14 parameters, respectively.
It is very hard to read the testcases that are using these methods. 
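
One way to make such calls readable is an expectation object with named setters. The sketch below is only an illustration: the class name and the three counters are assumptions and do not mirror the real parameter lists of the methods mentioned above.

```java
/** Hypothetical expectation object that replaces a long positional parameter list. */
final class AppMetricsExpectationSketch {
    private int submitted;
    private int pending;
    private int running;

    static AppMetricsExpectationSketch expect() {
        return new AppMetricsExpectationSketch();
    }

    AppMetricsExpectationSketch submitted(int value) {
        this.submitted = value;
        return this;
    }

    AppMetricsExpectationSketch pending(int value) {
        this.pending = value;
        return this;
    }

    AppMetricsExpectationSketch running(int value) {
        this.running = value;
        return this;
    }

    /** Returns true when the actual counters match the expected ones. */
    boolean matches(int actualSubmitted, int actualPending, int actualRunning) {
        return submitted == actualSubmitted
            && pending == actualPending
            && running == actualRunning;
    }
}
```

A call such as expect().submitted(1).pending(1).running(0) names every value at the call site, which a list of bare integers cannot do.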






[jira] [Created] (YARN-8732) Create new testcase in TestApplicationMasterService that tests min/max allocation but for FairScheduler

2018-08-30 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8732:


 Summary: Create new testcase in TestApplicationMasterService that 
tests min/max allocation but for FairScheduler
 Key: YARN-8732
 URL: https://issues.apache.org/jira/browse/YARN-8732
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Create a testcase like this one, but for FairScheduler (FS): 
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService#testValidateRequestCapacityAgainstMinMaxAllocationFor3rdResourceTypes






[jira] [Created] (YARN-8644) Make RMAppImpl$FinalTransition more readable + add more test coverage

2018-08-09 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8644:


 Summary: Make RMAppImpl$FinalTransition more readable + add more 
test coverage
 Key: YARN-8644
 URL: https://issues.apache.org/jira/browse/YARN-8644
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-8621) Add REST API tests for Resource Types fields

2018-08-03 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8621:


 Summary: Add REST API tests for Resource Types fields
 Key: YARN-8621
 URL: https://issues.apache.org/jira/browse/YARN-8621
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


This complements YARN-7451, which already added unit tests for the apps 
and scheduler endpoints.
The following API endpoints should be tested as well:
/ws/v1/cluster/apps/
/ws/v1/cluster/apps//appattempts
/ws/v1/cluster/apps//appattempts/ 






[jira] [Created] (YARN-8616) System.currentTimeMillis() used in RMAppImpl, instead of getting value from systemClock

2018-08-02 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8616:


 Summary: System.currentTimeMillis() used in RMAppImpl, instead of 
getting value from systemClock
 Key: YARN-8616
 URL: https://issues.apache.org/jira/browse/YARN-8616
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth
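
The idea in the summary can be illustrated with a small sketch. Here java.time.Clock stands in for the clock abstraction the summary refers to; that substitution and the class name are assumptions, not the RMAppImpl code.

```java
import java.time.Clock;

final class AppTimerSketch {
    private final Clock clock;
    private final long startTime;

    /** The clock is injected, so a test can pass a fixed or controllable clock. */
    AppTimerSketch(Clock clock) {
        this.clock = clock;
        // Read the time from the injected clock rather than System.currentTimeMillis().
        this.startTime = clock.millis();
    }

    long startTime() {
        return startTime;
    }

    long elapsedMillis() {
        return clock.millis() - startTime;
    }
}
```

When every timestamp comes from the same injected clock, tests get deterministic values and no longer depend on the wall clock.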









[jira] [Created] (YARN-8586) Extract log aggregation related fields and methods from RMAppImpl

2018-07-26 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8586:


 Summary: Extract log aggregation related fields and methods from 
RMAppImpl
 Key: YARN-8586
 URL: https://issues.apache.org/jira/browse/YARN-8586
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Given that RMAppImpl is already above 2000 lines and is very complex, a very 
simple and straightforward first step would be to extract all log aggregation 
related fields and methods to a new class.
The clients of RMAppImpl may access the same methods and RMAppImpl would 
delegate all those calls to the newly introduced class.
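
A minimal sketch of the proposed delegation. All class names, method names and the status strings are hypothetical; they only show how the outer class keeps its methods while forwarding to the extracted one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical extracted class that owns the log aggregation state. */
final class LogAggregationStateSketch {
    private final Map<String, String> statusByNode = new HashMap<>();

    void update(String nodeId, String status) {
        statusByNode.put(nodeId, status);
    }

    String statusFor(String nodeId) {
        return statusByNode.getOrDefault(nodeId, "UNKNOWN");
    }
}

/** Hypothetical stand-in for the large class; clients keep calling the same methods. */
final class AppImplSketch {
    private final LogAggregationStateSketch logAggregation = new LogAggregationStateSketch();

    void reportLogAggregation(String nodeId, String status) {
        logAggregation.update(nodeId, status); // forwarded, no logic left here
    }

    String getLogAggregationStatus(String nodeId) {
        return logAggregation.statusFor(nodeId); // forwarded
    }
}
```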






[jira] [Created] (YARN-8585) Add test class for DefaultAMSProcessor

2018-07-26 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8585:


 Summary: Add test class for DefaultAMSProcessor
 Key: YARN-8585
 URL: https://issues.apache.org/jira/browse/YARN-8585
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Since this class has no test coverage at all, it seems to be a good idea to 
test it.






[jira] [Created] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8584:


 Summary: Several typos in Log Aggregation related classes
 Key: YARN-8584
 URL: https://issues.apache.org/jira/browse/YARN-8584
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


There are typos in comments, log messages, method names, field names, etc.







[jira] [Created] (YARN-8566) Add diagnostic message for unschedulable containers

2018-07-23 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8566:


 Summary: Add diagnostic message for unschedulable containers
 Key: YARN-8566
 URL: https://issues.apache.org/jira/browse/YARN-8566
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


If a queue is configured with maxResources set to 0 for a resource, and an 
application is submitted to that queue that requests that resource, that 
application will remain pending until it is removed or moved to a different 
queue. This behavior can be realized without extended resources, but it’s 
unlikely a user will create a queue that allows 0 memory or CPU. As the number 
of resources in the system increases, this scenario will become more common, 
and it will become harder to recognize these cases. Therefore, the scheduler 
should indicate in the diagnostic string for an application if it was not 
scheduled because of a 0 maxResources setting.

Example configuration (fair-scheduler.xml):

{code:java}

  10

1 mb,2vcores
9 mb,4vcores, 0gpu
50
-1.0f
2.0
fair
  


{code}

Command: 

{code:java}
yarn jar 
"./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0-SNAPSHOT.jar" pi 
-Dmapreduce.job.queuename=sample_queue -Dmapreduce.map.resource.gpu=1 1 1000;
{code}

The job hangs and the application diagnostic info is empty.
Given that an exception is thrown before any mapper/reducer container is 
created, the diagnostic message of the AM should be updated.
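
The check behind such a diagnostic could look roughly like this. It is a sketch under assumptions: resources are modelled as plain name-to-value maps, and the class name, method name and message text are hypothetical rather than scheduler code.

```java
import java.util.Map;
import java.util.Optional;

final class ZeroMaxResourceDiagnosticsSketch {
    private ZeroMaxResourceDiagnosticsSketch() { }

    /**
     * Returns a diagnostic message if the request asks for a resource whose
     * maximum in the queue is configured as 0, otherwise an empty Optional.
     */
    static Optional<String> diagnose(String queue,
                                     Map<String, Long> requested,
                                     Map<String, Long> queueMax) {
        for (Map.Entry<String, Long> entry : requested.entrySet()) {
            Long max = queueMax.get(entry.getKey());
            if (entry.getValue() > 0 && max != null && max == 0L) {
                return Optional.of("Request cannot be scheduled: queue " + queue
                    + " has maxResources 0 for resource '" + entry.getKey() + "'");
            }
        }
        return Optional.empty();
    }
}
```

For the example above, a request for 1 gpu against a queue maximum of 0gpu would yield a message, which could then be set as the application diagnostic.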






[jira] [Created] (YARN-8524) Single parameter Resource / LightWeightResource constructor looks confusing

2018-07-12 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8524:


 Summary: Single parameter Resource / LightWeightResource 
constructor looks confusing
 Key: YARN-8524
 URL: https://issues.apache.org/jira/browse/YARN-8524
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: api
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The single parameter (long) constructor in Resource / LightWeightResource sets 
all resource components to the same value.
Since there are other constructors in these classes with (long, int) parameters 
where the semantics are different, it could be confusing for the users.
The perfect place to create such a resource would be in the Resources class, 
with a method named like "createResourceWithSameValue".
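
A sketch of the suggested factory method. The class below is a simplified, hypothetical stand-in with only memory and vcores; it is not the real Resource or Resources class.

```java
/** Simplified stand-in for a resource with two components. */
final class ResourceSketch {
    final long memory;
    final int vcores;

    private ResourceSketch(long memory, int vcores) {
        this.memory = memory;
        this.vcores = vcores;
    }

    /** Regular factory: each component is given explicitly. */
    static ResourceSketch of(long memory, int vcores) {
        return new ResourceSketch(memory, vcores);
    }

    /**
     * Explicitly named factory: every component gets the same value.
     * Math.toIntExact throws ArithmeticException if the value does not fit in an int.
     */
    static ResourceSketch createResourceWithSameValue(long value) {
        return new ResourceSketch(value, Math.toIntExact(value));
    }
}
```

The method name states the intent at the call site, whereas a single-argument constructor leaves the reader guessing which component is being set.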






[jira] [Created] (YARN-8517) getContainer and getContainers ResourceManager REST API methods are not documented

2018-07-11 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8517:


 Summary: getContainer and getContainers ResourceManager REST API 
methods are not documented
 Key: YARN-8517
 URL: https://issues.apache.org/jira/browse/YARN-8517
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth


Looking at the documentation here: 
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
I cannot find documentation for 2 RM REST endpoints: 
- /apps/{appid}/appattempts/{appattemptid}/containers
- /apps/{appid}/appattempts/{appattemptid}/containers/{containerid}
I suppose they are not intentionally undocumented.






[jira] [Created] (YARN-8502) Use path strings consistently for webservice endpoints in RMWebServices

2018-07-07 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8502:


 Summary: Use path strings consistently for webservice endpoints in 
RMWebServices
 Key: YARN-8502
 URL: https://issues.apache.org/jira/browse/YARN-8502
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Currently there are 2 types of endpoint path definitions: 
1. with string, example: 
@Path("/apps/{appid}/appattempts/{appattemptid}/containers/{containerid}")
2. with constant, example: 
@Path(RMWSConsts.APPS_APPID_APPATTEMPTS_APPATTEMPTID_CONTAINERS)

Preferably, constants should be used for all paths.
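
A sketch of how such constants could be composed. The class name is hypothetical and the composition style is an assumption; it does not show how RMWSConsts is actually written.

```java
/** Hypothetical holder for endpoint paths; each constant builds on the previous one. */
final class PathConstsSketch {
    private PathConstsSketch() { }

    static final String APPS = "/apps";
    static final String APPS_APPID = APPS + "/{appid}";
    static final String APPS_APPID_APPATTEMPTS = APPS_APPID + "/appattempts";
    static final String APPS_APPID_APPATTEMPTS_APPATTEMPTID =
        APPS_APPID_APPATTEMPTS + "/{appattemptid}";
    static final String APPS_APPID_APPATTEMPTS_APPATTEMPTID_CONTAINERS =
        APPS_APPID_APPATTEMPTS_APPATTEMPTID + "/containers";
    static final String APPS_APPID_APPATTEMPTS_APPATTEMPTID_CONTAINERS_CONTAINERID =
        APPS_APPID_APPATTEMPTS_APPATTEMPTID_CONTAINERS + "/{containerid}";
}
```

Because these are compile-time constant expressions, they can be used as annotation values in the same way as the string literal in the first example.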






[jira] [Created] (YARN-8501) Decrease complexity of RMWebServices' getApps method

2018-07-07 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8501:


 Summary: Decrease complexity of RMWebServices' getApps method
 Key: YARN-8501
 URL: https://issues.apache.org/jira/browse/YARN-8501
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: restapi
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-8442) Strange characters and missing spaces in FairScheduler documentation

2018-06-19 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8442:


 Summary: Strange characters and missing spaces in FairScheduler 
documentation
 Key: YARN-8442
 URL: https://issues.apache.org/jira/browse/YARN-8442
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


[https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/FairScheduler.html]

There are several missing spaces and strange characters in: 

Allocation file format / queuePlacementPolicy element / nestedUserQueue

Quoting the faulty part of the document: 
{code:java}
This is similar to ‘user’ rule,the difference being in ‘nestedUserQueue’ 
rule,user...
{code}






[jira] [Created] (YARN-8441) Typo in CSQueueUtils local variable names: queueGuranteedResource

2018-06-19 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8441:


 Summary: Typo in CSQueueUtils local variable names: 
queueGuranteedResource
 Key: YARN-8441
 URL: https://issues.apache.org/jira/browse/YARN-8441
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-8440) Typo in YarnConfiguration javadoc: "Miniumum request grant-able.."

2018-06-19 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8440:


 Summary: Typo in YarnConfiguration javadoc: "Miniumum request 
grant-able.."
 Key: YARN-8440
 URL: https://issues.apache.org/jira/browse/YARN-8440
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-8439) Typos in test names in TestTaskAttempt: "testAppDiognostic"

2018-06-19 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8439:


 Summary: Typos in test names in TestTaskAttempt: 
"testAppDiognostic"
 Key: YARN-8439
 URL: https://issues.apache.org/jira/browse/YARN-8439
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth


These two methods need to be renamed: 
 * testAppDiognosticEventOnUnassignedTask
 * testAppDiognosticEventOnNewTask






[jira] [Created] (YARN-8438) TestContainer.testKillOnNew flaky on trunk

2018-06-19 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8438:


 Summary: TestContainer.testKillOnNew flaky on trunk
 Key: YARN-8438
 URL: https://issues.apache.org/jira/browse/YARN-8438
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


Running this test several times (e.g. 30), it fails ~5-10 times.

Stacktrace: 
{code:java}
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertTrue(Assert.java:52)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer.testKillOnNew(TestContainer.java:594)
{code}
TestContainer:594 is the following code in trunk, currently:
{code:java}
Assert.assertTrue(containerMetrics.finishTime.value() >
    containerMetrics.startTime.value());
{code}
So sometimes the finish time is not greater than the start time.
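
One possible explanation, which is an assumption and not verified against the test, is that both timestamps are taken within the same millisecond. The sketch below uses hypothetical names and a constant time source to show why a strict comparison fails in that case while a non-strict one holds.

```java
import java.util.function.LongSupplier;

final class TimestampCheckSketch {
    private TimestampCheckSketch() { }

    /** Reads start and finish back-to-back from the same millisecond time source. */
    static long[] startAndFinish(LongSupplier millis) {
        long start = millis.getAsLong();
        long finish = millis.getAsLong();
        return new long[] {start, finish};
    }

    /** The kind of check the failing assertion performs. */
    static boolean strictlyAfter(long[] times) {
        return times[1] > times[0];
    }

    /** A check that tolerates both events landing in the same millisecond. */
    static boolean notBefore(long[] times) {
        return times[1] >= times[0];
    }
}
```

If that is indeed the cause, relaxing the assertion or using a controllable clock in the test would be two options to consider.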






[jira] [Created] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource

2018-05-04 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8248:


 Summary: Job hangs when queue is specified and that queue has 0 
capability of a resource
 Key: YARN-8248
 URL: https://issues.apache.org/jira/browse/YARN-8248
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


The job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
any resource (vcores / memory / other).
{code:java}
bin/yarn jar 
"./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi 
-Dmapreduce.job.queuename=sample_queue 1 1000;{code}
fair-scheduler.xml queue config (excerpt):

 
{code:java}
1 mb,0vcores
9 mb,0vcores
{code}
Diagnostic message from the web UI: 
{code:java}
[Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
not yet activated. (Resource request:  exceeds current 
queue or its parents maximum resource allowed).{code}






[jira] [Created] (YARN-8202) DefaultAMSProcessor should properly check units of requested custom resource types against minimum/maximum allocation

2018-04-24 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-8202:


 Summary: DefaultAMSProcessor should properly check units of 
requested custom resource types against minimum/maximum allocation
 Key: YARN-8202
 URL: https://issues.apache.org/jira/browse/YARN-8202
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth









[jira] [Created] (YARN-7841) Cleanup AllocationFileLoaderService's reloadAllocations method

2018-01-29 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-7841:


 Summary: Cleanup AllocationFileLoaderService's reloadAllocations 
method
 Key: YARN-7841
 URL: https://issues.apache.org/jira/browse/YARN-7841
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Affects Versions: 3.0.0
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


AllocationFileLoaderService's reloadAllocations method is too complex. 
Please refactor / clean up this method so that it is simpler to understand.






[jira] [Resolved] (YARN-7528) Resource types that use units need to be defined at RM level and NM level or when using small units you will overflow max_allocation calculation

2018-01-26 Thread Szilard Nemeth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth resolved YARN-7528.
--
Resolution: Cannot Reproduce

> Resource types that use units need to be defined at RM level and NM level or 
> when using small units you will overflow max_allocation calculation
> 
>
> Key: YARN-7528
> URL: https://issues.apache.org/jira/browse/YARN-7528
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Affects Versions: 3.0.0
>Reporter: Grant Sohn
>Assignee: Szilard Nemeth
>Priority: Major
>
> When the unit is not defined in the RM, the LONG_MAX default will overflow in 
> the conversion step.
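
The arithmetic described in the quoted report can be illustrated with a small sketch. It does not reproduce the YARN conversion code; the class name, method names and the multiplier of 1000 are assumptions chosen for illustration.

```java
final class UnitConversionSketch {
    private UnitConversionSketch() { }

    /** Plain multiplication: silently wraps around when the result exceeds Long.MAX_VALUE. */
    static long convertUnchecked(long value, long multiplier) {
        return value * multiplier;
    }

    /**
     * Overflow-aware variant for non-negative inputs: clamps to Long.MAX_VALUE
     * instead of wrapping around.
     */
    static long convertSaturating(long value, long multiplier) {
        try {
            return Math.multiplyExact(value, multiplier);
        } catch (ArithmeticException overflow) {
            return Long.MAX_VALUE;
        }
    }
}
```

Converting a Long.MAX_VALUE default to a smaller unit with the unchecked variant wraps to a negative number, while the saturating variant keeps the maximum.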





