[jira] [Created] (YARN-7131) FSDownload.unpack should determine the type of resource by reading the header bytes
Brook Zhou created YARN-7131:
---------------------------------

             Summary: FSDownload.unpack should determine the type of resource by reading the header bytes
                 Key: YARN-7131
                 URL: https://issues.apache.org/jira/browse/YARN-7131
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Brook Zhou
            Assignee: Brook Zhou

Currently, FSDownload.unpack uses naive string checks on the file name to determine whether a resource is of a particular type (jar, zip, tar.gz). There are cases where this does not work - e.g., the user decides to split up a large zip resource as {file1}.zip.001, {file1}.zip.002. Instead, FSDownload.unpack should read the file header bytes to determine the file type.
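For illustration, a minimal sketch of header-byte (magic number) detection - the helper below is hypothetical and not the actual FSDownload patch. ZIP and JAR archives begin with the bytes 0x50 0x4B ("PK"), and gzip streams begin with 0x1F 0x8B:
{code:java}
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class MagicSniff {
  /** Hypothetical helper: classify a local file by its leading bytes
      rather than by its file-name extension. */
  static String detectType(String path) throws IOException {
    byte[] header = new byte[2];
    try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
      in.readFully(header);   // throws EOFException for files under 2 bytes
    }
    int b0 = header[0] & 0xFF, b1 = header[1] & 0xFF;
    if (b0 == 0x50 && b1 == 0x4B) {
      return "zip";      // also covers jar, which is a zip container
    }
    if (b0 == 0x1F && b1 == 0x8B) {
      return "tar.gz";   // gzip stream; the tar layer is inside
    }
    return "unknown";
  }
}
{code}
A split archive piece like {file1}.zip.001 still carries the "PK" signature in its first segment, which is exactly what the extension check misses.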
[jira] [Created] (YARN-7098) LocalizerRunner should immediately send heartbeat response LocalizerStatus.DIE when the Container transitions from LOCALIZING to KILLING
Brook Zhou created YARN-7098:
---------------------------------

             Summary: LocalizerRunner should immediately send heartbeat response LocalizerStatus.DIE when the Container transitions from LOCALIZING to KILLING
                 Key: YARN-7098
                 URL: https://issues.apache.org/jira/browse/YARN-7098
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
            Reporter: Brook Zhou
            Assignee: Brook Zhou
            Priority: Minor

Currently, the following can happen:
1. ContainerLocalizer heartbeats to ResourceLocalizationService.
2. LocalizerTracker.processHeartbeat verifies that there is a LocalizerRunner for the localizerId (containerId).
3. The Container receives a kill event and goes from LOCALIZING -> KILLING. The LocalizerRunner for the localizerId is removed from the LocalizerTracker.
4. Since check (2) already passed, the LocalizerRunner sends a heartbeat response with LocalizerStatus.LIVE and the next file to download.

What should happen is that (4) sends LocalizerStatus.DIE, since (3) happened before the heartbeat response in (4). This saves the container from potentially downloading an extra resource which will end up being deleted anyway.
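To illustrate the window, here is a deliberately simplified, hypothetical model of the check-then-act race - these are not the real YARN classes, just the shape of the problem:
{code:java}
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model: the tracker checks for a live runner, the kill path
// removes it concurrently, and the response is built from the stale check.
public class HeartbeatRace {
  enum LocalizerStatus { LIVE, DIE }

  private final ConcurrentHashMap<String, Object> runners =
      new ConcurrentHashMap<>();

  // Step 2: check-then-act - the runner exists at check time...
  LocalizerStatus processHeartbeat(String localizerId) {
    boolean alive = runners.containsKey(localizerId);
    // Step 3 can run right here: cleanupContainer(localizerId).
    // Step 4: ...but the response still reflects the stale check.
    return alive ? LocalizerStatus.LIVE : LocalizerStatus.DIE;
  }

  // Kill path. Re-checking membership at response time (or answering DIE
  // directly from the removal) closes the window.
  void cleanupContainer(String localizerId) {
    runners.remove(localizerId);
  }
}
{code}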
[jira] [Created] (YARN-6870) ResourceUtilization/ContainersMonitorImpl is calculating CPU utilization as a float, which is imprecise
Brook Zhou created YARN-6870:
---------------------------------

             Summary: ResourceUtilization/ContainersMonitorImpl is calculating CPU utilization as a float, which is imprecise
                 Key: YARN-6870
                 URL: https://issues.apache.org/jira/browse/YARN-6870
             Project: Hadoop YARN
          Issue Type: Bug
          Components: api, nodemanager
            Reporter: Brook Zhou
            Assignee: Brook Zhou

We have seen issues on our clusters where the current way of computing CPU usage suffers from float-arithmetic inaccuracies (the bug is still there in trunk).

Simple program to illustrate:
{code:title=Bar.java|borderStyle=solid}
public static void main(String[] args) throws Exception {
  float result = 0.0f;
  for (int i = 0; i < 7; i++) {
    if (i == 6) {
      result += (float) 4 / (float) 18;
    } else {
      result += (float) 2 / (float) 18;
    }
  }
  for (int i = 0; i < 7; i++) {
    if (i == 6) {
      result -= (float) 4 / (float) 18;
    } else {
      result -= (float) 2 / (float) 18;
    }
  }
  System.out.println(result); // Printed 4.4703484E-8, not 0
}
{code}

2017-04-12 05:43:24,014 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Not enough cpu for [container_e3295_1491978508342_0467_01_30], Current CPU Allocation: [0.891], Requested CPU Allocation: [0.]

There are a few places with this issue:
1. ResourceUtilization.java - set/getCPU both use float. When ContainerScheduler calls ContainersMonitor.increase/decreaseResourceUtilization, this may lead to issues.
2. AllocationBasedResourceUtilizationTracker.java - hasResourcesAvailable uses float as well for the CPU computation.
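One possible remedy, sketched below under the assumption that CPU can be tracked as an integer count of milli-vcores (the class and method names are illustrative, not the committed patch): integer add/subtract is exact, so repeated increase/decrease round-trips back to exactly zero.
{code:java}
// Sketch: track CPU as integer milli-vcores instead of a float fraction.
// All names (cpuMilliVcores, addCpu, getCpuFraction) are illustrative only.
public class IntegerCpuTracker {
  private int cpuMilliVcores = 0;   // 1000 == one full vcore
  private final int totalMilliVcores;

  public IntegerCpuTracker(int vcores) {
    this.totalMilliVcores = vcores * 1000;
  }

  public void addCpu(int milliVcores)    { cpuMilliVcores += milliVcores; }
  public void removeCpu(int milliVcores) { cpuMilliVcores -= milliVcores; }

  // Integer arithmetic is exact: N equal increments followed by N equal
  // decrements always return to exactly 0, unlike the float sums above.
  public boolean hasCpuAvailable(int requestedMilliVcores) {
    return cpuMilliVcores + requestedMilliVcores <= totalMilliVcores;
  }

  // Convert to a fraction only at the edge, for reporting.
  public float getCpuFraction() {
    return (float) cpuMilliVcores / totalMilliVcores;
  }
}
{code}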
[jira] [Created] (YARN-5472) WIN_MAX_PATH logic is off by one
Brook Zhou created YARN-5472:
---------------------------------

             Summary: WIN_MAX_PATH logic is off by one
                 Key: YARN-5472
                 URL: https://issues.apache.org/jira/browse/YARN-5472
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
         Environment: Windows
            Reporter: Brook Zhou
            Assignee: Brook Zhou
            Priority: Minor

The following check is incorrect:
{code}
if (Shell.WINDOWS && sb.getWrapperScriptPath().toString().length() > WIN_MAX_PATH)
{code}
It should be >=, because Windows' MAX_PATH limit includes the terminating null character - the maximum path is defined as "D:\some 256-character path string", so a path string whose length equals WIN_MAX_PATH is already one character too long.
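A minimal sketch of the corrected boundary check, assuming WIN_MAX_PATH is the Windows MAX_PATH value of 260 (the wrapper class and method are illustrative):
{code:java}
// Windows MAX_PATH (260) counts the terminating null, so the longest
// usable path string is 259 characters. A length equal to WIN_MAX_PATH
// must therefore already be rejected: use >=, not >.
public class PathCheck {
  static final int WIN_MAX_PATH = 260;

  static boolean pathTooLong(String wrapperScriptPath) {
    return wrapperScriptPath.length() >= WIN_MAX_PATH;
  }
}
{code}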
[jira] [Created] (YARN-4840) Add option to upload files recursively from container directory
Brook Zhou created YARN-4840:
---------------------------------

             Summary: Add option to upload files recursively from container directory
                 Key: YARN-4840
                 URL: https://issues.apache.org/jira/browse/YARN-4840
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: log-aggregation
    Affects Versions: 2.8.0
            Reporter: Brook Zhou
            Priority: Minor
             Fix For: 2.8.0

It may be useful to allow users to aggregate their logs recursively from container directories.
[jira] [Resolved] (YARN-4818) AggregatedLogFormat.LogValue.write() incorrectly truncates files
     [ https://issues.apache.org/jira/browse/YARN-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brook Zhou resolved YARN-4818.
------------------------------
    Resolution: Invalid

> AggregatedLogFormat.LogValue.write() incorrectly truncates files
> ----------------------------------------------------------------
>
>                 Key: YARN-4818
>                 URL: https://issues.apache.org/jira/browse/YARN-4818
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Brook Zhou
>            Assignee: Brook Zhou
>              Labels: log-aggregation
>             Fix For: 2.8.0
>
>         Attachments: YARN-4818-v0.patch
>
> AggregatedLogFormat.LogValue.write() currently has a bug where it only writes in blocks of the buffer size (65535). This is because FileInputStream.read(byte[] buf) returns -1 if there are less than buf.length bytes remaining. In cases where the file size is not an exact multiple of 65535 bytes, the remaining bytes are truncated.
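For reference, InputStream.read(byte[]) returns the number of bytes actually read - possibly fewer than buf.length - and returns -1 only at end of stream, which is consistent with the Invalid resolution. The standard copy-loop idiom below handles partial reads without truncation; it is an illustrative sketch, not the LogValue code:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoop {
  // read(byte[]) returns the count of bytes read this call (possibly
  // fewer than buf.length) and -1 only at end of stream, so writing
  // exactly 'len' bytes per iteration never truncates the tail.
  static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[65535];
    int len;
    while ((len = in.read(buf)) != -1) {
      out.write(buf, 0, len);
    }
  }
}
{code}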
[jira] [Created] (YARN-4818) AggregatedLogFormat.LogValue writes only in blocks of buffer size
Brook Zhou created YARN-4818:
---------------------------------

             Summary: AggregatedLogFormat.LogValue writes only in blocks of buffer size
                 Key: YARN-4818
                 URL: https://issues.apache.org/jira/browse/YARN-4818
             Project: Hadoop YARN
          Issue Type: Bug
    Affects Versions: 2.8.0
            Reporter: Brook Zhou
            Assignee: Brook Zhou
             Fix For: 2.8.0

AggregatedLogFormat.LogValue.write() currently has a bug where it only writes in blocks of the buffer size (65535). This is because FileInputStream.read(byte[] buf) returns -1 if there are fewer than 65535 bytes remaining. In cases where the file is smaller than 65535 bytes, 0 bytes are written.
[jira] [Created] (YARN-4677) RMNodeResourceUpdateEvent update from scheduler can lead to race condition
Brook Zhou created YARN-4677:
---------------------------------

             Summary: RMNodeResourceUpdateEvent update from scheduler can lead to race condition
                 Key: YARN-4677
                 URL: https://issues.apache.org/jira/browse/YARN-4677
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: graceful, resourcemanager, scheduler
    Affects Versions: 2.7.1
            Reporter: Brook Zhou

When a node is in the decommissioning state, there is a time window between completedContainer() and the RMNodeResourceUpdateEvent being handled in scheduler.nodeUpdate (YARN-3223). If a scheduling effort happens within this window, a new container can still get allocated on this node. An even worse case is when the scheduling effort happens after the RMNodeResourceUpdateEvent is sent out but before it is propagated to the SchedulerNode: the total resource is then lower than the used resource, and the available resource becomes negative.
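A simplified illustration of that worst case, with plain integers standing in for YARN's Resource objects (all names and numbers here are hypothetical):
{code:java}
// Simplified model of the decommissioning race: the resource-update event
// shrinks the node's total while completed containers still count as used.
public class NodeResourceRace {
  int totalMb = 8192;   // node capability as the scheduler sees it
  int usedMb  = 6144;   // resources held by running containers

  int availableMb() {
    return totalMb - usedMb;
  }

  public static void main(String[] args) {
    NodeResourceRace node = new NodeResourceRace();
    // RMNodeResourceUpdateEvent lands: total drops to what the remaining
    // containers should keep (say 4096 MB)...
    node.totalMb = 4096;
    // ...but completedContainer() has not drained usedMb yet, so a
    // scheduling effort in this window sees negative availability.
    System.out.println(node.availableMb());   // prints -2048
  }
}
{code}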