[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508557#comment-14508557 ]

Zhijie Shen commented on YARN-3522:
-----------------------------------

I took a look at the checkstyle errors and commented on [HADOOP-11869|https://issues.apache.org/jira/browse/HADOOP-11869?focusedCommentId=14508555&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508555]. It seems to be mostly noise now.

> DistributedShell uses the wrong user to put timeline data
> ---------------------------------------------------------
>
>                 Key: YARN-3522
>                 URL: https://issues.apache.org/jira/browse/YARN-3522
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>            Priority: Blocker
>         Attachments: YARN-3522.1.patch, YARN-3522.2.patch, YARN-3522.3.patch
>
>
> YARN-3287 breaks the timeline access control of distributed shell. In the distributed shell AM:
> {code}
> if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
>     YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
>   // Creating the Timeline Client
>   timelineClient = TimelineClient.createTimelineClient();
>   timelineClient.init(conf);
>   timelineClient.start();
> } else {
>   timelineClient = null;
>   LOG.warn("Timeline service is not enabled");
> }
> {code}
> {code}
> ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
>   @Override
>   public TimelinePutResponse run() throws Exception {
>     return timelineClient.putEntities(entity);
>   }
> });
> {code}
> YARN-3287 changes the timeline client to get the right ugi at serviceInit, but the DS AM still doesn't use the submitter ugi to init the timeline client; it only wraps each putEntities call in the ugi. This results in the wrong user on the put request.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
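The fix described above — creating the timeline client under the submitter's UGI rather than only wrapping each put call — can be sketched with a simplified model. This is not the Hadoop API: `doAs`, `Client`, and the thread-local user below are stand-ins that only illustrate why an identity captured at init time makes the per-call doAs insufficient after YARN-3287.

```java
import java.util.function.Supplier;

public class UgiModel {
    // Stand-in for UserGroupInformation.getCurrentUser() on this thread.
    static final ThreadLocal<String> CURRENT_USER =
        ThreadLocal.withInitial(() -> "yarn");          // NM/process user

    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Stand-in for TimelineClient after YARN-3287: the identity is captured
    // once, at init time, and every later put runs as that captured identity.
    static class Client {
        final String owner = CURRENT_USER.get();
        String putEntities() { return owner; }
    }

    static String userSeenByPut(String submitter, boolean initInsideDoAs) {
        Client client = initInsideDoAs
            ? doAs(submitter, Client::new)   // fix: create client under submitter UGI
            : new Client();                  // bug: client inherits the process user
        // Wrapping only the put in doAs no longer changes the effective user.
        return doAs(submitter, client::putEntities);
    }
}
```

With the client created inside the submitter's doAs, the put is attributed to the submitter; otherwise it carries the process user regardless of the per-call doAs.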
[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case
[ https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508539#comment-14508539 ]

Hadoop QA commented on YARN-3405:
---------------------------------

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12727548/YARN-3405.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 18eb5e7 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7468/console |

This message was automatically generated.

> FairScheduler's preemption cannot happen between sibling in some case
> ---------------------------------------------------------------------
>
>                 Key: YARN-3405
>                 URL: https://issues.apache.org/jira/browse/YARN-3405
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.0
>            Reporter: Peng Zhang
>            Assignee: Peng Zhang
>            Priority: Critical
>         Attachments: YARN-3405.01.patch, YARN-3405.02.patch
>
>
> Queue hierarchy described as below:
> {noformat}
>             root
>            /    \
>       queue-1  queue-2
>       /     \
> queue-1-1  queue-1-2
> {noformat}
> Assume cluster resource is 100.
> # queue-1-1 and queue-2 each have an app; each gets 50 usage and 50 fair share.
> # When queue-1-2 becomes active, it causes a new preemption request for its fair share of 25.
> # When preempting from root, the preemption candidate found may be queue-2. If so, preemptContainerPreCheck for queue-2 returns false because queue-2 is at its fair share.
> # As a result, queue-1-2 ends up waiting for resources to be released by queue-1-1 itself.
> What I expect here is that queue-1-2 preempts from queue-1-1.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case
[ https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508536#comment-14508536 ]

Peng Zhang commented on YARN-3405:
----------------------------------

Updated the patch: only preempt from children when the queue is not starved, and added a test case.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3405) FairScheduler's preemption cannot happen between sibling in some case
[ https://issues.apache.org/jira/browse/YARN-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peng Zhang updated YARN-3405:
-----------------------------
    Attachment: YARN-3405.02.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
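The behavior Peng Zhang's patch aims for — descending into a queue's own children instead of rejecting preemption once a parent sits at its fair share — can be sketched as a toy model. The `Queue` class and numbers below are illustrative (taken from the scenario in the issue description), not the FairScheduler classes:

```java
import java.util.ArrayList;
import java.util.List;

public class PreemptModel {
    // Toy fair-scheduler queue: only usage and fair share.
    static class Queue {
        final String name;
        final double usage, fairShare;
        final List<Queue> children = new ArrayList<>();
        Queue(String name, double usage, double fairShare) {
            this.name = name; this.usage = usage; this.fairShare = fairShare;
        }
        Queue add(Queue child) { children.add(child); return this; }
        boolean overFairShare() { return usage > fairShare; }
    }

    // Descend toward a leaf that is over its fair share. A parent that is
    // at (or under) its own fair share is not rejected outright -- its
    // children may still be unbalanced, which is the YARN-3405 scenario.
    static String findVictim(Queue q) {
        if (q.children.isEmpty()) {
            return q.overFairShare() ? q.name : null;
        }
        for (Queue child : q.children) {
            String victim = findVictim(child);
            if (victim != null) return victim;
        }
        return null;
    }

    static Queue sampleTree() {
        // Cluster of 100: queue-1 and queue-2 each hold 50 with fair share 50;
        // inside queue-1, queue-1-1 uses all 50 but its fair share is only 25.
        Queue q1 = new Queue("queue-1", 50, 50)
            .add(new Queue("queue-1-1", 50, 25))
            .add(new Queue("queue-1-2", 0, 25));
        return new Queue("root", 100, 100).add(q1).add(new Queue("queue-2", 50, 50));
    }
}
```

On the sample tree, the search passes through queue-1 (exactly at fair share) and finds queue-1-1, rather than stopping at the sibling check.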
[jira] [Updated] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3431:
------------------------------
    Attachment: YARN-3431.6.patch

> Sub resources of timeline entity needs to be passed to a separate endpoint.
> ---------------------------------------------------------------------------
>
>                 Key: YARN-3431
>                 URL: https://issues.apache.org/jira/browse/YARN-3431
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3431.1.patch, YARN-3431.2.patch, YARN-3431.3.patch, YARN-3431.4.patch, YARN-3431.5.patch, YARN-3431.6.patch
>
>
> We have TimelineEntity and some other entities as subclasses that inherit from it. However, we only have a single endpoint, which consumes TimelineEntity rather than the subclasses, and this endpoint checks that the incoming request body contains exactly a TimelineEntity object. JSON data serialized from a subclass object is not treated as a TimelineEntity object and won't be deserialized into the corresponding subclass object, which causes deserialization failures, as discussed in YARN-3334: https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508529#comment-14508529 ]

Zhijie Shen commented on YARN-3431:
-----------------------------------

bq. It would be a little more consistent and perform slightly better if the type check in getChildren() is consolidated into validateChildren().

Refactored the code so that we don't iterate the set twice.

bq. maybe we'd like to add some prefix to the fields we (implicitly) add to the info field of an entity?

I changed the info keys a bit to make them start with "SYSTEM_INFO_". Hopefully this will reduce conflicts. Either way, we need to identify the system info keys in the documentation and warn users not to use them.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
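The reserved-prefix convention from the comment above can be enforced with a simple guard. The comment only fixes the "SYSTEM_INFO_" prefix; the guard method and the key names used in the test are assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SystemInfoKeys {
    static final String RESERVED_PREFIX = "SYSTEM_INFO_";

    // Hypothetical guard: refuse user-supplied info keys that collide with
    // the reserved system-info namespace.
    static void putUserInfo(Map<String, Object> info, String key, Object value) {
        if (key.startsWith(RESERVED_PREFIX)) {
            throw new IllegalArgumentException(
                key + " collides with the reserved " + RESERVED_PREFIX + " namespace");
        }
        info.put(key, value);
    }

    // Convenience check used below: does the guard accept this key?
    static boolean accepted(String key) {
        try {
            putUserInfo(new HashMap<>(), key, "value");
            return true;
        } catch (IllegalArgumentException rejected) {
            return false;
        }
    }
}
```

Documenting the prefix and rejecting (or at least flagging) colliding keys at the endpoint would keep user info and implicitly added system info from silently overwriting each other.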
[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508519#comment-14508519 ]

Peng Zhang commented on YARN-3453:
----------------------------------

Updated the code snippet for the calculation of resDueToFairShare:
{code}
Resource target;
if (resourceCalculator instanceof DominantResourceCalculator) {
  float targetRatio = Math.min(1,
      ((DominantResourceCalculator) resourceCalculator)
          .getResourceAsValue(sched.getDemand(), resourceUpperBound, false));
  target = Resources.multiply(sched.getDemand(), targetRatio);
} else {
  target = resourceUpperBound;
}
resDueToFairShare = Resources.max(resourceCalculator, clusterResource,
    Resources.none(), Resources.subtract(target, sched.getResourceUsage()));
{code}

> Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3453
>                 URL: https://issues.apache.org/jira/browse/YARN-3453
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Ashwin Shankar
>
> There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources being preempted than needed, and those extra preempted containers don't even reach the "starved" queue, since the scheduling logic is based on DRF's calculator. These are the two places:
> 1. {code:title=FSLeafQueue.java|borderStyle=solid}
> private boolean isStarved(Resource share)
> {code}
> A queue shouldn't be marked as "starved" if its dominant resource usage is >= fair/minshare.
> 2. {code:title=FairScheduler.java|borderStyle=solid}
> protected Resource resToPreempt(FSLeafQueue sched, long curTime)
> {code}
> --
> One more thing that I believe needs to change in DRF mode: during a preemption round, if preempting a few containers satisfies the need for one resource type, then we should exit that preemption round, since the containers we just preempted should bring the dominant resource usage to min/fair share.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
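The shape of the snippet above can be illustrated with a plain two-resource model, using bare arrays instead of the Hadoop Resource/DominantResourceCalculator classes: scale the demand down uniformly so that no resource type exceeds its upper bound, then preempt only the per-resource shortfall.

```java
public class DrfPreemption {
    // Plain multi-resource model (e.g. index 0 = memory, 1 = vcores) of the
    // snippet's shape; not the Hadoop classes.
    static double[] targetUnderDrf(double[] demand, double[] upperBound) {
        // targetRatio = min(1, smallest upperBound/demand share across types),
        // mirroring getResourceAsValue(demand, upperBound, false).
        double ratio = 1.0;
        for (int i = 0; i < demand.length; i++) {
            if (demand[i] > 0) {
                ratio = Math.min(ratio, upperBound[i] / demand[i]);
            }
        }
        double[] target = new double[demand.length];
        for (int i = 0; i < demand.length; i++) {
            target[i] = demand[i] * ratio;   // Resources.multiply(demand, ratio)
        }
        return target;
    }

    // resDueToFairShare = max(none, target - usage), per resource type.
    static double[] shortfall(double[] target, double[] usage) {
        double[] need = new double[target.length];
        for (int i = 0; i < target.length; i++) {
            need[i] = Math.max(0, target[i] - usage[i]);
        }
        return need;
    }
}
```

With demand (40, 10) and an upper bound of (20, 10), the ratio is 0.5, so the target is (20, 5) rather than the full upper bound — which is what keeps the preemption amount consistent with what the DRF scheduling logic will actually hand to the starved queue.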
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508490#comment-14508490 ]

Inigo Goiri commented on YARN-3534:
-----------------------------------

[~sunilg], my plan is to send the same entity as in YARN-3482: ResourceUtilization. For now, it will have memory in MB and virtual cores as a float. We can extend this for other resources.

In terms of collecting the utilization, this will be monitored by the NodeResourceMonitor (currently empty), which will mimic the behavior of the ContainersMonitor and use two configuration parameters: (1) whether we monitor the resources of the node and (2) how often the resources are monitored.

I'm in the last stages of testing it and tomorrow I'll post a patch with this change. The structure was already there, so it's a matter of adding the communication part.

> Report node resource utilization
> --------------------------------
>
>                 Key: YARN-3534
>                 URL: https://issues.apache.org/jira/browse/YARN-3534
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Inigo Goiri
>            Assignee: Inigo Goiri
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
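A minimal sketch of the plan described above. The property names and defaults are hypothetical — the comment only says there is an enable flag and a monitoring interval, mirroring the ContainersMonitor configuration — and ResourceUtilization is reduced to the two fields mentioned:

```java
import java.util.Map;

public class NodeResourceMonitorSketch {
    // Hypothetical configuration keys and defaults; not committed YARN names.
    static final String ENABLED_KEY =
        "yarn.nodemanager.resource-monitor.enabled";
    static final String INTERVAL_KEY =
        "yarn.nodemanager.resource-monitor.interval-ms";

    final boolean enabled;    // (1) whether node resources are monitored
    final long intervalMs;    // (2) how often they are monitored

    NodeResourceMonitorSketch(Map<String, String> conf) {
        this.enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "true"));
        this.intervalMs = Long.parseLong(conf.getOrDefault(INTERVAL_KEY, "3000"));
    }

    // The heartbeat payload, reduced to the two fields the comment mentions:
    // memory in MB and virtual cores as a float.
    static final class ResourceUtilization {
        final int physicalMemoryMB;
        final float cpuVcores;
        ResourceUtilization(int physicalMemoryMB, float cpuVcores) {
            this.physicalMemoryMB = physicalMemoryMB;
            this.cpuVcores = cpuVcores;
        }
    }
}
```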
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508485#comment-14508485 ] Hadoop QA commented on YARN-3533: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 5m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 14s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 47s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 74m 35s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727472/YARN-3533.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / a100be6 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7467/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7467/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7467//console | This message was automatically generated. > Test: Fix launchAM in MockRM to wait for attempt to be scheduled > > > Key: YARN-3533 > URL: https://issues.apache.org/jira/browse/YARN-3533 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3533.001.patch > > > MockRM#launchAM fails in many test runs because it does not wait for the app > attempt to be scheduled before NM update is sent as noted in [recent > builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508476#comment-14508476 ]

sandflee commented on YARN-3533:
--------------------------------

Thanks for your patch.
1. waitForSchedulerAppAttemptAdded may not do what is expected:
{code}
public T getApplicationAttempt(ApplicationAttemptId applicationAttemptId) {
  SchedulerApplication<T> app =
      applications.get(applicationAttemptId.getApplicationId());
  return app == null ? null : app.getCurrentAppAttempt();
}
{code}
As shown above, this function just gets the current app attempt, not the attempt corresponding to applicationAttemptId. (A bug?)
2. SCHEDULED is not a stable state. Is it possible that another NM heartbeat moves the attempt to allocated, so that waiting for this state blocks?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
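sandflee's first point can be reproduced with a toy model: a lookup keyed only on the application id can only ever hand back the current attempt, so a caller asking for an older attempt id silently gets a different one. The classes below are illustrative, not the scheduler code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttemptLookup {
    // appId -> attempt ids in creation order; the last one is "current".
    static final Map<String, List<String>> APPS = new HashMap<>();

    // Mirrors the quoted getApplicationAttempt(): the attempt id argument is
    // used only for its application id, so the *current* attempt comes back
    // even when an older attempt id was requested.
    static String currentAttempt(String appId) {
        List<String> attempts = APPS.get(appId);
        return attempts == null ? null : attempts.get(attempts.size() - 1);
    }

    // What a waiter keyed on a specific attempt id presumably wants: that
    // exact attempt, or null if the application does not have it.
    static String exactAttempt(String appId, String attemptId) {
        List<String> attempts = APPS.get(appId);
        return attempts != null && attempts.contains(attemptId) ? attemptId : null;
    }
}
```

Once a second attempt exists, the first form returns the wrong attempt for a waiter that registered against attempt 1, which is exactly the mismatch the comment flags.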
[jira] [Commented] (YARN-3534) Report node resource utilization
[ https://issues.apache.org/jira/browse/YARN-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508467#comment-14508467 ]

Sunil G commented on YARN-3534:
-------------------------------

bq. resource utilization of the nodes

Is this going to be a configurable item, or will all resources be monitored?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vrushali C updated YARN-3411:
-----------------------------
    Attachment: YARN-3411.poc.2.txt

Attaching a patch that includes:
- an HBaseTimelineWriterImpl class
- a test class for the same
- an EntityTableDetails class for storing some entity-table-specific constants and other functions
- a TimelineWriterUtils class with utility functions that are useful while reading from and writing to HBase tables

The write function in the HBaseTimelineWriterImpl class writes out the entire contents of a TimelineEntity object, including its info, config, metrics (time series), isRelatedTo and relatesTo fields. The metrics time series is written such that the HBase cell timestamp is set to the metric timestamp, the HBase cell column qualifier is the metric name, and the value is the metric value. I also propose changing the TimelineMetric values to be "long" instead of "Object" (although this patch does not make that change). For the metrics column family, we should set a TTL of X days and MIN_VERSIONS = 1. That way, the time-series info will be retained for X days by HBase, and the latest value will always be retained.

The test class spins up a mini cluster via HBaseTestingUtility's startMiniCluster. It creates one entity object with info, config, metrics (time series), isRelatedTo and relatesTo entities, and writes it to the backend by invoking the write API in the HBaseTimelineWriterImpl class. The test scans the entity table, reads back the entity details, and verifies the values of each field, including the time series. Also attaching an Eclipse console log from a run of the unit test.
The schema creation would be along the lines of this: {code} create 'ats.entity', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'}, {NAME => 'm', VERSIONS => 2147483647, MIN_VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}, {NAME => 'c', COMPRESSION => 'LZO', BLOCKCACHE => false, BLOOMFILTER => 'ROWCOL' } {code} > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, > YARN-3411.poc.2.txt, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
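The retention behavior that the TTL and MIN_VERSIONS = 1 settings on the 'm' family are chosen for can be modeled in a few lines. This is a simplification of HBase's actual version/TTL handling, used only to show the intended semantics: expired versions are dropped, but the newest cell always survives.

```java
import java.util.ArrayList;
import java.util.List;

public class MetricRetentionModel {
    // Simplified model of TTL with MIN_VERSIONS = 1 on a metric column:
    // versions older than the TTL are dropped on compaction, but the newest
    // cell (the latest metric value) is always retained.
    static long[] afterCompaction(long[] cellTimestamps, long nowMs, long ttlMs) {
        long newest = Long.MIN_VALUE;
        for (long ts : cellTimestamps) {
            newest = Math.max(newest, ts);
        }
        List<Long> kept = new ArrayList<>();
        for (long ts : cellTimestamps) {
            if (ts >= nowMs - ttlMs || ts == newest) {
                kept.add(ts);
            }
        }
        long[] out = new long[kept.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = kept.get(i);
        }
        return out;
    }
}
```

This is why writing the metric timestamp as the cell timestamp matters: TTL expiry then ages out the time series by metric time while the latest value of every metric stays readable indefinitely.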
[jira] [Commented] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508414#comment-14508414 ]

Rohith commented on YARN-3533:
------------------------------

+1 (non-binding), LGTM.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508402#comment-14508402 ] Rohith commented on YARN-3532: -- Is it dup of YARN-1981? > nodemanager version in RM nodes page didn't update when NMs rejoin > -- > > Key: YARN-3532 > URL: https://issues.apache.org/jira/browse/YARN-3532 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-3532.v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508400#comment-14508400 ] Hadoop QA commented on YARN-3437: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727521/YARN-3437.004.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / a100be6 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7466//console | This message was automatically generated. > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch, YARN-3437.002.patch, > YARN-3437.003.patch, YARN-3437.004.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sangjin Lee updated YARN-3437:
------------------------------
    Attachment: YARN-3437.004.patch

Patch v.4:
- moved logging statements out of the synchronized blocks
- dropped the logging level from ERROR to INFO
- reduced the synchronization scope in remove()

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508378#comment-14508378 ] Sangjin Lee commented on YARN-3437: --- Thanks for the review [~djp]! bq. For performance perspective, we should move LOG.info() out of synchronized block (may be move out of collector.start()?). I can move the LOG.info() call outside the synchronized block. That said, I don't think this would have a meaningful performance impact. Aside from the fact that logging calls are usually synchronized themselves, it is reasonable to expect that the contention for this lock (collectors) would be quite low. We're talking about contention when multiple AMs are competing to create collectors on the same node, and the chances that there is any contention on this lock would be very low. Also, when you said "may be move out of collector.start()", did you mean moving the collector.start() call outside the synchronization block? If so, I'd be hesitant to do that. We just had a discussion on this in another JIRA (see https://issues.apache.org/jira/browse/YARN-3390?focusedCommentId=14508121&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14508121). bq. we don't need to LOG.ERROR (replace with INFO?) That is a good suggestion. I'll update this (and remove()) to lower the logging level for this. bq. For remove(), similar that we should move collector.stop() and LOG.info() out of synchronized block. This we can do safely. I'll update the patch. 
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
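The remove() restructuring agreed on above — shrink the synchronized block to the map mutation and run collector.stop() and the log line outside it — can be sketched like this (toy classes, not the actual collector-manager code):

```java
import java.util.HashMap;
import java.util.Map;

public class CollectorManager {
    // Toy collector: only what is needed to show the locking pattern.
    static class Collector {
        volatile boolean stopped;
        void stop() { stopped = true; }
    }

    private final Map<String, Collector> collectors = new HashMap<>();

    void put(String appId, Collector collector) {
        synchronized (collectors) {
            collectors.put(appId, collector);
        }
    }

    // Only the map mutation holds the lock; the (potentially slow) stop()
    // and the log statement run after it is released.
    Collector remove(String appId) {
        Collector removed;
        synchronized (collectors) {
            removed = collectors.remove(appId);
        }
        if (removed != null) {
            removed.stop();                                        // outside the lock
            System.out.println("Stopped collector for " + appId);  // outside the lock
        }
        return removed;
    }
}
```

Because the entry is removed before the lock is released, no other thread can see (or double-stop) the collector, so moving stop() outside the synchronized block is safe here — unlike start(), which the linked YARN-3390 discussion keeps inside.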
[jira] [Created] (YARN-3534) Report node resource utilization
Inigo Goiri created YARN-3534: - Summary: Report node resource utilization Key: YARN-3534 URL: https://issues.apache.org/jira/browse/YARN-3534 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Inigo Goiri Assignee: Inigo Goiri YARN should be aware of the resource utilization of the nodes when scheduling containers. For this, this task will implement the NodeResourceMonitor and send this information to the Resource Manager in the heartbeat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508312#comment-14508312 ]

Brahma Reddy Battula commented on YARN-3532:
--------------------------------------------

The findbugs warnings are handled in HADOOP-11821.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508304#comment-14508304 ]

Siddharth Wagle commented on YARN-3529:
---------------------------------------

Listed the deps here:
{code}
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <version>${phoenix.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-annotations</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-core</artifactId>
  <type>test-jar</type>
  <version>${phoenix.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-it</artifactId>
  <version>${hbase.version}</version>
  <scope>test</scope>
  <classifier>tests</classifier>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-testing-util</artifactId>
  <version>${hbase.version}</version>
  <scope>test</scope>
  <optional>true</optional>
  <exclusions>
    <exclusion>
      <groupId>org.jruby</groupId>
      <artifactId>jruby-complete</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}

> Add miniHBase cluster and Phoenix support to ATS v2 unit tests
> --------------------------------------------------------------
>
>                 Key: YARN-3529
>                 URL: https://issues.apache.org/jira/browse/YARN-3529
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: AbstractMiniHBaseClusterTest.java, output_minicluster2.txt
>
>
> After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch gets merged back to trunk.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508279#comment-14508279 ] Hadoop QA commented on YARN-3448: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 23s | The applied patch generated 6 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 8s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 3m 17s | Tests passed in hadoop-yarn-server-applicationhistoryservice. 
| | | | 45m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727487/YARN-3448.12.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a100be6 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7465/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7465/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7465//console | This message was automatically generated. > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.10.patch, > YARN-3448.12.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, > YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities a record at a time. An exclusive write lock is held > during the entire deletion phase, which in practice can be hours. If we are to > relax some of the consistency constraints, other performance-enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases.
This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases, each lookup is much faster. It can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data are in these two > sections, at a 4:1 ratio (index to entity), at least for tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling db instance > based on start time. This allows us to stitch the data back together while > reading and do artificial paging. > Relax the synchronous writes constraints. If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes that can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
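The rolling-DB scheme described above can be sketched outside of YARN: route each entity to a bucket keyed by its start time rounded down to a rolling period, so expiring old data becomes dropping whole bucket databases from the file system instead of record-at-a-time LevelDB deletes. A minimal sketch (not the actual RollingLevelDBTimelineStore code) with a hypothetical one-hour period; the real store's period and retention are configurable:

```java
import java.util.concurrent.TimeUnit;

/**
 * Illustrative sketch of rolling-database bucketing: every entity whose start
 * time falls in the same rolling period maps to the same bucket, so an entire
 * expired bucket can be removed with one file-system delete.
 */
public class RollingDbBucket {
  // Hypothetical rolling period for illustration only.
  static final long ROLLING_PERIOD_MS = TimeUnit.HOURS.toMillis(1);

  /** Round the start time down to the start of its rolling period. */
  static long bucketFor(long startTimeMs) {
    return (startTimeMs / ROLLING_PERIOD_MS) * ROLLING_PERIOD_MS;
  }

  /** A bucket is expired once its entire period is older than the retention window. */
  static boolean isExpired(long bucketStartMs, long nowMs, long retentionMs) {
    return bucketStartMs + ROLLING_PERIOD_MS <= nowMs - retentionMs;
  }

  public static void main(String[] args) {
    long t = 3_600_000L * 5 + 42; // a time inside the sixth one-hour bucket
    System.out.println(bucketFor(t)); // prints 18000000
  }
}
```

Because the bucket is derived purely from start time, reads can stitch results back together by scanning buckets in order, which is the constraint the description mentions.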
[jira] [Commented] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508250#comment-14508250 ] Hadoop QA commented on YARN-3532: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 31s | The applied patch generated 4 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 1s | The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 52m 7s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 95m 50s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-sls | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String): new java.io.FileReader(String) At RumenToSLSConverter.java:[line 122] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSLoadFile(String, String): new java.io.FileWriter(String) At RumenToSLSConverter.java:[line 124] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String):in org.apache.hadoop.yarn.sls.RumenToSLSConverter.generateSLSNodeFile(String): new java.io.FileWriter(String) At RumenToSLSConverter.java:[line 145] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int):in org.apache.hadoop.yarn.sls.SLSRunner.startAMFromSLSTraces(Resource, int): new java.io.FileReader(String) At SLSRunner.java:[line 280] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics():in org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper.initMetrics(): new java.io.FileWriter(String) At ResourceSchedulerWrapper.java:[line 490] | | | Found reliance on default encoding in new org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper):in new org.apache.hadoop.yarn.sls.scheduler.ResourceSchedulerWrapper$MetricsLogRunnable(ResourceSchedulerWrapper): new java.io.FileWriter(String) At ResourceSchedulerWrapper.java:[line 695] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics():in org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new 
java.io.FileWriter(String) At SLSCapacityScheduler.java:[line 493] | | | Found reliance on default encoding in new org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):in new org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler): new java.io.FileWriter(String) At SLSCapacityScheduler.java:[line 698] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String):in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromNodeFile(String): new java.io.FileReader(String) At SLSUtils.java:[line 119] | | | Found reliance on default encoding in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String):in org.apache.hadoop.yarn.sls.utils.SLSUtils.parseNodesFromSLSTrace(String): new java.io.FileReader(String) At SLSUtils.java:[line 92] | | | Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient non-serializable instance field handleOperTimecostHistogramMap In SLSWebApp.java:instance field handleOperTimecostHistogramMap In SLSWebApp.java | | | Class org.apache.hadoop.yarn.sls.web.SLSWebApp defines non-transient non-serializable instance field queueAllocatedMemoryCounterMap In SLSWebAp
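The "reliance on default encoding" FindBugs warnings above all share the same standard remedy: replace `FileReader`/`FileWriter` (which use the platform default charset) with stream readers/writers over an explicit charset. A minimal sketch of the idiom, not the SLS patch itself:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the usual fix for FindBugs' DM_DEFAULT_ENCODING warnings:
 * always name the charset instead of relying on the platform default.
 */
public class ExplicitCharsetIo {
  static void write(File f, String text) throws IOException {
    try (Writer w = new OutputStreamWriter(
        new FileOutputStream(f), StandardCharsets.UTF_8)) {
      w.write(text);
    }
  }

  static String read(File f) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader r = new InputStreamReader(
        new FileInputStream(f), StandardCharsets.UTF_8)) {
      int c;
      while ((c = r.read()) != -1) {
        sb.append((char) c);
      }
    }
    return sb.toString();
  }

  /** Write then read back through a temp file, with UTF-8 on both sides. */
  static String roundTrip(String text) {
    try {
      File f = File.createTempFile("enc", ".txt");
      f.deleteOnExit();
      write(f, text);
      return read(f);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Non-ASCII text round-trips regardless of the JVM's default locale.
    System.out.println(roundTrip("héllo")); // prints héllo
  }
}
```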
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Attachment: YARN-3448.12.patch Improved the patch further by running the code under a Java profiler. This patch is 25% faster and generates a roughly 20% smaller database than the previous version. - Removed the unnecessary PREFIX, since each type is in its own database and the prefix is no longer needed to distinguish them. - Removed unused invisible related entities to reduce further operations. - Changed the database serialization method to more quickly generate a smaller serialized form of the primary filter values and other info. The library introduced, fast-serialization, is verified Apache License 2.0. - Profiling showed much time was spent converting Strings to byte arrays, so the strings are now converted once and reused for all the database keys. - Reduced the read cache and write buffer size to take into consideration the 7 day default retention. - Removed insert time from the start time database. This feature was used to detect changes since the last query, but is not functional since it forces a scan of all data entries. Could be added back at a later time. > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.10.patch, > YARN-3448.12.patch, YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, > YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities a record at a time. An exclusive write lock is held > during the entire deletion phase, which in practice can be hours.
If we are to > relax some of the consistency constraints, other performance-enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases, each lookup is much faster. It can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data are in these two > sections, at a 4:1 ratio (index to entity), at least for tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling db instance > based on start time. This allows us to stitch the data back together while > reading and do artificial paging. > Relax the synchronous writes constraints. If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes that can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
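The "convert the strings once" optimization from the update above can be illustrated in isolation: compute an id's UTF-8 bytes a single time and splice them into every database key, instead of calling `String.getBytes()` once per key write. The `s!`/`e!` prefixes below are hypothetical, not the store's actual key layout:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Illustrative sketch (not the actual YARN-3448 patch) of reusing a
 * once-computed byte array when composing several LevelDB-style keys.
 */
public class KeyBytesReuse {
  /** Concatenate two byte arrays into a new key. */
  static byte[] concat(byte[] a, byte[] b) {
    byte[] out = Arrays.copyOf(a, a.length + b.length);
    System.arraycopy(b, 0, out, a.length, b.length);
    return out;
  }

  /** Build a key from a short prefix and pre-converted id bytes. */
  static byte[] key(String prefix, byte[] idBytes) {
    return concat(prefix.getBytes(StandardCharsets.UTF_8), idBytes);
  }

  public static void main(String[] args) {
    // Convert the entity id once...
    byte[] idBytes = "entity-42".getBytes(StandardCharsets.UTF_8);
    // ...then reuse the bytes for every key that embeds it.
    byte[] startTimeKey = key("s!", idBytes);
    byte[] eventKey = key("e!", idBytes);
    System.out.println(new String(startTimeKey, StandardCharsets.UTF_8)); // prints s!entity-42
    System.out.println(new String(eventKey, StandardCharsets.UTF_8));     // prints e!entity-42
  }
}
```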
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508228#comment-14508228 ] Hudson commented on YARN-3366: -- FAILURE: Integrated in Hadoop-trunk-Commit #7642 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7642/]) YARN-3366. Enhanced NodeManager to support classifying/shaping outgoing network bandwidth traffic originating from YARN containers Contributed by Sidharta Seethana. (vinodkv: rev a100be685cc4521e9949589948219231aa5d2733) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestResourceHandlerModule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficControlBandwidthHandlerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestTrafficController.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficController.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/OutboundBandwidthResourceHandler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TrafficControlBandwidthHandlerImpl.java > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Fix For: 2.8.0 > > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch, YARN-3366.007.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Component/s: timelineserver > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.10.patch, > YARN-3448.2.patch, YARN-3448.3.patch, YARN-3448.4.patch, YARN-3448.5.patch, > YARN-3448.7.patch, YARN-3448.8.patch, YARN-3448.9.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities a record at a time. An exclusive write lock is held > during the entire deletion phase, which in practice can be hours. If we are to > relax some of the consistency constraints, other performance-enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases, each lookup is much faster. It can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data are in these two > sections, at a 4:1 ratio (index to entity), at least for tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling db instance > based on start time. This allows us to stitch the data back together while > reading and do artificial paging. > Relax the synchronous writes constraints.
If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes that can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508220#comment-14508220 ] Sidharta Seethana commented on YARN-3366: - Here is the ticket : https://issues.apache.org/jira/browse/HADOOP-11869 > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch, YARN-3366.007.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508204#comment-14508204 ] Vinod Kumar Vavilapalli commented on YARN-3366: --- +1 for the latest patch. Checking this in. Can you file a ticket for the checkstyle rules' issues? > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch, YARN-3366.007.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508202#comment-14508202 ] Junping Du commented on YARN-3437: -- Thanks [~sjlee0] for updating the patch! The latest patch looks good overall; some minor comments:
{code}
  public TimelineCollector putIfAbsent(ApplicationId appId,
      TimelineCollector collector) {
    String id = appId.toString();
    TimelineCollector collectorInTable;
    boolean collectorIsNew = false;
    synchronized (collectors) {
      collectorInTable = collectors.get(id);
      if (collectorInTable == null) {
        try {
          // initialize, start, and add it to the collection so it can be
          // cleaned up when the parent shuts down
          collector.init(getConfig());
          collector.start();
          collectors.put(id, collector);
          LOG.info("the collector for " + id + " was added");
          collectorInTable = collector;
          collectorIsNew = true;
        } catch (Exception e) {
          throw new YarnRuntimeException(e);
        }
      } else {
        String msg = "the collector for " + id + " already exists!";
        LOG.error(msg);
      }
    }

    if (collectorIsNew) {
      postPut(appId, collector);
    }

    return collectorInTable;
  }
{code}
I understand this code piece was moved from another place. However, I think it needs to be improved:
- From a performance perspective, we should move LOG.info() out of the synchronized block (and maybe collector.start() as well?).
- We don't need to LOG.error() (replace with INFO?) if the collector already exists; the general semantics of putIfAbsent should allow putting the same object from concurrent threads.
For remove(), similarly, we should move collector.stop() and LOG.info() out of the synchronized block.
> convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch, YARN-3437.002.patch, > YARN-3437.003.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
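The putIfAbsent semantics the review above asks for can be sketched with a `ConcurrentHashMap` and a stub `Collector` standing in for YARN's `TimelineCollector` (this is not the actual TimelineCollectorManager code): the map insertion decides the winner, and logging and start happen outside any lock, while a losing concurrent caller simply gets the existing instance rather than an error.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a lock-light putIfAbsent for per-application collectors.
 * Collector is a stub; names are illustrative.
 */
public class CollectorMap {
  static class Collector {
    volatile boolean started;
    void start() { started = true; }
  }

  private final Map<String, Collector> collectors = new ConcurrentHashMap<>();

  /** Returns whichever collector ended up registered for this id. */
  Collector putIfAbsent(String id, Collector collector) {
    Collector prev = collectors.putIfAbsent(id, collector);
    if (prev == null) {
      // We won the race: start and log outside any synchronized block.
      collector.start();
      System.out.println("the collector for " + id + " was added");
      return collector;
    }
    // Someone else registered first; under putIfAbsent semantics this is
    // expected, so an INFO-level message (not ERROR) suffices.
    System.out.println("the collector for " + id + " already exists");
    return prev;
  }

  public static void main(String[] args) {
    CollectorMap m = new CollectorMap();
    Collector c1 = new Collector();
    Collector c2 = new Collector();
    System.out.println(m.putIfAbsent("app_1", c1) == c1); // prints true
    System.out.println(m.putIfAbsent("app_1", c2) == c1); // prints true
  }
}
```

One caveat with starting outside the lock: a concurrent reader can briefly observe a registered-but-unstarted collector, which is part of why the original code did the init/start work inside the synchronized block.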
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508192#comment-14508192 ] Sidharta Seethana commented on YARN-3366: - The test failure is unrelated to this patch. The checkstyle script and the rules in place need to be revisited - for example, I see warnings for "line too long" for import statements. > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch, YARN-3366.007.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508183#comment-14508183 ] Wangda Tan commented on YARN-3413: -- The failed test is not related to the patch. > Node label attributes (like exclusivity) should settable via > addToClusterNodeLabels but shouldn't be changeable at runtime > -- > > Key: YARN-3413 > URL: https://issues.apache.org/jira/browse/YARN-3413 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, > YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch > > > As mentioned in: > https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. > Changing node label exclusivity and/or other attributes may not be a real use > case, and we should also support setting node label attributes while adding > them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508175#comment-14508175 ] Hadoop QA commented on YARN-3413: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 26s | The applied patch generated 12 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 6m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 106m 4s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 28s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 7m 13s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 52m 24s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 220m 51s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727320/YARN-3413.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a3b1d8c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7456/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7456/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7456//console | This message was automatically generated. 
> Node label attributes (like exclusivity) should settable via > addToClusterNodeLabels but shouldn't be changeable at runtime > -- > > Key: YARN-3413 > URL: https://issues.apache.org/jira/browse/YARN-3413 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, > YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch > > > As mentioned in: > https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. > Changing node label exclusivity and/or other attributes may not be a real use > case, and we should also support setting node label attributes while adding > them to the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3387) container complete message couldn't pass to am if am restarted and rm changed
[ https://issues.apache.org/jira/browse/YARN-3387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508167#comment-14508167 ] Anubhav Dhoot commented on YARN-3387: - Thanks [~sandflee] for reporting the issue. I have opened YARN-3533 to fix this. > container complete message couldn't pass to am if am restarted and rm changed > - > > Key: YARN-3387 > URL: https://issues.apache.org/jira/browse/YARN-3387 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: sandflee >Priority: Critical > Labels: patch > Attachments: YARN-3387.001.patch, YARN-3387.002.patch > > > Suppose AM work preserving and RM HA are enabled. > The container complete message is passed to appAttempt.justFinishedContainers in > the RM. Normally, all attempts in one app share the same > justFinishedContainers, but when the RM changes, every attempt has its own > justFinishedContainers, so in the situations below the container complete message > can't be passed to the AM: > 1. the AM restarts > 2. the RM changes > 3. a container launched by the first AM completes > The container complete message will be passed to appAttempt1, not appAttempt2, but > the AM pulls finished containers from appAttempt2 (currentAppAttempt) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
[ https://issues.apache.org/jira/browse/YARN-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3533: Attachment: YARN-3533.001.patch The fix adds an explicit wait for the state and changes the other waits to throw when they time out. > Test: Fix launchAM in MockRM to wait for attempt to be scheduled > > > Key: YARN-3533 > URL: https://issues.apache.org/jira/browse/YARN-3533 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0 >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3533.001.patch > > > MockRM#launchAM fails in many test runs because it does not wait for the app > attempt to be scheduled before the NM update is sent, as noted in [recent > builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
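The "waits that throw when they time out" fix can be sketched generically: poll for the expected state and raise a TimeoutException instead of returning silently, so a test fails at the wait rather than with a confusing later assertion. Names below are illustrative, not the actual MockRM API:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/**
 * Generic sketch of a state wait that throws on timeout rather than
 * returning silently.
 */
public class WaitForState {
  /** Poll until the supplier yields the expected state or the deadline passes. */
  static void waitForState(Supplier<String> current, String expected,
      long timeoutMs) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!expected.equals(current.get())) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException(
            "state " + expected + " not reached, still " + current.get());
      }
      Thread.sleep(10); // poll interval
    }
  }

  /** Convenience wrapper returning whether the state was reached in time. */
  static boolean reached(Supplier<String> current, String expected,
      long timeoutMs) {
    try {
      waitForState(current, expected, timeoutMs);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(reached(() -> "SCHEDULED", "SCHEDULED", 100)); // prints true
  }
}
```

Throwing on timeout is the important part: a wait that just returns leaves the test to proceed against an attempt that was never scheduled, which is exactly the flakiness described above.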
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508155#comment-14508155 ] Hadoop QA commented on YARN-3366: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 12s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 59s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 30s | The applied patch generated 6 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 27s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 5m 48s | Tests failed in hadoop-yarn-server-nodemanager. 
| | | | 49m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727369/YARN-3366.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0ebe84d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7463/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7463/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7463//console | This message was automatically generated. > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch, YARN-3366.007.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508151#comment-14508151 ] Sangjin Lee commented on YARN-3437: --- Could you kindly take a look at the latest patch? Thanks! > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch, YARN-3437.002.patch, > YARN-3437.003.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3533) Test: Fix launchAM in MockRM to wait for attempt to be scheduled
Anubhav Dhoot created YARN-3533: --- Summary: Test: Fix launchAM in MockRM to wait for attempt to be scheduled Key: YARN-3533 URL: https://issues.apache.org/jira/browse/YARN-3533 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.6.0 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot MockRM#launchAM fails in many test runs because it does not wait for the app attempt to be scheduled before NM update is sent as noted in [recent builds|https://issues.apache.org/jira/browse/YARN-3387?focusedCommentId=14507255&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507255] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508134#comment-14508134 ] Hadoop QA commented on YARN-3434: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 57s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 51m 29s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 90m 59s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727350/YARN-3434.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0ebe84d | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7460/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7460/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7460/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7460/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7460//console | This message was automatically generated. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, > YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508136#comment-14508136 ] Li Lu commented on YARN-3390: - Hi [~sjlee0], thanks for the note! Your rationale sounds quite reasonable, so let's keep the current design here. For now I'm OK with the coarse-grained synchronization. > Reuse TimelineCollectorManager for RM > - > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3390.1.patch > > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-3532: -- Attachment: YARN-3532.v1.patch > nodemanager version in RM nodes page didn't update when NMs rejoin > -- > > Key: YARN-3532 > URL: https://issues.apache.org/jira/browse/YARN-3532 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-3532.v1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
[ https://issues.apache.org/jira/browse/YARN-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-3532: - Assignee: Siqi Li > nodemanager version in RM nodes page didn't update when NMs rejoin > -- > > Key: YARN-3532 > URL: https://issues.apache.org/jira/browse/YARN-3532 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3532) nodemanager version in RM nodes page didn't update when NMs rejoin
Siqi Li created YARN-3532: - Summary: nodemanager version in RM nodes page didn't update when NMs rejoin Key: YARN-3532 URL: https://issues.apache.org/jira/browse/YARN-3532 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508121#comment-14508121 ] Sangjin Lee commented on YARN-3390: --- bq. In TimelineCollectorManager, I'm still having this question, although we may not want to address it in this JIRA: are there any special consistency requirements that prevent us from using ConcurrentHashMap? I can answer this as I added that code. :) In putIfAbsent(), it needs to start the collector as well if get() returns null. If we used ConcurrentHashMap and removed synchronization, multiple threads could start their own collectors unnecessarily. It is probably not a show stopper but less than desirable. Also, in real life the contention on TimelineCollectorManager is low enough that synchronization should be perfectly adequate. If we want to do this without synchronization, then we would want to use something like guava's LoadingCache. > Reuse TimelineCollectorManager for RM > - > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3390.1.patch > > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
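Sangjin's point about synchronized putIfAbsent versus a bare ConcurrentHashMap can be illustrated with a simplified sketch. The class below is a hypothetical stand-in, not the real TimelineCollectorManager: synchronizing the whole check-create-start sequence guarantees at most one started collector per app id, whereas with ConcurrentHashMap.putIfAbsent two threads could each construct and start a collector, leaving the loser's collector started but unused.

```java
import java.util.HashMap;
import java.util.Map;

public class CollectorManagerSketch {
  public static class Collector {
    private boolean started;
    public void start() { started = true; }  // stands in for expensive startup
    public boolean isStarted() { return started; }
  }

  private final Map<String, Collector> collectors = new HashMap<>();

  // The get-create-start-put sequence runs under one lock, so concurrent
  // callers for the same app id always observe the same started collector.
  public synchronized Collector putIfAbsent(String appId) {
    Collector c = collectors.get(appId);
    if (c == null) {
      c = new Collector();
      c.start();                 // must happen under the same lock as put()
      collectors.put(appId, c);
    }
    return c;
  }
}
```

As the comment notes, a lock-free alternative would be a loading cache (e.g. Guava's LoadingCache), which also guarantees the loader runs once per key.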
[jira] [Commented] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508107#comment-14508107 ] Hadoop QA commented on YARN-3530: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 98 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 5s | The applied patch generated 3 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 45s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 2m 46s | Tests passed in hadoop-yarn-server-applicationhistoryservice. 
| | | | 42m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727360/YARN-3530.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0ebe84d | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7461/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7461/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7461/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7461/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7461//console | This message was automatically generated. > ATS throws exception on trying to filter results without otherinfo. > --- > > Key: YARN-3530 > URL: https://issues.apache.org/jira/browse/YARN-3530 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Sreenath Somarajapuram >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-3530.1.patch > > > Scenario: > Am attempting to make data loading faster by fetching otherinfo on demand. As > shown in the attached image, the patch adds a 'Load Counters' checkbox. It > would be disabled by default, and on clicking, the counter data also would be > loaded. > Issue: > Things are good when otherinfo is loaded. > But ATS throws exception on trying to filter on status or applicationId > without otherinfo in fields list. > In other words, using fields=events,primaryfilters with > secondaryFilter=status:RUNNING will return > { "exception": "WebApplicationException", "message": > "java.lang.NullPointerException", "javaClassName": > "javax.ws.rs.WebApplicationException" } > from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508097#comment-14508097 ] Li Lu commented on YARN-3390: - Hi [~zjshen], thanks for the patch! Here are some of my comments. Most of them are quite minor:
# Changes in RMContainerAllocator.java appear to be irrelevant. Seems like this was changed by an IDE by mistake (on a refactoring)?
# In the following lines, the first null assignment to value is marked as redundant:
{code}
+for (String tag : app.getApplicationTags()) {
+  String value = null;
+  if ((value = getFlowContext(TimelineUtils.FLOW_NAME_TAG_PREFIX, tag)) != null
+      && !value.isEmpty()) {
+    collector.getTimelineEntityContext().setFlowName(value);
+  } else if ((value = getFlowContext(TimelineUtils.FLOW_VERSION_TAG_PREFIX, tag)) != null
+      && !value.isEmpty()) {
+    collector.getTimelineEntityContext().setFlowVersion(value);
+  } else if ((value = getFlowContext(TimelineUtils.FLOW_RUN_ID_TAG_PREFIX, tag)) != null
+      && !value.isEmpty()) {
+    collector.getTimelineEntityContext().setFlowRunId(Long.valueOf(value));
+  }
{code}
Maybe we’d like to use a switch statement to deal with this? We may first split the tag into two parts, based on the first “:”, and then switch on the first part of the returned array to set the second part of the array into the flow name, version, or run id. Am I missing any fundamental obstacles to doing this here? (String switch is available from Java 7.)
# Rename {{MyNMTimelineCollectorManager}} in TestTimelineServiceClientIntegration to something indicating it's for testing?
# In the following lines:
{code}
- protected TimelineCollectorContext getTimelineEntityContext() {
+ public TimelineCollectorContext getTimelineEntityContext() {
{code}
We're exposing TimelineCollectorContext but we're not annotating the class. 
Even though we may treat unannotated classes as Audience.Private, maybe we'd like to mark it as unstable? # In TimelineCollectorManager, I'm still having this question, although we may not want to address it in this JIRA: are there any special consistency requirements that prevent us from using ConcurrentHashMap? # In TimelineCollectorWebService, why are we removing the utility function {{getCollector}}? I think we can reuse it when adding new web services. > Reuse TimelineCollectorManager for RM > - > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3390.1.patch > > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
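Li Lu's switch-statement suggestion above (split each tag on the first ':' and switch on the prefix) might look roughly like the sketch below. The prefix strings, field names, and class are simplified assumptions for illustration, not the actual TimelineUtils constants or TimelineCollectorContext API:

```java
public class FlowTagParser {
  // Simplified stand-ins for the flow context fields being populated.
  public String flowName;
  public String flowVersion;
  public long flowRunId;

  // Split a tag of the form "PREFIX:value" on the first ':' and dispatch on
  // the prefix. String switch is available from Java 7, as noted in review.
  public void applyTag(String tag) {
    int sep = tag.indexOf(':');
    if (sep <= 0 || sep == tag.length() - 1) {
      return; // no prefix or empty value: ignore the tag
    }
    String prefix = tag.substring(0, sep);
    String value = tag.substring(sep + 1);
    switch (prefix) {
      case "TIMELINE_FLOW_NAME_TAG":     // assumed prefix constant
        flowName = value;
        break;
      case "TIMELINE_FLOW_VERSION_TAG":  // assumed prefix constant
        flowVersion = value;
        break;
      case "TIMELINE_FLOW_RUN_ID_TAG":   // assumed prefix constant
        flowRunId = Long.parseLong(value);
        break;
      default:
        break; // unrelated application tag
    }
  }
}
```

This avoids the repeated assignment-in-condition pattern and the redundant initial null assignment flagged in the review.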
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.007.patch Attaching a new patch based on code-review feedback from [~vinodkv] > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch, YARN-3366.007.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.75.patch trying something wrt checkstyle > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch, YARN-3319.75.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
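The sizeBasedWeight adjustment described in the issue (dividing an application's resource usage by Math.log1p(app memory demand) / Math.log(2) to offset the natural preference for smaller applications) can be sketched as follows; the class and method names here are illustrative, not the actual FairOrderingPolicy code:

```java
public class SizeBasedWeight {
  // Adjusted usage for fair ordering: a larger demand yields a larger
  // divisor (log2 of 1 + demand), so a big application's usage is
  // discounted and it sorts ahead of an equally-used small application.
  public static double adjustedUsage(double usedMemory, double demandMemory) {
    double weight = Math.log1p(demandMemory) / Math.log(2);
    return usedMemory / weight;
  }
}
```

For example, two applications each using 100 units but demanding 1023 vs. 15 units get adjusted usages of 10 and 25 respectively, so the larger application is offered resources first; ties then fall back to the lexically FIFO application-id comparison described above.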
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508025#comment-14508025 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], sure, will add isRelatedTo and relatesTo since YARN-3431 is close to finish. For the metrics, my thought is we may need to have some time-based aggregations, like taking the average (or max) of a few time series data and storing them in an aggregated table. The "precision" table for now serves as the raw data table. The user can query on the aggregation table(s) for data points per-hour, per-day or so. Timestamp information is split into two parts: the time epoch information, marked by the startTime and endTime of the metric object, and the actual time for a point in a time series. Epoch start and end times are used as PKs for the Phoenix storage for better indexing, and the detailed time for each point is stored in the time series. We can certainly discuss this design, though... > [Storage implementation] Exploiting the option of using Phoenix to access > HBase backend > --- > > Key: YARN-3134 > URL: https://issues.apache.org/jira/browse/YARN-3134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Li Lu > Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, > YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf > > > Quote the introduction on Phoenix web page: > {code} > Apache Phoenix is a relational database layer over HBase delivered as a > client-embedded JDBC driver targeting low latency queries over HBase data. > Apache Phoenix takes your SQL query, compiles it into a series of HBase > scans, and orchestrates the running of those scans to produce regular JDBC > result sets. The table metadata is stored in an HBase table and versioned, > such that snapshot queries over prior versions will automatically use the > correct schema. 
Direct use of the HBase API, along with coprocessors and > custom filters, results in performance on the order of milliseconds for small > queries, or seconds for tens of millions of rows. > {code} > It may simplify our implementation to read/write data from/to HBase, and we can > easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3530: -- Attachment: YARN-3530.1.patch > ATS throws exception on trying to filter results without otherinfo. > --- > > Key: YARN-3530 > URL: https://issues.apache.org/jira/browse/YARN-3530 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Sreenath Somarajapuram >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-3530.1.patch > > > Scenario: > Am attempting to make data loading faster by fetching otherinfo on demand. As > shown in the attached image, the patch adds a 'Load Counters' checkbox. It > would be disabled by default, and on clicking, the counter data also would be > loaded. > Issue: > Things are good when otherinfo is loaded. > But ATS throws exception on trying to filter on status or applicationId > without otherinfo in fields list. > In other words, using fields=events,primaryfilters with > secondaryFilter=status:RUNNING will return > { "exception": "WebApplicationException", "message": > "java.lang.NullPointerException", "javaClassName": > "javax.ws.rs.WebApplicationException" } > from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508023#comment-14508023 ] Zhijie Shen commented on YARN-3530: --- The problem is that the fields are set not to include other info, such that the entity has a null other info map. The secondary filter is matched after the entity is loaded. It tries to compare against the null other info map, which results in an NPE. The fix is to always include primary filters and other info into the fields if secondary filters are non-empty for matching, and remove them from the entity before returning it back to the caller. > ATS throws exception on trying to filter results without otherinfo. > --- > > Key: YARN-3530 > URL: https://issues.apache.org/jira/browse/YARN-3530 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Sreenath Somarajapuram >Assignee: Zhijie Shen >Priority: Critical > Attachments: YARN-3530.1.patch > > > Scenario: > Am attempting to make data loading faster by fetching otherinfo on demand. As > shown in the attached image, the patch adds a 'Load Counters' checkbox. It > would be disabled by default, and on clicking, the counter data also would be > loaded. > Issue: > Things are good when otherinfo is loaded. > But ATS throws exception on trying to filter on status or applicationId > without otherinfo in fields list. > In other words, using fields=events,primaryfilters with > secondaryFilter=status:RUNNING will return > { "exception": "WebApplicationException", "message": > "java.lang.NullPointerException", "javaClassName": > "javax.ws.rs.WebApplicationException" } > from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
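The fix Zhijie describes (widen the requested fields to include primary filters and other info whenever secondary filters are present, then strip the extra fields before returning the entity) can be sketched with hypothetical helpers. The Field enum and method names below are simplified stand-ins, not the actual timeline store code:

```java
import java.util.EnumSet;

public class FieldFixSketch {
  public enum Field { EVENTS, PRIMARY_FILTERS, OTHER_INFO }

  // If secondary filters will be evaluated, the entity must be loaded with
  // the maps those filters match against; otherwise matching dereferences a
  // null map and throws NPE. Returns the (possibly widened) set to load.
  public static EnumSet<Field> fieldsToRetrieve(EnumSet<Field> requested,
                                                boolean hasSecondaryFilters) {
    EnumSet<Field> toLoad = EnumSet.copyOf(requested);
    if (hasSecondaryFilters) {
      toLoad.add(Field.PRIMARY_FILTERS);
      toLoad.add(Field.OTHER_INFO);
    }
    return toLoad;
  }

  // After matching, fields the caller did not ask for are dropped again,
  // so the response shape is exactly what was requested.
  public static EnumSet<Field> fieldsToReturn(EnumSet<Field> requested,
                                              EnumSet<Field> loaded) {
    EnumSet<Field> result = EnumSet.copyOf(loaded);
    result.retainAll(requested);
    return result;
  }
}
```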
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508005#comment-14508005 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks! bq. in the current version I've not implemented isRelatedTo and relatesTo. I can certainly add this section if it's required for the performance benchmark. Yes, I think for the PoC we should write everything that the TimelineEntity class has to the backend store. bq. My current plan is to use the metrics precision table for aggregations, and just use the aggregated data for Phoenix SQL queries. Okay, I see, (for my understanding) how would the query for, say, a map-task-level metric look? There won't be any aggregation at that level, no? Also I am wondering how this metrics timeseries information would be queried. Could you please explain how the timestamps are stored? > [Storage implementation] Exploiting the option of using Phoenix to access > HBase backend > --- > > Key: YARN-3134 > URL: https://issues.apache.org/jira/browse/YARN-3134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Li Lu > Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, > YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf > > > Quote the introduction on Phoenix web page: > {code} > Apache Phoenix is a relational database layer over HBase delivered as a > client-embedded JDBC driver targeting low latency queries over HBase data. > Apache Phoenix takes your SQL query, compiles it into a series of HBase > scans, and orchestrates the running of those scans to produce regular JDBC > result sets. The table metadata is stored in an HBase table and versioned, > such that snapshot queries over prior versions will automatically use the > correct schema. 
Direct use of the HBase API, along with coprocessors and > custom filters, results in performance on the order of milliseconds for small > queries, or seconds for tens of millions of rows. > {code} > It may simplify our implementation to read/write data from/to HBase, and we can > easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507986#comment-14507986 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], in the current version I've not implemented isRelatedTo and relatesTo. I can certainly add this section if it's required for the performance benchmark. My current plan is to use the metrics precision table for aggregations, and just use the aggregated data for Phoenix SQL queries. I'm open-minded about both points, though. > [Storage implementation] Exploiting the option of using Phoenix to access > HBase backend > --- > > Key: YARN-3134 > URL: https://issues.apache.org/jira/browse/YARN-3134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Li Lu > Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, > YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf > > > Quote the introduction on Phoenix web page: > {code} > Apache Phoenix is a relational database layer over HBase delivered as a > client-embedded JDBC driver targeting low latency queries over HBase data. > Apache Phoenix takes your SQL query, compiles it into a series of HBase > scans, and orchestrates the running of those scans to produce regular JDBC > result sets. The table metadata is stored in an HBase table and versioned, > such that snapshot queries over prior versions will automatically use the > correct schema. Direct use of the HBase API, along with coprocessors and > custom filters, results in performance on the order of milliseconds for small > queries, or seconds for tens of millions of rows. > {code} > It may simplify our implementation to read/write data from/to HBase, and we can > easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Attaching exact same patch to kick jenkins again > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, > YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507952#comment-14507952 ] Hadoop QA commented on YARN-3434: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 36s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 52m 23s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 93m 31s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727317/YARN-3434.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a3b1d8c | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7455/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7455/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7455//console | This message was automatically generated. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, > YARN-3434.patch, YARN-3434.patch, YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3362) Add node label usage in RM CapacityScheduler web UI
[ https://issues.apache.org/jira/browse/YARN-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507949#comment-14507949 ] Wangda Tan commented on YARN-3362: -- Since this patch tends to create a new RM web view, I suggest doing it in a compatible way so that people will not get lost when using the new view: - When labels are not enabled in the cluster or there's only the "default" label, it will not show the "label-hierarchy", so it will be very similar to what we have in the old RM web UI - All queues' capacities and usages (by partition) can be found in CSQueue.getQueueCapacities/getQueueResourceUsages. - We may not need to consider showing apps-by-partition with this patch. Filtering apps in a queue by partition needs an additional CS interface, which should be addressed in a separate patch. > Add node label usage in RM CapacityScheduler web UI > --- > > Key: YARN-3362 > URL: https://issues.apache.org/jira/browse/YARN-3362 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, webapp >Reporter: Wangda Tan >Assignee: Naganarasimha G R > > We don't have node label usage in the RM CapacityScheduler web UI now; without > this, it is hard for users to understand what happened on nodes that have labels > assigned to them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507948#comment-14507948 ] Hadoop QA commented on YARN-3437: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727346/YARN-3437.003.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 0ebe84d | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7459//console | This message was automatically generated. > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch, YARN-3437.002.patch, > YARN-3437.003.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3437: -- Attachment: YARN-3437.003.patch Patch v.3. It's just a rename of the test class (with a couple of constants renamed accordingly) compared to patch v.2. > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch, YARN-3437.002.patch, > YARN-3437.003.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507920#comment-14507920 ] Wangda Tan commented on YARN-3319: -- Retriggered Jenkins to see if the problem still exists. > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507917#comment-14507917 ] Craig Welch commented on YARN-3319: --- The failed tests pass on my box with the patch; they're unrelated. The checkstyle issue refers to ResourceLimits, which the patch doesn't change... Poking around in the build artifacts, there are some exceptions in the checkstyle output, so I'm not sure it's actually working correctly > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). 
This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
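The sizeBasedWeight adjustment described in the YARN-3319 summary above is easy to check numerically. Below is a sketch of the arithmetic only (class and method names are illustrative, not taken from the patch):

```java
class SizeBasedWeightSketch {
    // With sizeBasedWeight enabled, the usage value used for ordering is
    // divided by log2(1 + memory demand), boosting larger applications.
    static double adjustedUsage(double usage, long memoryDemand) {
        return usage / (Math.log1p(memoryDemand) / Math.log(2));
    }

    public static void main(String[] args) {
        // Two apps with equal usage: the one demanding more memory gets a
        // smaller adjusted value and therefore sorts earlier.
        double small = adjustedUsage(100, 1024);       // divisor ~ log2(1025) ~ 10
        double large = adjustedUsage(100, 1024 * 64);  // divisor ~ log2(65537) ~ 16
        System.out.println(small > large);  // true: larger demand ranks earlier
    }
}
```

With sizeBasedWeight disabled (the default), the raw usage values would tie here and ordering would fall back to the application id comparison mentioned in the summary.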
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507914#comment-14507914 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks for the patch, I had some questions: - I don't see the isRelatedTo and relatesTo entities being written in this patch - For the metrics timeseries, I see that the metric values are being written as a ";" separated list of values as a string, is that right? But I could not figure out where the timestamps associated with each metric value are stored. Storing metric values as strings would, I think, make numerical queries harder, like finding how many entities had GC MILLIS that were more than 25% of the CPU MILLIS. > [Storage implementation] Exploiting the option of using Phoenix to access > HBase backend > --- > > Key: YARN-3134 > URL: https://issues.apache.org/jira/browse/YARN-3134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Li Lu > Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, > YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf > > > Quote the introduction on Phoenix web page: > {code} > Apache Phoenix is a relational database layer over HBase delivered as a > client-embedded JDBC driver targeting low latency queries over HBase data. > Apache Phoenix takes your SQL query, compiles it into a series of HBase > scans, and orchestrates the running of those scans to produce regular JDBC > result sets. The table metadata is stored in an HBase table and versioned, > such that snapshot queries over prior versions will automatically use the > correct schema. Direct use of the HBase API, along with coprocessors and > custom filters, results in performance on the order of milliseconds for small > queries, or seconds for tens of millions of rows. 
> {code} > It may simplify how our implementation reads/writes data from/to HBase, and we can > easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507901#comment-14507901 ] Hadoop QA commented on YARN-3522: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 26s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 0s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 58s | Tests passed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. 
| | | | 50m 28s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727324/YARN-3522.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a3b1d8c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7457/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7457/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7457/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7457/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7457//console | This message was automatically generated. > DistributedShell uses the wrong user to put timeline data > - > > Key: YARN-3522 > URL: https://issues.apache.org/jira/browse/YARN-3522 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-3522.1.patch, YARN-3522.2.patch, YARN-3522.3.patch > > > YARN-3287 breaks the timeline access control of distributed shell. 
In > distributed shell AM: > {code} > if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, > YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { > // Creating the Timeline Client > timelineClient = TimelineClient.createTimelineClient(); > timelineClient.init(conf); > timelineClient.start(); > } else { > timelineClient = null; > LOG.warn("Timeline service is not enabled"); > } > {code} > {code} > ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() { > @Override > public TimelinePutResponse run() throws Exception { > return timelineClient.putEntities(entity); > } > }); > {code} > YARN-3287 changes the timeline client to get the right ugi at serviceInit, > but the DS AM still doesn't use the submitter ugi to init the timeline client; instead it uses > the ugi for each put entity call. This results in the wrong user for the put > request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
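The bug described in YARN-3522 above comes down to when the caller's identity is captured: after YARN-3287 the timeline client picks up its ugi at serviceInit, so wrapping only the putEntities call in the submitter's doAs no longer helps. Below is a minimal sketch of that timing issue using plain-Java stand-ins (FakeUgi, FakeTimelineClient and the rest are illustrative names, not Hadoop's real API):

```java
import java.util.function.Supplier;

// Stand-in for UserGroupInformation: doAs() sets a thread-local
// "current user" around the action, the way a real doAs block does.
class FakeUgi {
    private static final ThreadLocal<String> CURRENT =
        ThreadLocal.withInitial(() -> "yarn");  // the AM's login user
    private final String user;
    FakeUgi(String user) { this.user = user; }
    <T> T doAs(Supplier<T> action) {
        String previous = CURRENT.get();
        CURRENT.set(user);
        try { return action.get(); } finally { CURRENT.set(previous); }
    }
    static String currentUser() { return CURRENT.get(); }
}

// Stand-in client that, like the real one after YARN-3287, captures
// the calling user once at init time and attributes all puts to it.
class FakeTimelineClient {
    final String ownerAtInit = FakeUgi.currentUser();
    String putEntities() { return ownerAtInit; }
}

class TimelineUserDemo {
    // Buggy pattern: the client is created outside doAs, so it captures
    // "yarn"; wrapping only the put call in doAs changes nothing.
    static String buggy(FakeUgi submitter) {
        FakeTimelineClient client = new FakeTimelineClient();
        return submitter.doAs(client::putEntities);
    }

    // Fixed pattern: create/init the client inside the submitter's doAs,
    // so puts are attributed to the submitter.
    static String fixed(FakeUgi submitter) {
        FakeTimelineClient client = submitter.doAs(FakeTimelineClient::new);
        return client.putEntities();
    }

    public static void main(String[] args) {
        FakeUgi submitter = new FakeUgi("alice");
        System.out.println("buggy puts as: " + buggy(submitter));  // yarn
        System.out.println("fixed puts as: " + fixed(submitter));  // alice
    }
}
```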
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507889#comment-14507889 ] Wangda Tan commented on YARN-3434: -- Thanks for the update, [~tgraves]. Latest patch LGTM, +1. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, > YARN-3434.patch, YARN-3434.patch, YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3520) Invalid user logins trigger large backtraces in server log
[ https://issues.apache.org/jira/browse/YARN-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3520: - Summary: Invalid user logins trigger large backtraces in server log (was: get rid of excessive stacktrace caused by expired cookie in timeline log) > Invalid user logins trigger large backtraces in server log > -- > > Key: YARN-3520 > URL: https://issues.apache.org/jira/browse/YARN-3520 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-3520.patch > > > {code} > WARN sso.CookieValidatorHelpers: Cookie has expired by 25364187 msec > WARN server.AuthenticationFilter: Authentication exception: Invalid Cookie > 166 org.apache.hadoop.security.authentication.client.AuthenticationException: > Invalid Bouncer Cookie > 167 at > KerberosAuthenticationHandler.bouncerAuthenticate(KerberosAuthenticationHandler.java:94) > 168 at > AuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:82) > 169 at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) > 170 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 171 at > org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) > 172 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 173 at > org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) > 174 at GzipFilter.doFilter(GzipFilter.java:188) > 175 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 176 at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) > 177 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 178 at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > 179 at > 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 180 at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > 181 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 182 at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > 183 at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > 184 at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > 185 at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > 186 at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > 187 at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > 188 at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > 189 at org.mortbay.jetty.Server.handle(Server.java:326) > 190 at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > 191 at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > 192 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > 193 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > 194 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > 195 at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > 196 at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > WARN sso.CookieValidatorHelpers: Cookie has expired by 25373197 msec > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507873#comment-14507873 ] Hadoop QA commented on YARN-3319: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 59s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 99 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 25s | The applied patch generated 3 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 17s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 50m 50s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 92m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestRM | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727310/YARN-3319.74.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 12f4df0 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/7454/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7454/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7454/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7454/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7454//console | This message was automatically generated. > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). 
This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507858#comment-14507858 ] Hadoop QA commented on YARN-2605: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 7m 40s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 59s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 6m 57s | Tests passed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 52m 33s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 103m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727301/YARN-2605.2.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / 12f4df0 | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/7452/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7452/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7452/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7452//console | This message was automatically generated. > [RM HA] Rest api endpoints doing redirect incorrectly > - > > Key: YARN-2605 > URL: https://issues.apache.org/jira/browse/YARN-2605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: bc Wong >Assignee: Xuan Gong > Labels: newbie > Attachments: YARN-2605.1.patch, YARN-2605.2.patch > > > The standby RM's webui tries to do a redirect via meta-refresh. That is fine > for pages designed to be viewed by web browsers. But the API endpoints > shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd > suggest HTTP 303, or return a well-defined error message (json or xml) > stating the standby status and a link to the active RM. 
> The standby RM is returning this today: > {noformat} > $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Content-Type: text/plain; charset=UTF-8 > Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > Content-Length: 117 > Server: Jetty(6.1.26) > This is standby RM. Redirecting to the current active RM: > http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
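As the YARN-2605 report above notes, the standby RM answers with a 200 plus a `Refresh` header, which browsers honor but HTTP libraries ignore; a 303 would be followed automatically. Until that changes, a client has to dig the active RM's URL out of the header value itself. A minimal sketch (the helper name is illustrative):

```java
class RefreshHeaderDemo {
    // Extract the redirect target from a "Refresh: 3; url=http://..." header
    // value, which HTTP client libraries do not follow automatically
    // (unlike a 3xx status with a Location header).
    static String redirectTarget(String refreshHeader) {
        int idx = refreshHeader.toLowerCase().indexOf("url=");
        return idx < 0 ? null : refreshHeader.substring(idx + 4).trim();
    }

    public static void main(String[] args) {
        // Header value taken from the curl transcript in the issue above.
        String header = "3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics";
        System.out.println(redirectTarget(header));
    }
}
```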
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507848#comment-14507848 ] Wangda Tan commented on YARN-3319: -- Thanks for the update, [~cwelch]. Patch LGTM, +1. > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507821#comment-14507821 ] Li Lu commented on YARN-3529: - Thanks [~vrushalic] and [~swagle]! One quick question for Sid here is, does the in-memory HBase cluster require any pom changes, or is it automatically integrated inside the Phoenix jar? If the latter holds, does the Phoenix work not depend on the MiniHBase cluster? Thanks! > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: AbstractMiniHBaseClusterTest.java, > output_minicluster2.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch gets merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3520) get rid of excessive stacktrace caused by expired cookie in timeline log
[ https://issues.apache.org/jira/browse/YARN-3520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507822#comment-14507822 ] Jason Lowe commented on YARN-3520: -- +1, committing this. > get rid of excessive stacktrace caused by expired cookie in timeline log > > > Key: YARN-3520 > URL: https://issues.apache.org/jira/browse/YARN-3520 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li > Attachments: YARN-3520.patch > > > {code} > WARN sso.CookieValidatorHelpers: Cookie has expired by 25364187 msec > WARN server.AuthenticationFilter: Authentication exception: Invalid Cookie > 166 org.apache.hadoop.security.authentication.client.AuthenticationException: > Invalid Bouncer Cookie > 167 at > KerberosAuthenticationHandler.bouncerAuthenticate(KerberosAuthenticationHandler.java:94) > 168 at > AuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:82) > 169 at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) > 170 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 171 at > org.apache.hadoop.yarn.server.timeline.webapp.CrossOriginFilter.doFilter(CrossOriginFilter.java:95) > 172 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 173 at > org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) > 174 at GzipFilter.doFilter(GzipFilter.java:188) > 175 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 176 at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224) > 177 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 178 at > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > 179 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 180 at > 
org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > 181 at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > 182 at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > 183 at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > 184 at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > 185 at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > 186 at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > 187 at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > 188 at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > 189 at org.mortbay.jetty.Server.handle(Server.java:326) > 190 at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > 191 at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > 192 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > 193 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > 194 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > 195 at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > 196 at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > WARN sso.CookieValidatorHelpers: Cookie has expired by 25373197 msec > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
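[Editor's sketch] The patch itself is not shown in this thread, but the general shape of such a fix — without claiming this is what YARN-3520 actually does — is to log the recurring authentication failure as a one-line WARN and reserve the full stack trace for debug-level logging. A self-contained sketch using java.util.logging (Hadoop itself logs through commons-logging/slf4j; class and method names here are illustrative):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietAuthLogging {
    static final Logger LOG = Logger.getLogger("AuthenticationFilter");

    // Expired cookies are routine, so emit a one-line WARN; attach the
    // full stack trace only when debug (FINE) logging is enabled.
    static void logAuthFailure(Exception ex) {
        LOG.warning("Authentication exception: " + ex.getMessage());
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, "Authentication failure details", ex);
        }
    }

    public static void main(String[] args) {
        logAuthFailure(new Exception("Invalid Bouncer Cookie"));
    }
}
```

At the default INFO level this produces a single warning line per expired cookie instead of the thirty-line trace quoted above.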
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507810#comment-14507810 ] Zhijie Shen commented on YARN-3522: --- bq. I suggest adding a code comment in TimeLineClient#createTimelineClient Added a comment. bq. start and end event occurred in the same run() method ? It's another bug. I fixed it together here. The test failure is fixed in the new patch. It seems that the checkstyle warning is not related. > DistributedShell uses the wrong user to put timeline data > - > > Key: YARN-3522 > URL: https://issues.apache.org/jira/browse/YARN-3522 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-3522.1.patch, YARN-3522.2.patch > > > YARN-3287 breaks the timeline access control of distributed shell. In > distributed shell AM: > {code} > if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, > YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { > // Creating the Timeline Client > timelineClient = TimelineClient.createTimelineClient(); > timelineClient.init(conf); > timelineClient.start(); > } else { > timelineClient = null; > LOG.warn("Timeline service is not enabled"); > } > {code} > {code} > ugi.doAs(new PrivilegedExceptionAction() { > @Override > public TimelinePutResponse run() throws Exception { > return timelineClient.putEntities(entity); > } > }); > {code} > YARN-3287 changes the timeline client to get the right ugi at serviceInit, > but DS AM still doesn't use submitter ugi to init timeline client, but use > the ugi for each put entity call. It result in the wrong user of the put > request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
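[Editor's sketch] The bug discussed in this issue comes down to *when* the caller's identity is captured: after YARN-3287 the timeline client snapshots the UGI at serviceInit, so wrapping only the putEntities call in doAs() is too late. A minimal, self-contained simulation of that ordering problem — plain-Java stand-ins, not Hadoop's actual UserGroupInformation or TimelineClient classes:

```java
import java.util.concurrent.Callable;

public class TimelineIdentityDemo {
    // Stand-in for UserGroupInformation.getCurrentUser(): the AM's own user.
    static final ThreadLocal<String> CURRENT_USER =
            ThreadLocal.withInitial(() -> "yarn");

    // Stand-in for ugi.doAs(...): runs the action as the given user.
    static <T> T doAs(String user, Callable<T> action) throws Exception {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.call();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Stand-in for the timeline client: identity is fixed at init time.
    static class Client {
        final String owner = CURRENT_USER.get(); // captured once, like serviceInit
        String putEntities() { return owner; }   // puts are attributed to owner
    }

    public static void main(String[] args) throws Exception {
        // Buggy pattern: init outside doAs, put inside doAs -> wrong user.
        Client buggy = new Client();
        System.out.println(doAs("submitter", buggy::putEntities)); // "yarn"

        // Fixed pattern: create the client inside the submitter's doAs.
        Client fixed = doAs("submitter", Client::new);
        System.out.println(fixed.putEntities()); // "submitter"
    }
}
```

The patch presumably follows the second pattern: create and init the timeline client inside the submitter UGI's doAs, so every subsequent put is attributed to the submitter.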
[jira] [Updated] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3522: -- Attachment: YARN-3522.3.patch > DistributedShell uses the wrong user to put timeline data > - > > Key: YARN-3522 > URL: https://issues.apache.org/jira/browse/YARN-3522 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-3522.1.patch, YARN-3522.2.patch, YARN-3522.3.patch > > > YARN-3287 breaks the timeline access control of distributed shell. In > distributed shell AM: > {code} > if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, > YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { > // Creating the Timeline Client > timelineClient = TimelineClient.createTimelineClient(); > timelineClient.init(conf); > timelineClient.start(); > } else { > timelineClient = null; > LOG.warn("Timeline service is not enabled"); > } > {code} > {code} > ugi.doAs(new PrivilegedExceptionAction() { > @Override > public TimelinePutResponse run() throws Exception { > return timelineClient.putEntities(entity); > } > }); > {code} > YARN-3287 changes the timeline client to get the right ugi at serviceInit, > but DS AM still doesn't use submitter ugi to init timeline client, but use > the ugi for each put entity call. It result in the wrong user of the put > request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507801#comment-14507801 ] Siddharth Wagle commented on YARN-3529: --- Note: This will start in memory HBase cluster and provide a Phoenix connection to use for tests. > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: AbstractMiniHBaseClusterTest.java, > output_minicluster2.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Wagle updated YARN-3529: -- Attachment: AbstractMiniHBaseClusterTest.java Attaching an abstract base class from the Ambari Metrics System; it directly extends the Phoenix test classes from the Phoenix compile-time dependency. All subclasses of this class use the Accessor functions to get to the Phoenix tables. > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: AbstractMiniHBaseClusterTest.java, > output_minicluster2.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3413: - Attachment: YARN-3413.7.patch Updated to latest trunk after YARN-3225 (ver.7) > Node label attributes (like exclusivity) should settable via > addToClusterNodeLabels but shouldn't be changeable at runtime > -- > > Key: YARN-3413 > URL: https://issues.apache.org/jira/browse/YARN-3413 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, > YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch, YARN-3413.7.patch > > > As mentioned in : > https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. > Changing node label exclusivity and/or other attributes may not be a real use > case, and also we should support setting node label attributes whiling adding > them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-3434: Attachment: YARN-3434.patch Fixed the line-length and whitespace style issues. Other than that, I moved things around and it's just complaining about the same things more. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch, YARN-3434.patch, YARN-3434.patch, > YARN-3434.patch, YARN-3434.patch, YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-2654: --- > Revisit all shared cache config parameters to ensure quality names > -- > > Key: YARN-2654 > URL: https://issues.apache.org/jira/browse/YARN-2654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Blocker > Attachments: shared_cache_config_parameters.txt > > > Revisit all the shared cache config parameters in YarnConfiguration and > yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster2.txt > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: output_minicluster2.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2654) Revisit all shared cache config parameters to ensure quality names
[ https://issues.apache.org/jira/browse/YARN-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2654. --- Resolution: Won't Fix Closing as 'Won't Fix' > Revisit all shared cache config parameters to ensure quality names > -- > > Key: YARN-2654 > URL: https://issues.apache.org/jira/browse/YARN-2654 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo >Priority: Blocker > Attachments: shared_cache_config_parameters.txt > > > Revisit all the shared cache config parameters in YarnConfiguration and > yarn-default.xml to ensure quality names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3319: -- Attachment: YARN-3319.74.patch > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch, YARN-3319.74.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reopened YARN-2168: --- > SCM/Client/NM/Admin protocols > - > > Key: YARN-2168 > URL: https://issues.apache.org/jira/browse/YARN-2168 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.7.0 > > Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch > > > This jira is meant to be used to review the main shared cache APIs. They are > as follows: > * ClientSCMProtocol - The protocol between the yarn client and the cache > manager. This protocol controls how resources in the cache are claimed and > released. > ** UseSharedCacheResourceRequest > ** UseSharedCacheResourceResponse > ** ReleaseSharedCacheResourceRequest > ** ReleaseSharedCacheResourceResponse > * SCMAdminProtocol - This is an administrative protocol for the cache > manager. It allows administrators to manually trigger cleaner runs. > ** RunSharedCacheCleanerTaskRequest > ** RunSharedCacheCleanerTaskResponse > * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the > cache manager. This allows the NodeManager to coordinate with the cache > manager when uploading new resources to the shared cache. > ** NotifySCMRequest > ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli closed YARN-2168. - > SCM/Client/NM/Admin protocols > - > > Key: YARN-2168 > URL: https://issues.apache.org/jira/browse/YARN-2168 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.7.0 > > Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch > > > This jira is meant to be used to review the main shared cache APIs. They are > as follows: > * ClientSCMProtocol - The protocol between the yarn client and the cache > manager. This protocol controls how resources in the cache are claimed and > released. > ** UseSharedCacheResourceRequest > ** UseSharedCacheResourceResponse > ** ReleaseSharedCacheResourceRequest > ** ReleaseSharedCacheResourceResponse > * SCMAdminProtocol - This is an administrative protocol for the cache > manager. It allows administrators to manually trigger cleaner runs. > ** RunSharedCacheCleanerTaskRequest > ** RunSharedCacheCleanerTaskResponse > * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the > cache manager. This allows the NodeManager to coordinate with the cache > manager when uploading new resources to the shared cache. > ** NotifySCMRequest > ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster2.txt) > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2168) SCM/Client/NM/Admin protocols
[ https://issues.apache.org/jira/browse/YARN-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-2168. --- Resolution: Duplicate Fix Version/s: 2.7.0 Resolving this instead as a duplicate. > SCM/Client/NM/Admin protocols > - > > Key: YARN-2168 > URL: https://issues.apache.org/jira/browse/YARN-2168 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Chris Trezzo >Assignee: Chris Trezzo > Fix For: 2.7.0 > > Attachments: YARN-2168-trunk-v1.patch, YARN-2168-trunk-v2.patch > > > This jira is meant to be used to review the main shared cache APIs. They are > as follows: > * ClientSCMProtocol - The protocol between the yarn client and the cache > manager. This protocol controls how resources in the cache are claimed and > released. > ** UseSharedCacheResourceRequest > ** UseSharedCacheResourceResponse > ** ReleaseSharedCacheResourceRequest > ** ReleaseSharedCacheResourceResponse > * SCMAdminProtocol - This is an administrative protocol for the cache > manager. It allows administrators to manually trigger cleaner runs. > ** RunSharedCacheCleanerTaskRequest > ** RunSharedCacheCleanerTaskResponse > * NMCacheUploaderSCMProtocol - The protocol between the NodeManager and the > cache manager. This allows the NodeManager to coordinate with the cache > manager when uploading new resources to the shared cache. > ** NotifySCMRequest > ** NotifySCMResponse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster.txt) > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: output_minicluster2.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster2.txt > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: output_minicluster2.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507701#comment-14507701 ] Craig Welch commented on YARN-3319: --- bq. Some minor comments about configuration part by index: 1) done 2) done 3) done - see below bq. Do you think is it better to make property in .ordering-policy..?... Now that there is not proper composition only one policy can be active at a time and it shouldn't be necessary to namespace config items this way. At the same time, I could see us getting back to proper composition at some point, where this would be helpful. I've implemented it as a prefix convention in the policy instead of constraining the contents of the map in the capacity scheduler configuration. This is because we still support passing a class name as the policy type, which would make the configurations for "class name based" items unwieldy. It would also allow us to have shared configuration items between policies if we do end up with proper composition again. The end result of the configuration was as you suggested 4) done 5) done bq. FairOrderingPolicy: all 3 done bq. Findbugs warning? Failed to stage change, so it didn't make it into patch, should be there now. > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch, YARN-3319.53.patch, > YARN-3319.58.patch, YARN-3319.70.patch, YARN-3319.71.patch, > YARN-3319.72.patch, YARN-3319.73.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. 
> The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
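[Editor's sketch] The sizeBasedWeight adjustment described in this issue divides an application's resource usage by Math.log1p(demand) / Math.log(2), i.e. log base 2 of (1 + demand). A small self-contained sketch of that arithmetic — names are illustrative, not the actual FairOrderingPolicy code:

```java
public class SizeBasedWeightDemo {
    // Magnitude-based weight: log2(1 + memory demand in MB).
    static double weight(long demandMb) {
        return Math.log1p(demandMb) / Math.log(2);
    }

    // Adjusted usage used for ordering when sizeBasedWeight is enabled:
    // dividing by the weight boosts larger applications, offsetting the
    // natural preference for smaller ones.
    static double adjustedUsage(long usedMb, long demandMb) {
        return usedMb / weight(demandMb);
    }

    public static void main(String[] args) {
        // Two apps with equal usage: the one with the larger demand gets
        // the smaller adjusted usage, so it is offered resources first.
        System.out.println(adjustedUsage(4096, 8192));   // ~315
        System.out.println(adjustedUsage(4096, 131072)); // ~241
    }
}
```

Because the weight grows only logarithmically, a 16x difference in demand shifts the ordering modestly rather than dominating it.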
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster.txt Attaching the eclipse log as a .txt > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: output_minicluster.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: (was: output_minicluster.rtf) > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: output_minicluster.txt > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
[ https://issues.apache.org/jira/browse/YARN-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3529: - Attachment: output_minicluster.rtf Thanks [~gtCarrera9] for filing the jira! Current status: I presently am using the hbase minicluster from HBaseTestingUtility in the unit tests for YARN-3411. Right now, I have my setup working in eclipse. Attaching the eclipse log that shows that a mini hbase cluster/zookeeper/ regionservers are starting and creating tables and shutting down when I run the unit test from org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl. Some relevant code bits: {code} private static HBaseTestingUtility UTIL; @BeforeClass public static void setupBeforeClass() throws Exception { UTIL = new HBaseTestingUtility(); UTIL.startMiniCluster(); createSchema(); } @AfterClass public static void tearDownAfterClass() throws Exception { UTIL.shutdownMiniCluster(); } {code} > Add miniHBase cluster and Phoenix support to ATS v2 unit tests > -- > > Key: YARN-3529 > URL: https://issues.apache.org/jira/browse/YARN-3529 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: output_minicluster.rtf > > > After we have our HBase and Phoenix writer implementations, we may want to > find a way to set up HBase and Phoenix in our unit tests. We need to do this > integration before the branch got merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507689#comment-14507689 ] Hadoop QA commented on YARN-3413: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727305/YARN-3413.6.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 12f4df0 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7453//console | This message was automatically generated. > Node label attributes (like exclusivity) should settable via > addToClusterNodeLabels but shouldn't be changeable at runtime > -- > > Key: YARN-3413 > URL: https://issues.apache.org/jira/browse/YARN-3413 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, > YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch > > > As mentioned in : > https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. > Changing node label exclusivity and/or other attributes may not be a real use > case, and also we should support setting node label attributes whiling adding > them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507685#comment-14507685 ] Jian He commented on YARN-3522: --- - I think YARN-3287 in some sense is incompatible, since it forces user to use doAs to create the timeLineClient which is not required before. Is this ok ? I suggest adding a code comment in TimeLineClient#createTimelineClient to say caller must use doAs to create the timeLineClient - start and end event occurred in the same run() method ? {code} if(timelineClient != null) { publishApplicationAttemptEvent(timelineClient, appAttemptID.toString(), DSEvent.DS_APP_ATTEMPT_START, domainId, appSubmitterUgi); } {code} > DistributedShell uses the wrong user to put timeline data > - > > Key: YARN-3522 > URL: https://issues.apache.org/jira/browse/YARN-3522 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-3522.1.patch, YARN-3522.2.patch > > > YARN-3287 breaks the timeline access control of distributed shell. In > distributed shell AM: > {code} > if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, > YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { > // Creating the Timeline Client > timelineClient = TimelineClient.createTimelineClient(); > timelineClient.init(conf); > timelineClient.start(); > } else { > timelineClient = null; > LOG.warn("Timeline service is not enabled"); > } > {code} > {code} > ugi.doAs(new PrivilegedExceptionAction() { > @Override > public TimelinePutResponse run() throws Exception { > return timelineClient.putEntities(entity); > } > }); > {code} > YARN-3287 changes the timeline client to get the right ugi at serviceInit, > but DS AM still doesn't use submitter ugi to init timeline client, but use > the ugi for each put entity call. It result in the wrong user of the put > request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3413: - Attachment: YARN-3413.6.patch Fixed trivial whitespace checks. (Ver.6) > Node label attributes (like exclusivity) should settable via > addToClusterNodeLabels but shouldn't be changeable at runtime > -- > > Key: YARN-3413 > URL: https://issues.apache.org/jira/browse/YARN-3413 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, > YARN-3413.4.patch, YARN-3413.5.patch, YARN-3413.6.patch > > > As mentioned in : > https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. > Changing node label exclusivity and/or other attributes may not be a real use > case, and also we should support setting node label attributes whiling adding > them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507657#comment-14507657 ] Sidharta Seethana commented on YARN-3366: - Hi [~vinodkv], {quote} conf.get("hadoop.tmp.dir"): We should write to the nmPrivate directories instead of /tmp. {quote} Digging into this further, it turns out that the change is far from trivial because of the way initialization works in the node manager today. I filed a separate JIRA to track this: https://issues.apache.org/jira/browse/YARN-3531 . I'll update the patch based on the rest of the feedback as discussed above. Thanks! > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch, YARN-3366.004.patch, YARN-3366.005.patch, > YARN-3366.006.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507656#comment-14507656 ] Xuan Gong commented on YARN-2605: - Uploaded a new patch, and verified in a single node HA cluster. > [RM HA] Rest api endpoints doing redirect incorrectly > - > > Key: YARN-2605 > URL: https://issues.apache.org/jira/browse/YARN-2605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: bc Wong >Assignee: Xuan Gong > Labels: newbie > Attachments: YARN-2605.1.patch, YARN-2605.2.patch > > > The standby RM's webui tries to do a redirect via meta-refresh. That is fine > for pages designed to be viewed by web browsers. But the API endpoints > shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd > suggest HTTP 303, or return a well-defined error message (json or xml) > stating that the standby status and a link to the active RM. > The standby RM is returning this today: > {noformat} > $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Content-Type: text/plain; charset=UTF-8 > Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > Content-Length: 117 > Server: Jetty(6.1.26) > This is standby RM. Redirecting to the current active RM: > http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2605: Attachment: YARN-2605.2.patch > [RM HA] Rest api endpoints doing redirect incorrectly > - > > Key: YARN-2605 > URL: https://issues.apache.org/jira/browse/YARN-2605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: bc Wong >Assignee: Xuan Gong > Labels: newbie > Attachments: YARN-2605.1.patch, YARN-2605.2.patch > > > The standby RM's webui tries to do a redirect via meta-refresh. That is fine > for pages designed to be viewed by web browsers. But the API endpoints > shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd > suggest HTTP 303, or return a well-defined error message (json or xml) > stating that the standby status and a link to the active RM. > The standby RM is returning this today: > {noformat} > $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Content-Type: text/plain; charset=UTF-8 > Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > Content-Length: 117 > Server: Jetty(6.1.26) > This is standby RM. Redirecting to the current active RM: > http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
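To make the meta-refresh vs. 3xx distinction above concrete: below is a small, self-contained sketch (plain Java; `followTarget()` is a hypothetical helper, not a YARN or Jetty API) of how a typical programmatic HTTP client decides whether a response is a redirect. A 200 carrying a Refresh header looks like a plain success to such a client, while a 303 with a Location header gives it a target to follow:

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the client-side behavior at issue: standard HTTP libraries follow
// 3xx redirects via the Location header, but a 200 response that carries a
// "Refresh" header (the standby RM's meta-refresh style) looks like a normal
// success to them, so they never reach the active RM. Plain Java, no YARN
// code; followTarget() is an illustrative helper, not a real API.
public class StandbyRedirectSketch {
    /** Returns the URL a typical programmatic client would follow, if any. */
    static Optional<String> followTarget(int status, Map<String, String> headers) {
        if (status >= 300 && status < 400 && headers.containsKey("Location")) {
            return Optional.of(headers.get("Location")); // e.g. 303 See Other
        }
        // 200 + Refresh is honored by browsers only, not by HTTP client libraries.
        return Optional.empty();
    }

    public static void main(String[] args) {
        String active = "http://active-rm:8088/ws/v1/cluster/metrics";
        // Today's standby behavior: 200 with a Refresh header -> client is stuck.
        System.out.println(followTarget(200, Map.of("Refresh", "3; url=" + active)));
        // Proposed behavior: 303 with Location -> client can reach the active RM.
        System.out.println(followTarget(303, Map.of("Location", active)));
    }
}
```

This is why the ticket suggests HTTP 303 (or a well-defined JSON/XML error body naming the active RM) for the REST endpoints, while keeping meta-refresh for the browser-facing web UI.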
[jira] [Created] (YARN-3531) Make "good" local directories available to ContainerExecutors at initialization time
Sidharta Seethana created YARN-3531: --- Summary: Make "good" local directories available to ContainerExecutors at initialization time Key: YARN-3531 URL: https://issues.apache.org/jira/browse/YARN-3531 Project: Hadoop YARN Issue Type: Improvement Reporter: Sidharta Seethana Currently, in the NodeManager's serviceInit() function, the configured executor is initialized before the node health checker and directory handler services are initialized. There are use cases where executor initialization requires access to 'good' local directories (e.g., for creation of temporary files; see YARN-3366). We need to figure out a way to make this possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3522) DistributedShell uses the wrong user to put timeline data
[ https://issues.apache.org/jira/browse/YARN-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507641#comment-14507641 ] Hadoop QA commented on YARN-3522: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 31s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 46s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | yarn tests | 6m 43s | Tests failed in hadoop-yarn-applications-distributedshell. | | {color:green}+1{color} | yarn tests | 1m 56s | Tests passed in hadoop-yarn-common. 
| | | | 52m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727290/YARN-3522.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1f4767c | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/7451/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7451/testReport/ | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7451//console | This message was automatically generated. > DistributedShell uses the wrong user to put timeline data > - > > Key: YARN-3522 > URL: https://issues.apache.org/jira/browse/YARN-3522 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: YARN-3522.1.patch, YARN-3522.2.patch > > > YARN-3287 breaks the timeline access control of distributed shell. 
In > distributed shell AM: > {code} > if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, > YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { > // Creating the Timeline Client > timelineClient = TimelineClient.createTimelineClient(); > timelineClient.init(conf); > timelineClient.start(); > } else { > timelineClient = null; > LOG.warn("Timeline service is not enabled"); > } > {code} > {code} > ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() { > @Override > public TimelinePutResponse run() throws Exception { > return timelineClient.putEntities(entity); > } > }); > {code} > YARN-3287 changes the timeline client to get the right ugi at serviceInit, > but the DS AM still doesn't use the submitter ugi to init the timeline client; instead it uses > the ugi for each put entity call. It results in the wrong user for the put > request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3530: -- Component/s: (was: yarn) timelineserver Priority: Critical (was: Blocker) Target Version/s: 2.8.0 > ATS throws exception on trying to filter results without otherinfo. > --- > > Key: YARN-3530 > URL: https://issues.apache.org/jira/browse/YARN-3530 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Reporter: Sreenath Somarajapuram >Assignee: Zhijie Shen >Priority: Critical > > Scenario: > Am attempting to make data loading faster by fetching otherinfo on demand. As > shown in the attached image, the patch adds a 'Load Counters' checkbox. It > would be disabled by default, and on clicking, the counter data also would be > loaded. > Issue: > Things are good when otherinfo is loaded. > But ATS throws exception on trying to filter on status or applicationId > without otherinfo in fields list. > In other words, using fields=events,primaryfilters with > secondaryFilter=status:RUNNING will return > { "exception": "WebApplicationException", "message": > "java.lang.NullPointerException", "javaClassName": > "javax.ws.rs.WebApplicationException" } > from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
[ https://issues.apache.org/jira/browse/YARN-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned YARN-3530: - Assignee: Zhijie Shen > ATS throws exception on trying to filter results without otherinfo. > --- > > Key: YARN-3530 > URL: https://issues.apache.org/jira/browse/YARN-3530 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Sreenath Somarajapuram >Assignee: Zhijie Shen >Priority: Blocker > > Scenario: > Am attempting to make data loading faster by fetching otherinfo on demand. As > shown in the attached image, the patch adds a 'Load Counters' checkbox. It > would be disabled by default, and on clicking, the counter data also would be > loaded. > Issue: > Things are good when otherinfo is loaded. > But ATS throws exception on trying to filter on status or applicationId > without otherinfo in fields list. > In other words, using fields=events,primaryfilters with > secondaryFilter=status:RUNNING will return > { "exception": "WebApplicationException", "message": > "java.lang.NullPointerException", "javaClassName": > "javax.ws.rs.WebApplicationException" } > from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3530) ATS throws exception on trying to filter results without otherinfo.
Sreenath Somarajapuram created YARN-3530: Summary: ATS throws exception on trying to filter results without otherinfo. Key: YARN-3530 URL: https://issues.apache.org/jira/browse/YARN-3530 Project: Hadoop YARN Issue Type: Bug Components: yarn Reporter: Sreenath Somarajapuram Priority: Blocker Scenario: I am attempting to make data loading faster by fetching otherinfo on demand. As shown in the attached image, the patch adds a 'Load Counters' checkbox. It would be disabled by default, and on clicking it, the counter data would also be loaded. Issue: Things are good when otherinfo is loaded. But ATS throws an exception when trying to filter on status or applicationId without otherinfo in the fields list. In other words, using fields=events,primaryfilters with secondaryFilter=status:RUNNING will return { "exception": "WebApplicationException", "message": "java.lang.NullPointerException", "javaClassName": "javax.ws.rs.WebApplicationException" } from the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3529) Add miniHBase cluster and Phoenix support to ATS v2 unit tests
Li Lu created YARN-3529: --- Summary: Add miniHBase cluster and Phoenix support to ATS v2 unit tests Key: YARN-3529 URL: https://issues.apache.org/jira/browse/YARN-3529 Project: Hadoop YARN Issue Type: Sub-task Reporter: Li Lu Assignee: Li Lu After we have our HBase and Phoenix writer implementations, we may want to find a way to set up HBase and Phoenix in our unit tests. We need to do this integration before the branch is merged back to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3413) Node label attributes (like exclusivity) should settable via addToClusterNodeLabels but shouldn't be changeable at runtime
[ https://issues.apache.org/jira/browse/YARN-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507576#comment-14507576 ] Wangda Tan commented on YARN-3413: -- Commented on HADOOP-11746: https://issues.apache.org/jira/browse/HADOOP-11746?focusedCommentId=14507573&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14507573 as well. > Node label attributes (like exclusivity) should settable via > addToClusterNodeLabels but shouldn't be changeable at runtime > -- > > Key: YARN-3413 > URL: https://issues.apache.org/jira/browse/YARN-3413 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3413.1.patch, YARN-3413.2.patch, YARN-3413.3.patch, > YARN-3413.4.patch, YARN-3413.5.patch > > > As mentioned in : > https://issues.apache.org/jira/browse/YARN-3345?focusedCommentId=14384947&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14384947. > Changing node label exclusivity and/or other attributes may not be a real use > case, and also we should support setting node label attributes whiling adding > them to cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)