[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-09-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938288#comment-14938288
 ] 

Naganarasimha G R commented on YARN-4162:
-

Thanks for the approach [~wangda]; the approach you mentioned seems fine, but 
the only problem is that the same objects are used across the UI and the REST 
API; hopefully there won't be any impact! I have started the modifications and 
will update by tomorrow.

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch
>
>
> When Node Labels are enabled, the REST Scheduler Information should also 
> provide partition-specific queue information similar to the existing Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-09-30 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-4185:

Assignee: Neelesh Srinivas Salian  (was: Anubhav Dhoot)

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff with the same fixed 
> max limit. Today the retry interval is fixed at 10 sec, which can be 
> unnecessarily high, especially when NMs can rolling-restart within a second.
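
A minimal sketch of the capped exponential backoff being proposed; the 
constants and method name below are illustrative, not from any patch:
{code}
// Double the delay on each failed attempt, capped at today's fixed 10s interval.
static long nextRetryIntervalMs(int attempt) {
  final long initialMs = 100;     // illustrative starting delay
  final long maxMs = 10_000;      // the current fixed interval becomes the cap
  long interval = initialMs << Math.min(attempt, 30); // cap the shift to avoid overflow
  return Math.min(interval, maxMs);
}
{code}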



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938185#comment-14938185
 ] 

Li Lu commented on YARN-4210:
-

+1. 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If an HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e., the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
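
A minimal sketch of the guard being described, assuming the HBase client API 
(an empty {{Result}} means the Get matched no row); the wrapper class is 
hypothetical:
{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

final class EmptyResultGuard {
  // Return null instead of handing an empty Result to the parse path,
  // which is what currently throws the NPE in readMetrics().
  static Result getRowOrNull(Table table, Get get) throws IOException {
    Result result = table.get(get);
    return (result == null || result.isEmpty()) ? null : result;
  }
}
{code}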



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-09-30 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938225#comment-14938225
 ] 

Anubhav Dhoot commented on YARN-3996:
-

SchedulerUtils has multiple overloads of normalizeRequests. The ones that take 
an increment and a min handle what you are looking to do.
Fair supports the increment while Fifo/Capacity do not. So Fair rounds to a 
multiple of the increment and uses min as the min, while Fifo/Capacity round to 
a multiple of min and use min as the min; basically, Capacity/Fifo set the 
increment to min as well. We need to do the same in RMAppManager.
That way Fair can continue supporting a zero min with multiples of the 
increment, and Fifo/Capacity can choose to not support a zero min and round to 
multiples of min. 
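
A hedged sketch of the normalization math in question (illustrative, not the 
actual SchedulerUtils code):
{code}
// Round the request up to a multiple of the increment, then clamp to [min, max].
static long normalize(long requested, long min, long increment, long max) {
  long step = Math.max(increment, 1); // guard a degenerate zero increment
  long rounded = ((requested + step - 1) / step) * step;
  return Math.min(Math.max(rounded, min), max);
}
// Fair passes a real increment, so a zero min still yields non-zero requests.
// Fifo/Capacity (and today RMAppManager) pass min as the increment, which is
// the combination YARN-3996 flags when min is zero.
{code}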

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if the minimum is set to zero (as allowed per YARN-789).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938278#comment-14938278
 ] 

Naganarasimha G R commented on YARN-3964:
-

Thanks for the patch [~dian.fu],
bq. I think it should be fine to me if we solved the locking issue.
By this comment, IIUC, [~wangda] meant the issue which I mentioned. And as 
per the approach discussed, it would be better handled in 
RMNodeLabelsManager (for which I am coming up with a JIRA and patch) so that it 
solves the issue in all types of node label configuration (centralized, 
distributed & this). But I still feel the approach in the new patch is better, 
as it would hold fewer locks on RMNodeLabelsManager and would try to club and 
fetch multiple requests in one shot!
Few comments:
# Synchronization in RMDelegatedNodeLabelsUpdaterTimerTask is not proper: 
{{synchronized(this)}} holds the lock on the RMDelegatedNodeLabelsUpdaterTimerTask 
instance, but newlyRegisteredNodes is updated in {{updateNodeLabels}} with the 
lock on the RMDelegatedNodeLabelsUpdater instance (see the sketch after this list).
# Is the {{nodesToUpdateLabels == null}} check required in the code below?
{code}
if (nodesToUpdateLabels == null && !newlyRegisteredNodes.isEmpty()) {
  synchronized (this) {
    if (!newlyRegisteredNodes.isEmpty()) {
      nodesToUpdateLabels = new HashSet<>(newlyRegisteredNodes);
    }
  }
}
{code}
# I feel a 5-second interval is a little too frequent; better to have 30 
seconds. If the provider does some operation that takes more than 5 seconds, 
multiple tasks can pile up.
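
A minimal sketch of consistent locking for comment #1, assuming the class 
layout implied above (field and method names are illustrative, not from the 
patch):
{code}
import java.util.HashSet;
import java.util.Set;
import java.util.TimerTask;

import org.apache.hadoop.yarn.api.records.NodeId;

class RMDelegatedNodeLabelsUpdater {
  // Guarded by itself: the writer and the timer task lock the same object.
  private final Set<NodeId> newlyRegisteredNodes = new HashSet<>();

  void updateNodeLabels(NodeId node) {
    synchronized (newlyRegisteredNodes) {
      newlyRegisteredNodes.add(node);
    }
  }

  class RMDelegatedNodeLabelsUpdaterTimerTask extends TimerTask {
    @Override
    public void run() {
      Set<NodeId> nodesToUpdateLabels = null;
      synchronized (newlyRegisteredNodes) { // same monitor as updateNodeLabels
        if (!newlyRegisteredNodes.isEmpty()) {
          nodesToUpdateLabels = new HashSet<>(newlyRegisteredNodes);
          newlyRegisteredNodes.clear();
        }
      }
      // fetch labels for nodesToUpdateLabels outside the lock
    }
  }
}
{code}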

Hope you can share your test code for the RMNodeLabelMappingsUpdater so that I 
can test with it; I hope you have also verified it.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4209:

Attachment: YARN-4209.002.patch

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => {{handleStoreEvent}} => 
> enter internal {{stateMachine.doTransition}} => exit internal 
> {{stateMachine.doTransition}}, changing the state to FENCED => exit external 
> {{stateMachine.doTransition}}, changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938182#comment-14938182
 ] 

Li Lu commented on YARN-4203:
-

+1, patch LGTM. Please feel free to commit when there are no objections. Thanks 
for the work [~varun_saxena]! 

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.003.patch, 
> YARN-4203-YARN-2928.01.patch, YARN-4203-YARN-2928.02.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this JIRA to 
> add request & response logging and timing for each REST call that comes in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938212#comment-14938212
 ] 

Vrushali C commented on YARN-4203:
--

+1 on patch v3. Will commit it now. 

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.003.patch, 
> YARN-4203-YARN-2928.01.patch, YARN-4203-YARN-2928.02.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this JIRA to 
> add request & response logging and timing for each REST call that comes in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938099#comment-14938099
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #439 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/439/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.
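
A minimal sketch of the proposed guard, assuming Hadoop's {{FileContext}} API 
(the wrapper class and helper name are hypothetical): remove a leftover 
destination directory before localizing into it, so the later rename cannot 
hit a non-empty destination.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

final class LocalizationDirGuard {
  static Path prepareLocalizationDir(FileContext lfs, Path destDirPath)
      throws IOException {
    // A stale directory can survive a failed run; delete it up front instead
    // of letting files.rename(dst_work, destDirPath, Rename.OVERWRITE) fail.
    if (lfs.util().exists(destDirPath)) {
      lfs.delete(destDirPath, true);
    }
    return destDirPath;
  }
}
{code}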



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-09-30 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938190#comment-14938190
 ] 

Anubhav Dhoot commented on YARN-4185:
-

Can we try to reuse the existing values for retries 
(yarn.client.nodemanager-connect. ) and see if we can be mostly compatible? I 
am thinking it's fine if it's not exactly the same behavior.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff with the same fixed 
> max limit. Today the retry interval is fixed at 10 sec, which can be 
> unnecessarily high, especially when NMs can rolling-restart within a second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937953#comment-14937953
 ] 

Xuan Gong commented on YARN-1897:
-

+1 lgtm. Let us wait for several days. If there are no other comments, I will 
commit it.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897-8.patch, 
> YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first, as 
> they are needed by other sub-tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way for the application to specify a 
> "reason" for diagnosis. SignalContainerResponse might be empty.
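
A hedged sketch of the OS-independent command set described above; the enum 
values illustrate the idea and are not necessarily the final API:
{code}
public enum SignalContainerCommand {
  OUTPUT_THREAD_DUMP,  // diagnose: ask the container process for a thread dump
  GRACEFUL_SHUTDOWN,   // polite stop, analogous to SIGTERM but OS-independent
  FORCEFUL_SHUTDOWN    // hard kill, analogous to SIGKILL but OS-independent
}
{code}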



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938217#comment-14938217
 ] 

Vrushali C commented on YARN-4210:
--

+1 from me too on patch v3. Will commit this after committing YARN-4203. 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If an HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e., the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4169) jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels

2015-09-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938286#comment-14938286
 ] 

Naganarasimha G R commented on YARN-4169:
-

Hi [~steve_l] & [~rohithsharma],
Any thoughts about the approach and the issue being discussed in the earlier 
comment?

> jenkins trunk+java build failed in TestNodeStatusUpdaterForLabels
> -
>
> Key: YARN-4169
> URL: https://issues.apache.org/jira/browse/YARN-4169
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
> Environment: Jenkins
>Reporter: Steve Loughran
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: YARN-4169.v1.001.patch, YARN-4169.v1.002.patch, 
> YARN-4169.v1.003.patch
>
>
> Test failing in [[Jenkins build 
> 402|https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-Yarn-trunk-Java8/402/testReport/junit/org.apache.hadoop.yarn.server.nodemanager/TestNodeStatusUpdaterForLabels/testNodeStatusUpdaterForNodeLabels/]
> {code}
> java.lang.NullPointerException: null
>   at java.util.HashSet.(HashSet.java:118)
>   at 
> org.apache.hadoop.yarn.nodelabels.NodeLabelTestBase.assertNLCollectionEquals(NodeLabelTestBase.java:103)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels.testNodeStatusUpdaterForNodeLabels(TestNodeStatusUpdaterForLabels.java:268)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938170#comment-14938170
 ] 

zhihai xu commented on YARN-3727:
-

Thanks [~jlowe] for the review and for committing the patch! Thanks 
[~lichangleo] and [~sjlee0] for the review!

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936504#comment-14936504
 ] 

Hadoop QA commented on YARN-1897:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  23m 31s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:red}-1{color} | javac |   8m 13s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 30s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m  9s | The applied patch generated  2 
new checkstyle issues (total was 32, now 34). |
| {color:green}+1{color} | whitespace |   2m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   8m 44s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests |  67m 14s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 31s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   6m 56s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  8s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   8m 52s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests |  62m  8s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 208m  1s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.mapred.TestNetworkedJob |
|   | hadoop.mapred.TestJobCounters |
|   | hadoop.mapred.TestYARNRunner |
|   | hadoop.mapred.TestClientServiceDelegate |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
| Timed out tests | 
org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter |
|   | org.apache.hadoop.mapred.TestMiniMRChildTask |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764373/YARN-1897-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06abc57 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/diffJavacWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9305/console |


This message was automatically generated.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> 

[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-09-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936459#comment-14936459
 ] 

Naganarasimha G R commented on YARN-4162:
-

Hi [~wangda],
   Thanks for the comments. One basic doubt: is it OK to change the REST 
structure in terms of compatibility? Based on this I provided a new interface.

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch
>
>
> When Node Labels are enabled, the REST Scheduler Information should also 
> provide partition-specific queue information similar to the existing Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4203:
---
Attachment: YARN-4203-YARN-2928.03.patch

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch, YARN-4203-YARN-2928.03.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this JIRA to 
> add request & response logging and timing for each REST call that comes in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936465#comment-14936465
 ] 

Wangda Tan commented on YARN-4162:
--

[~Naganarasimha], adding new fields is fine, but we should avoid 
removing/modifying existing fields. So in my example above, I kept the existing 
fields untouched. 

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch
>
>
> When Node Labels are enabled, the REST Scheduler Information should also 
> provide partition-specific queue information similar to the existing Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936477#comment-14936477
 ] 

Hadoop QA commented on YARN-3964:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  21m 29s | Pre-patch trunk compilation is 
healthy. |
| {color:red}-1{color} | @author |   0m  0s | The patch appears to contain 1 
@author tags which the Hadoop  community has agreed to not allow in code 
contributions. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:green}+1{color} | javac |   8m 19s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  13m 12s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 14s | The applied patch generated  1 
new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace |   0m  6s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m  1s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 45s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m 19s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 34s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   2m 28s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  59m 53s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 118m 33s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764377/YARN-3964.013.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06abc57 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9306/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9306/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9306/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9306/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9306/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9306/console |


This message was automatically generated.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in the Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936499#comment-14936499
 ] 

zhihai xu commented on YARN-4209:
-

Hi [~rohithsharma], I uploaded a new patch, YARN-4209.001.patch, which uses 
MultipleArcTransition and creates a private function 
{{notifyStoreOperationFailedInternal}}; now {{notifyStoreOperationFailed}} will 
only be called by {{ZKRMStateStore#VerifyActiveStatusThread}}.
So I acquire the {{writeLock}} and check {{isFencedState}} in 
{{notifyStoreOperationFailed}} to make sure {{handleTransitionToStandBy}} is 
only called once. Please review it, thanks.
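
A minimal sketch of the MultipleArcTransition shape (Hadoop's 
{{org.apache.hadoop.yarn.state.MultipleArcTransition}} interface); the store 
operation and returned states are illustrative of the idea, not the actual 
patch:
{code}
private static class RemoveRMDTTransition implements
    MultipleArcTransition<RMStateStore, RMStateStoreEvent, RMStateStoreState> {
  @Override
  public RMStateStoreState transition(RMStateStore store,
      RMStateStoreEvent event) {
    try {
      // perform the actual remove-token store operation here
      return RMStateStoreState.ACTIVE;
    } catch (Exception e) {
      // Returning FENCED as the arc's target state avoids the nested
      // doTransition() call that let the outer transition overwrite
      // FENCED back to ACTIVE.
      store.notifyStoreOperationFailedInternal(e);
      return RMStateStoreState.FENCED;
    }
  }
}
{code}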

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => {{handleStoreEvent}} => 
> enter internal {{stateMachine.doTransition}} => exit internal 
> {{stateMachine.doTransition}}, changing the state to FENCED => exit external 
> {{stateMachine.doTransition}}, changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4212) FairScheduler: Parent queues with 'Fair' policy should compute shares of all resources for its children during a recompute

2015-09-30 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-4212:
-

 Summary: FairScheduler: Parent queues with 'Fair' policy should 
compute shares of all resources for its children during a recompute
 Key: YARN-4212
 URL: https://issues.apache.org/jira/browse/YARN-4212
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun Suresh
Assignee: Arun Suresh


The Fair Scheduler, while performing a {{recomputeShares()}} during an 
{{update()}} call, uses the parent queue's policy to distribute shares to its 
children.

If the parent queue's policy is 'fair', it only computes the weight for memory 
and sets the vcores fair share of its children to 0.

Assuming a situation where we have 1 parent queue with policy 'fair' and 
multiple leaf queues with policy 'drf', any app submitted to the child queues 
with a vcore requirement > 1 will always be above its fair share, since during 
the recomputeShares process the child queues were all assigned 0 for fair-share 
vcores.
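
A hedged sketch of the fix direction (all names illustrative, not actual 
FairScheduler code): when recomputing, give each child a fair share of every 
resource type rather than memory alone.
{code}
// weights.get(i) is child i's weight; totalWeight is the sum over all children.
for (int i = 0; i < children.size(); i++) {
  double fraction = weights.get(i) / totalWeight;
  children.get(i).setFairShare(Resource.newInstance(
      (int) (clusterMemoryMb * fraction),
      (int) (clusterVcores * fraction))); // vcores no longer left at 0
}
{code}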




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936413#comment-14936413
 ] 

Li Lu commented on YARN-4210:
-

Of course we should not only report the user. I believe detailed user 
information is also important when the URL does not contain a user parameter. 
I'm fine with a fix in YARN-4203 though. Thanks for the work! 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If an HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e., the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4209:

Attachment: YARN-4209.001.patch

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => {{handleStoreEvent}} => 
> enter internal {{stateMachine.doTransition}} => exit internal 
> {{stateMachine.doTransition}}, changing the state to FENCED => exit external 
> {{stateMachine.doTransition}}, changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4209:

Attachment: (was: YARN-4209.001.patch)

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => {{handleStoreEvent}} => 
> enter internal {{stateMachine.doTransition}} => exit internal 
> {{stateMachine.doTransition}}, changing the state to FENCED => exit external 
> {{stateMachine.doTransition}}, changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936495#comment-14936495
 ] 

Varun Saxena commented on YARN-4210:


Have updated a patch on YARN-4203.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If an HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e., the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936494#comment-14936494
 ] 

Varun Saxena commented on YARN-4203:


Updated patch as per discussion on YARN-4210 
https://issues.apache.org/jira/browse/YARN-4210?focusedCommentId=14936407=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14936407

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch, YARN-4203-YARN-2928.03.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this JIRA to 
> add request & response logging and timing for each REST call that comes in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-4209:

Attachment: YARN-4209.001.patch

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} being called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => {{handleStoreEvent}} => 
> enter internal {{stateMachine.doTransition}} => exit internal 
> {{stateMachine.doTransition}}, changing the state to FENCED => exit external 
> {{stateMachine.doTransition}}, changing the state back to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936444#comment-14936444
 ] 

Wangda Tan commented on YARN-4162:
--

Hi [~Naganarasimha],

Thanks for working on this. I have some comments about the API design:

I found you added a new interface: {{@Path("/scheduler/partitions")}}.

I'm wondering, instead of doing this, whether it is possible to change the 
existing REST API, for example {{ws/v1/cluster/scheduler}}. We have two major 
places that need to change:
- Capacity, all different capacity-by-partitions.
- Resource, all resources-by-partitions.

The API in my mind looks like:
{code}
"queue": [
  {
< Existing fields >
"-xsi:type": "capacitySchedulerLeafQueueInfo",
"capacity": "50.0",
"usedCapacity": "0.0",
"maxCapacity": "100.0",
"absoluteCapacity": "50.0",
"absoluteMaxCapacity": "100.0",
"absoluteUsedCapacity": "0.0",
"numApplications": "2",
"queueName": "a",
"state": "RUNNING",
"resourcesUsed": {
  "memory": "0",
  "vCores": "0"
},

-- New Added Fields --
"capacities": {
"DEFAULT_PARTITION": {

}
"label-x" : {
"capacity": 50.0,
"usedCapacity": 40.0,
"maxCapacity": ...
}
},

"resources": {
"DEFAULT_PARTITION": {
"used": {
"memory": 0,
"vCores": 0
},
"reserved": {
"memory": 0,
"vCores": 0
}
},
"label-x": {

}
}
  }
]
{code}

And also for user:
{code}
...
"users": {
  "user": {
"username": "wtan",
"resourcesUsed": {
  "memory": "0",
  "vCores": "0"
},

-- New Added Fields --
"resources": {
"DEFAULT_PARTITION": {
"used": {
"memory": 0,
"vCores": 0
},
"reserved": {
"memory": 0,
"vCores": 0
}
},
"label-x": {

}
}
  }
}
{code}

I think we may need to consider adding something like QueueCapacitiesInfo and 
ResourceUsageInfo, which could be converted from a given 
QueueCapacities/ResourceUsage.

Thoughts?
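
A hedged sketch of the QueueCapacitiesInfo idea as a JAXB DAO; the 
QueueCapacities accessors and the nested per-partition class are assumptions 
for illustration:
{code}
import java.util.HashMap;
import java.util.Map;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class QueueCapacitiesInfo {
  // partition (node label) name -> capacities for that partition
  private Map<String, PartitionCapacityInfo> capacities = new HashMap<>();

  public QueueCapacitiesInfo() {
  } // JAXB needs a no-arg constructor

  public QueueCapacitiesInfo(QueueCapacities qc) {
    for (String partition : qc.getExistingNodeLabels()) { // accessor assumed
      PartitionCapacityInfo info = new PartitionCapacityInfo();
      info.capacity = qc.getCapacity(partition) * 100;
      info.usedCapacity = qc.getUsedCapacity(partition) * 100;
      info.maxCapacity = qc.getMaximumCapacity(partition) * 100;
      capacities.put(partition, info);
    }
  }

  static class PartitionCapacityInfo {
    float capacity;
    float usedCapacity;
    float maxCapacity;
  }
}
{code}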

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch
>
>
> When Node Labels are enabled, the REST Scheduler Information should also 
> provide partition-specific queue information similar to the existing Web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936407#comment-14936407
 ] 

Varun Saxena commented on YARN-4210:


[~gtCarrera9],
An entity may not be found for reasons other than the user as well, so printing 
only the user doesn't seem complete to me.
But from a debugging point of view this info should be available somewhere.
In YARN-4203 we will be adding request/response logging. Maybe I can print the 
request user while printing the "Received URL " message. If the URL does not 
contain a user, we can assume while debugging that the request user was taken.
Thoughts ?
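
For illustration, something as simple as this could work, assuming {{req}} is 
the {{HttpServletRequest}} and {{LOG}} the class logger (the exact format can be 
settled in YARN-4203):
{code}
// Append the caller to the existing "Received URL " log line.
// HttpServletRequest#getRemoteUser() returns null for an
// unauthenticated request, in which case the default user applies.
String url = req.getRequestURI()
    + (req.getQueryString() == null ? "" : "?" + req.getQueryString());
LOG.info("Received URL " + url + " from user " + req.getRemoteUser());
{code}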

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936490#comment-14936490
 ] 

Wangda Tan commented on YARN-3216:
--

O(n) -> {{O\(n\)}}; the sticker (emoticon) syntax is too unfriendly to computer 
science :).

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936488#comment-14936488
 ] 

Wangda Tan commented on YARN-3216:
--

Hi [~sunilg],
Thanks for working on this. Not finished reviewing yet; some comments so far:
1) (minor) Could you make the amlimit computation treat label=="" as a normal 
label? That could simplify the logic. You can use a map to store the computed 
amlimit-by-partition to avoid duplicated computation.
2) (major) getAMResourceLimitPerPartition should use partition.totalResource 
(from RMNodeLabelsManager.getPartitionResource) instead of clusterResource.
3) (minor) ResourceUsage#getAllAMUsed is not used.
4) (major) LeafQueue#getNumActiveAppsPerPartition is an O(n) operation and 
should be optimized, otherwise activateApplication becomes an O(n^2) operation; 
one possible fix is sketched below.
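
For point 4, one possible O(1) approach inside LeafQueue; the field and method 
names below are placeholders, not the patch:
{code}
// Maintained incrementally on application activation/completion,
// so the lookup during the AM-limit check stays O(1).
private final Map<String, Integer> activeAppsPerPartition =
    new HashMap<String, Integer>();

private void incActiveAppsPerPartition(String partition) {
  Integer count = activeAppsPerPartition.get(partition);
  activeAppsPerPartition.put(partition, count == null ? 1 : count + 1);
}

private int getNumActiveAppsPerPartition(String partition) {
  Integer count = activeAppsPerPartition.get(partition);
  return count == null ? 0 : count;
}
{code}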

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936537#comment-14936537
 ] 

Rohith Sharma K S commented on YARN-4209:
-

Overall the patch looks good. Some comments:
# Can the code below be extracted into a method? (A possible shape is sketched 
after this comment.)
{code}
if (isFenced) {
  return RMStateStoreState.FENCED;
} else {
  return RMStateStoreState.ACTIVE;
}
{code}
# In the method {{notifyStoreOperationFailed}}, I think there is no need to 
obtain the write lock, since {{updateFencedState}} is a synchronous call and the 
write lock is already obtained at a lower level just before the state 
transition. Is it really required? 
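
Something like the following; the method name is just a suggestion, assuming 
{{isFenced}} is the existing flag checked above:
{code}
private RMStateStoreState getRMStateStoreState() {
  // single place that maps the fenced flag to the store state
  return isFenced ? RMStateStoreState.FENCED : RMStateStoreState.ACTIVE;
}
{code}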

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => 
> {{handleStoreEvent}} => enter internal {{stateMachine.doTransition}} => exit 
> internal {{stateMachine.doTransition}}, changing state to FENCED => exit 
> external {{stateMachine.doTransition}}, changing state to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3864:
---
Summary: Implement support for querying single app and all apps for a flow 
run  (was: TimelineReader API for aggregated entities)

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Once aggregations are implemented at the writer side, we need to design  
> APIs' to query them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Description: 
This JIRA handles multiple issues.

* If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty.
Found during web UI poc testing. 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}

* ResultScanner is not closed in HBase Reader.

* Exception 

  was:
If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty.
Found during web UI poc testing. 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}


> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> This JIRA handles multiple issues.
> * If HBase Get does not fetch any rows for the query, we still try to parse 
> the result and read fields. This leads to NPE while reading metrics. We 
> should not attempt to read anything if no row is returned i.e. result is 
> empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> 

[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Description: 
This JIRA handles multiple issues.

* If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty.
Found during web UI poc testing. 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}

* ResultScanner is not closed in HBase Reader.

* Exception encountered while reading start and end time in FlowRunEntityReader

  was:
This JIRA handles multiple issues.

* If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty.
Found during web UI poc testing. 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}

* ResultScanner is not closed in HBase Reader.

* Exception 


> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> This JIRA handles multiple issues.
> * If HBase Get does not fetch any rows for the query, we still try to parse 
> the result and read fields. This leads to NPE while reading metrics. We 
> should not attempt to read anything if no row is returned i.e. result is 
> empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> 

[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938759#comment-14938759
 ] 

Vrushali C commented on YARN-4203:
--

Committed patch 003 in. Thanks [~varun_saxena] for the patch and [~gtCarrera] 
for the review.

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.003.patch, 
> YARN-4203-YARN-2928.01.patch, YARN-4203-YARN-2928.02.patch
>
>
> The rest endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-30 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4075:
--
 Hadoop Flags: Reviewed
Fix Version/s: YARN-2928

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Fix For: YARN-2928
>
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938858#comment-14938858
 ] 

Vrushali C commented on YARN-4210:
--

Committed patch v3 in, thanks [~varun_saxena] for the jira and the patch and 
[~gtCarrera], [~sjlee0] for the reviews.
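
For anyone following the thread, rough sketches of the first two fixes 
described in this JIRA; this is not the committed patch, and the surrounding 
method shapes are assumptions for illustration:
{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

class ReaderSketches {

  /**
   * Bail out early when the Get matched no row; parsing an empty
   * Result is what triggered the NPE in readMetrics.
   */
  static Result getRowOrNull(HTableInterface table, Get get)
      throws IOException {
    Result result = table.get(get);
    return (result == null || result.isEmpty()) ? null : result;
  }

  /**
   * Always release the ResultScanner, even if parsing throws.
   */
  static int countRows(HTableInterface table, Scan scan)
      throws IOException {
    ResultScanner scanner = table.getScanner(scan);
    try {
      int rows = 0;
      for (Result row : scanner) {
        rows++; // a real reader would parse the row here
      }
      return rows;
    } finally {
      scanner.close();
    }
  }
}
{code}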

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Fix For: YARN-2928
>
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> This JIRA handles multiple issues.
> * If HBase Get does not fetch any rows for the query, we still try to parse 
> the result and read fields. This leads to NPE while reading metrics. We 
> should not attempt to read anything if no row is returned i.e. result is 
> empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
> * ResultScanner is not closed in HBase Reader.
> * Exception encountered while reading start and end time in 
> FlowRunEntityReader



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) TimelineReader API for aggregated entities

2015-09-30 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938769#comment-14938769
 ] 

Varun Saxena commented on YARN-3864:


Yes. Currently working on it. Will update the title.

> TimelineReader API for aggregated entities
> --
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Once aggregations are implemented at the writer side, we need to design  
> APIs' to query them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4097) Create POC timeline web UI with new YARN web UI framework

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938871#comment-14938871
 ] 

Vrushali C commented on YARN-4097:
--

Hi [~gtCarrera], wondering if this jira should be reassigned to someone who is 
working on the UI PoC, or are you working on it? It would be great to see some 
interim/in-progress screenshots or anything else that you can share. 

> Create POC timeline web UI with new YARN web UI framework
> -
>
> Key: YARN-4097
> URL: https://issues.apache.org/jira/browse/YARN-4097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>
> As planned, we need to try out the new YARN web UI framework and implement 
> timeline v2 web UI on top of it. This JIRA proposes to build the basic active 
> flow and application lists of the timeline data. We can add more content 
> after we get used to this framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) TimelineReader API for aggregated entities

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938764#comment-14938764
 ] 

Vrushali C commented on YARN-3864:
--

Noting from the reviews in YARN-4210 by [~sjlee0] that the query that returns 
all apps for a given flow run will be done in YARN-3864.

> TimelineReader API for aggregated entities
> --
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Once aggregations are implemented at the writer side, we need to design  
> APIs' to query them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3864:
---
Description: 
This JIRA will handle support for querying all apps for a flow run in the HBase 
reader implementation, and also the REST API implementation for a single app 
and multiple apps.

  was:Once aggregations are implemented at the writer side, we need to design  
APIs' to query them.


> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> This JIRA will handle support for querying all apps for a flow run in the 
> HBase reader implementation, and also the REST API implementation for a 
> single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4213) Add REST API to RM to retrieve containers info from an application attempt

2015-09-30 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4213:


 Summary: Add REST API to RM to retrieve containers info from an 
application attempt
 Key: YARN-4213
 URL: https://issues.apache.org/jira/browse/YARN-4213
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan


In the existing RM web UI and the AHS REST API, containers information for an 
app attempt can be retrieved; the RM web REST API should be able to do this as 
well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4097) Create POC timeline web UI with new YARN web UI framework

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938877#comment-14938877
 ] 

Li Lu commented on YARN-4097:
-

Yes, I'm working on this right now. Currently I've added a page for listing all 
past active flows in the system, and the flow runs inside a flow. I think it's 
a good time for some discussion about the progress and next-step plans. 

> Create POC timeline web UI with new YARN web UI framework
> -
>
> Key: YARN-4097
> URL: https://issues.apache.org/jira/browse/YARN-4097
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>
> As planned, we need to try out the new YARN web UI framework and implement 
> timeline v2 web UI on top of it. This JIRA proposes to build the basic active 
> flow and application lists of the timeline data. We can add more content 
> after we get used to this framework.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4203:
--
 Hadoop Flags: Reviewed
Fix Version/s: YARN-2928
  Description: 
The rest endpoints are being added as part of YARN-4075. Filing this jira to 
add in request & response logging and timing for each REST call that comes in. 



  was:

The rest endpoints are being added as part of YARN-4075. Filing this jira to 
add in request & response logging and timing for each REST call that comes in. 




> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Fix For: YARN-2928
>
> Attachments: YARN-4203-YARN-2928.003.patch, 
> YARN-4203-YARN-2928.01.patch, YARN-4203-YARN-2928.02.patch
>
>
> The rest endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938775#comment-14938775
 ] 

Vrushali C commented on YARN-3864:
--

Thank you! 

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> This JIRA will handle support for querying all apps for a flow run in the 
> HBase reader implementation, and also the REST API implementation for a 
> single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4162) Scheduler info in REST, is currently not displaying partition specific queue information similar to UI

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938791#comment-14938791
 ] 

Wangda Tan commented on YARN-4162:
--

Thanks! [~Naganarasimha].

> Scheduler info in REST, is currently not displaying partition specific queue 
> information similar to UI
> --
>
> Key: YARN-4162
> URL: https://issues.apache.org/jira/browse/YARN-4162
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4162.v1.001.patch
>
>
> When Node Labels are enabled then REST Scheduler Information should also 
> provide partition specific queue information similar to the existing Web UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936752#comment-14936752
 ] 

Hadoop QA commented on YARN-4209:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 30s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   7m 45s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 47s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  55m 59s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  94m 57s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764394/YARN-4209.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 06abc57 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9308/artifact/patchprocess/diffJavacWarnings.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9308/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9308/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9308/console |


This message was automatically generated.

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch
>
>
> RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by 
> {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => 
> {{handleStoreEvent}} => enter internal {{stateMachine.doTransition}} => exit 
> internal {{stateMachine.doTransition}}, changing state to FENCED => exit 
> external {{stateMachine.doTransition}}, changing state to ACTIVE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4203:
---
Attachment: (was: YARN-4203-YARN-2928.03.patch)

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch
>
>
> The rest endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4203:
---
Attachment: YARN-4203-YARN-2928.003.patch

The results of the previous build are weird. Adding the same patch again to 
trigger a new build.

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.003.patch, 
> YARN-4203-YARN-2928.01.patch, YARN-4203-YARN-2928.02.patch
>
>
> The rest endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936837#comment-14936837
 ] 

Hadoop QA commented on YARN-4203:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 39s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  6s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 51s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 23s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  40m 53s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764416/YARN-4203-YARN-2928.003.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9309/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9309/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9309/console |


This message was automatically generated.

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.003.patch, 
> YARN-4203-YARN-2928.01.patch, YARN-4203-YARN-2928.02.patch
>
>
> The rest endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4175) Example of use YARN-1197

2015-09-30 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938919#comment-14938919
 ] 

MENG DING commented on YARN-4175:
-

Update on the progress of this ticket:

The example will be based on the existing DistributedShell application. The 
idea is to add an RPC service to the DistributedShell application master, and 
also a client to issue requests to this service to increase/decrease container 
resources after the application is started.

The patch is almost ready and under testing. Will post it for review soon.
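
As a rough sketch of the shape, a hypothetical client-facing protocol on the 
AM; every name below is a placeholder, not the pending patch:
{code}
import java.io.IOException;

/** Hypothetical RPC interface served by the DistributedShell AM. */
public interface ContainerResizeProtocol {
  /**
   * Ask the AM to request a new size for the given container from
   * the RM (the YARN-1197 increase/decrease flow).
   */
  void resizeContainer(String containerId, int memoryMb, int vcores)
      throws IOException;
}
{code}
A small CLI client would then connect to the AM's RPC address (published, say, 
via the application report) and invoke resizeContainer once the application is 
running.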

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
>
> Like YARN-2609, we need a example program to demonstrate how to use YARN-1197 
> from end-to-end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4213) Add REST API to RM to retrieve containers info from an application attempt

2015-09-30 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4213:
-
Attachment: YARN-4213.1.patch

Attached ver.1.

> Add REST API to RM to retrieve containers info from an application attempt
> --
>
> Key: YARN-4213
> URL: https://issues.apache.org/jira/browse/YARN-4213
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4213.1.patch
>
>
> In the existing RM web UI and the AHS REST API, containers information for an 
> app attempt can be retrieved; the RM web REST API should be able to do this 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4213) Add REST API to RM to retrieve containers info from an application attempt

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939002#comment-14939002
 ] 

Wangda Tan commented on YARN-4213:
--

Uploaded the patch and tested it locally; it can get the container list from 
the RM REST API.
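
Roughly, the endpoint's shape is modeled on the AHS REST layout; the path, 
signature, and delegation below are illustrative, not the exact patch:
{code}
@GET
@Path("/apps/{appid}/appattempts/{appattemptid}/containers")
@Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
public ContainersInfo getContainers(@Context HttpServletRequest req,
    @Context HttpServletResponse res,
    @PathParam("appid") String appId,
    @PathParam("appattemptid") String appAttemptId) {
  // reuse the implementation shared with the AHS web services so the
  // container-listing logic is not duplicated in RMWebServices
  return super.getContainers(req, res, appId, appAttemptId);
}
{code}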

> Add REST API to RM to retrieve containers info from an application attempt
> --
>
> Key: YARN-4213
> URL: https://issues.apache.org/jira/browse/YARN-4213
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4213.1.patch
>
>
> In the existing RM web UI and the AHS REST API, containers information for an 
> app attempt can be retrieved; the RM web REST API should be able to do this 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4214) AppAttemptInfo should have ApplicationAttemptId

2015-09-30 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-4214:
-
Attachment: YARN-4214.1.patch

Attached the ver.1 patch; it's a simple fix.
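
The fix essentially exposes the canonical string form in the DAO; roughly (the 
new field name is illustrative, not necessarily the patch):
{code}
// Inside the AppAttemptInfo DAO: keep the existing int id and add
// the canonical string form of the attempt id.
protected int id;
protected String appAttemptId;

public AppAttemptInfo(RMAppAttempt attempt) {
  ApplicationAttemptId attemptId = attempt.getAppAttemptId();
  this.id = attemptId.getAttemptId();
  // renders as e.g. appattempt_1443649107010_0001_000001
  this.appAttemptId = attemptId.toString();
}
{code}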

> AppAttemptInfo should have ApplicationAttemptId
> ---
>
> Key: YARN-4214
> URL: https://issues.apache.org/jira/browse/YARN-4214
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4214.1.patch
>
>
> Currently the YARN RM REST API 
> {{ws/v1/cluster/apps/application_1443559871354_0008/appattempts}} only 
> returns an int id for each attempt. 
> Such as:
> {code}
> "appAttempt": [
>   {
> "id": "1",
> "startTime": "1443645213960",
> "containerId": "container_1443559871354_0008_01_01",
> "nodeHttpAddress": "localhost:8042",
> "nodeId": "localhost:62978",
> "logsLink": "
> http://localhost:8042/node/containerlogs/container_1443559871354_0008_01_01/wtan
> "
>   },
> {code}
> It's better to have a string ApplicationAttemptId like: 
> {{appattempt_1443649107010_0001_01}} in REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4213) Add REST API to RM to retrieve containers info from an application attempt

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939048#comment-14939048
 ] 

Li Lu commented on YARN-4213:
-

Tested locally and it works for me. The removed code is a duplicate of the one 
in WebServices.java. LGTM. 

> Add REST API to RM to retrieve containers info from an application attempt
> --
>
> Key: YARN-4213
> URL: https://issues.apache.org/jira/browse/YARN-4213
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4213.1.patch
>
>
> In the existing RM web UI and the AHS REST API, containers information for an 
> app attempt can be retrieved; the RM web REST API should be able to do this 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4213) Add REST API to RM to retrieve containers info from an application attempt

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939111#comment-14939111
 ] 

Hadoop QA commented on YARN-4213:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  19m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |  10m 28s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 45s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 51s | The applied patch generated  1 
new checkstyle issues (total was 40, now 41). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  59m 48s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 106m 34s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764507/YARN-4213.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c17d31 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9314/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9314/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9314/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9314/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9314/console |


This message was automatically generated.

> Add REST API to RM to retrieve containers info from an application attempt
> --
>
> Key: YARN-4213
> URL: https://issues.apache.org/jira/browse/YARN-4213
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4213.1.patch
>
>
> In the existing RM web UI and the AHS REST API, containers information for an 
> app attempt can be retrieved; the RM web REST API should be able to do this 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938922#comment-14938922
 ] 

Hadoop QA commented on YARN-1897:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  24m 39s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 10 new or modified test files. |
| {color:red}-1{color} | javac |   8m 31s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |  10m 53s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 16s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   3m 29s | The applied patch generated  2 
new checkstyle issues (total was 32, now 34). |
| {color:green}+1{color} | whitespace |   2m  2s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   9m 11s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 106m 16s | Tests failed in 
hadoop-mapreduce-client-jobclient. |
| {color:green}+1{color} | yarn tests |   0m 29s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   7m  3s | Tests failed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m 26s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 37s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   8m 59s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  56m 38s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 244m 45s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.mapred.TestNetworkedJob |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | 
hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService
 |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
| Timed out tests | org.apache.hadoop.mapred.TestClusterMapReduceTestCase |
|   | 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764373/YARN-1897-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c17d31 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/diffJavacWarnings.txt
 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9312/console |


This message was automatically generated.

> CLI and core support for 

[jira] [Created] (YARN-4214) AppAttemptInfo should have ApplicationAttemptId

2015-09-30 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4214:


 Summary: AppAttemptInfo should have ApplicationAttemptId
 Key: YARN-4214
 URL: https://issues.apache.org/jira/browse/YARN-4214
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wangda Tan
Assignee: Wangda Tan


Currently the YARN RM REST API 
{{ws/v1/cluster/apps/application_1443559871354_0008/appattempts}} only returns 
an int id for each attempt. 
Such as:
{code}
"appAttempt": [
  {
"id": "1",
"startTime": "1443645213960",
"containerId": "container_1443559871354_0008_01_01",
"nodeHttpAddress": "localhost:8042",
"nodeId": "localhost:62978",
"logsLink": "
http://localhost:8042/node/containerlogs/container_1443559871354_0008_01_01/wtan
"
  },
{code}

It's better to have a string ApplicationAttemptId like: 
{{appattempt_1443649107010_0001_01}} in REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14938953#comment-14938953
 ] 

Vrushali C commented on YARN-4178:
--

Thanks [~varun_saxena] for the patch. Overall, LGTM.  A couple of observations:

It would be good to have encodeAppId and decodeAppId in the same class instead 
of two different classes. 
To that effect, if you’d like, we can rename TimelineWriterUtils to 
TimelineStorageUtils so that both reader and writer can use functions from 
this. 
Also,let’s have the invert(long) and invert(int) functions in the same util 
class, instead of adding in a new util class.

While I do think we should store the “application_” prefix (if/when yarn starts 
allowing configurable prefixes so that we can see something like 
"spark__" or "tez__" on the cluster 
etc), I don’t want to hold up the jira for that since we could add it in later 
as we see fit.
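
For illustration, a minimal sketch of the merged util class (the class name 
follows the rename proposed above; the method names and exact inversion are 
assumptions, not the patch):

{code}
// Sketch only: one storage-side util class holding both invert helpers so
// the reader and writer can share them. Inverting the numeric value, rather
// than comparing the string form, keeps the byte ordering correct when the
// sequence number of an app id grows an extra digit.
public final class TimelineStorageUtils {

  private TimelineStorageUtils() {
  }

  /** Inverts a long, e.g. for most-recent-first ordering of timestamps. */
  public static long invertLong(long key) {
    return Long.MAX_VALUE - key;
  }

  /** Inverts an int, e.g. for the sequence part of an app id. */
  public static int invertInt(int key) {
    return Integer.MAX_VALUE - key;
  }
}
{code}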


> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-30 Thread Joep Rottinghuis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939315#comment-14939315
 ] 

Joep Rottinghuis commented on YARN-4178:


Starting without it is fine, with the benefit of more compact keys. If we add 
the "application_" prefix, it should be postfixed in the key to ensure correct 
ordering.

Why do we need any util classes for this? Can't an AppId class handle this by 
itself (convert from string to byte representation and back)?

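For illustration, a minimal sketch of that idea (hypothetical class, not the 
patch): it assumes {{ApplicationId.fromString}} is available (older branches 
would use ConverterUtils.toApplicationId) and encodes the positive 
timestamp/sequence values big-endian so the bytes sort numerically:

{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.yarn.api.records.ApplicationId;

// Hypothetical self-contained converter: string form <-> sortable byte form.
public final class AppIdKeyConverter {
  private static final int KEY_SIZE = 8 + 4; // long timestamp + int sequence

  /** "application_<clusterTs>_<seq>" to 12 bytes that sort numerically. */
  public static byte[] toBytes(String appIdStr) {
    ApplicationId appId = ApplicationId.fromString(appIdStr);
    return ByteBuffer.allocate(KEY_SIZE)
        .putLong(appId.getClusterTimestamp())
        .putInt(appId.getId())
        .array();
  }

  /** Reverses toBytes back to the canonical string form. */
  public static String fromBytes(byte[] key) {
    ByteBuffer buf = ByteBuffer.wrap(key);
    return ApplicationId.newInstance(buf.getLong(), buf.getInt()).toString();
  }
}
{code}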



> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939136#comment-14939136
 ] 

Li Lu commented on YARN-3864:
-

This issue becomes a major blocker for the web UI POC since we cannot list 
applications within one flowrun. Meanwhile, could anyone remind me where we 
store this information in the HBase storage? 

As a big picture, I assume there is an "applications" section in our flowrun 
endpoint's return value, similar to the "flowruns" section in our /flows 
endpoint, but let's make sure we actually have that information first (and 
hopefully do not need to set up coprocessors). 

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-09-30 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-4215:
---

 Summary: RMNodeLabels Manager Need to verify and replace node 
labels for the only modified Node Label Mappings in the request
 Key: YARN-4215
 URL: https://issues.apache.org/jira/browse/YARN-4215
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R


Modified node labels need to be updated by the capacity scheduler while 
holding a lock, hence it is better to push events to the scheduler only when 
there is actually a change in the label mapping for a given node.
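
For illustration, a minimal sketch (hypothetical names, not the attached 
patch) of filtering a replace-labels request down to the nodes whose mapping 
actually changed, so that only those generate scheduler events:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch only: keep just the entries whose label set differs from the
// currently stored mapping; an empty result means no scheduler event.
final class NodeLabelChangeFilter {
  static <N> Map<N, Set<String>> onlyChanged(Map<N, Set<String>> current,
      Map<N, Set<String>> requested) {
    Map<N, Set<String>> changed = new HashMap<N, Set<String>>();
    for (Map.Entry<N, Set<String>> e : requested.entrySet()) {
      if (!e.getValue().equals(current.get(e.getKey()))) {
        changed.put(e.getKey(), e.getValue()); // real change: worth the lock
      }
    }
    return changed;
  }
}
{code}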



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4214) AppAttemptInfo should have ApplicationAttemptId

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939126#comment-14939126
 ] 

Hadoop QA commented on YARN-4214:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 10s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  4s | There were no new javadoc 
warning messages. |
| {color:red}-1{color} | release audit |   0m 15s | The applied patch generated 
1 release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 48s | The applied patch generated  1 
new checkstyle issues (total was 8, now 9). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  59m 57s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  99m 42s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764510/YARN-4214.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c7e03c3 |
| Release Audit | 
https://builds.apache.org/job/PreCommit-YARN-Build/9315/artifact/patchprocess/patchReleaseAuditProblems.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9315/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9315/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9315/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9315/console |


This message was automatically generated.

> AppAttemptInfo should have ApplicationAttemptId
> ---
>
> Key: YARN-4214
> URL: https://issues.apache.org/jira/browse/YARN-4214
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4214.1.patch
>
>
> Currently the YARN RM REST API 
> {{ws/v1/cluster/apps/application_1443559871354_0008/appattempts}} only 
> returns an int id for each attempt. 
> Such as:
> {code}
> "appAttempt": [
>   {
> "id": "1",
> "startTime": "1443645213960",
> "containerId": "container_1443559871354_0008_01_01",
> "nodeHttpAddress": "localhost:8042",
> "nodeId": "localhost:62978",
> "logsLink": "
> http://localhost:8042/node/containerlogs/container_1443559871354_0008_01_01/wtan
> "
>   },
> {code}
> It's better to have a string ApplicationAttemptId like 
> {{appattempt_1443649107010_0001_01}} in the REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3864:

Priority: Blocker  (was: Major)

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-09-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939324#comment-14939324
 ] 

Naganarasimha G R commented on YARN-3367:
-

Hi [~gtCarrera] & [~sjlee0], any thoughts on my previous comments?

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
> Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to 
> avoid a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> thread gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a 
> separate thread delivers the entities in the queue to the collector via REST 
> calls.
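
For illustration, a minimal sketch of such an event loop (names are 
assumptions; the real TimelineClient wiring will differ): putEntities() only 
enqueues, and one daemon thread drains the queue in FIFO order, which 
preserves event order and avoids a thread per call:

{code}
import java.util.Collections;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: producer/consumer event loop for posting timeline entities.
final class TimelineEventLoopSketch<E> {
  private final BlockingQueue<E> queue = new LinkedBlockingQueue<E>();
  private volatile boolean stopped = false;

  TimelineEventLoopSketch() {
    Thread dispatcher = new Thread(new Runnable() {
      @Override
      public void run() {
        while (!stopped) {
          try {
            deliver(queue.take()); // FIFO keeps events in submission order
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      }
    }, "timeline-entity-dispatcher");
    dispatcher.setDaemon(true);
    dispatcher.start();
  }

  /** Non-blocking for the caller (e.g. the AM). */
  void putEntitiesAsync(E... entities) {
    Collections.addAll(queue, entities);
  }

  void deliver(E entity) {
    // placeholder for the REST call to the collector (with retries)
    System.out.println("POST " + entity);
  }

  void stop() {
    stopped = true;
  }
}
{code}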



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4129) Refactor the SystemMetricPublisher in RM to better support newer events

2015-09-30 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939321#comment-14939321
 ] 

Naganarasimha G R commented on YARN-4129:
-

Hi [~gtCarrera], [~varun_saxena] & [~sjlee0],
I hope one of you can do the initial review of the approach mentioned here in 
the absence of [~djp]. If the basic approach is fine, I will proceed with 
other jiras like YARN-3880 and keep correcting this jira in parallel.
 

> Refactor the SystemMetricPublisher in RM to better support newer events
> ---
>
> Key: YARN-4129
> URL: https://issues.apache.org/jira/browse/YARN-4129
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-4129.YARN-2928.001.patch
>
>
> Currently, to add a new timeline event/entity on the RM side, one has to 
> add a method in the publisher, a method in the handler, and a new event 
> class, which is cumbersome and redundant. Further, not all events may need 
> to be published in both V1 & V2. So this adopts an approach similar to the 
> one taken in YARN-3045 (NM side).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs

2015-09-30 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939132#comment-14939132
 ] 

Arun Suresh commented on YARN-3985:
---

Thanks for the patch [~adhoot] and the review [~subru].

I guess it looks pretty straightforward. Minor nit though:
* 
{{TestReservationSystemWithRMHA.testSubmitReservationAndCheckAfterFailover()}} 
uses an explicit Thread.sleep to wait for the plan followers to synchronize. 
Would it be better to explicitly call ReservationSystem.synchronizePlan with 
the plan name, avoiding the thread sleep and reducing possible test 
flakiness? (See the sketch after this comment.)

+1 pending your decision on the above.
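
A hedged sketch of that suggestion (the helper and method names are 
assumptions based on the comment and the surrounding test, not committed 
code):

{code}
// Sketch only: drive the plan follower explicitly instead of sleeping.
ReservationId resId = submitReservation(rm1);        // assumed test helper
// was: Thread.sleep(PLAN_FOLLOWER_INTERVAL);
rm1.getRMContext().getReservationSystem().synchronizePlan(planName);
explicitFailover();                                  // assumed test helper
rm2.getRMContext().getReservationSystem().synchronizePlan(planName);
// the reservation should have been persisted and recovered
Assert.assertNotNull(rm2.getRMContext().getReservationSystem()
    .getPlan(planName).getReservationById(resId));
{code}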

> Make ReservationSystem persist state using RMStateStore reservation APIs 
> -
>
> Key: YARN-3985
> URL: https://issues.apache.org/jira/browse/YARN-3985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3985.001.patch, YARN-3985.002.patch, 
> YARN-3985.002.patch, YARN-3985.002.patch, YARN-3985.003.patch
>
>
> YARN-3736 adds the RMStateStore apis to store and load reservation state. 
> This jira adds the actual storing of state from ReservationSystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-09-30 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4215:

Component/s: (was: api)
 (was: client)

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while 
> holding a lock, hence it is better to push events to the scheduler only when 
> there is actually a change in the label mapping for a given node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4215) RMNodeLabels Manager Need to verify and replace node labels for the only modified Node Label Mappings in the request

2015-09-30 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4215:

Attachment: YARN-4215.v1.001.patch

> RMNodeLabels Manager Need to verify and replace node labels for the only 
> modified Node Label Mappings in the request
> 
>
> Key: YARN-4215
> URL: https://issues.apache.org/jira/browse/YARN-4215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: nodelabel, resourcemanager
> Attachments: YARN-4215.v1.001.patch
>
>
> Modified node labels need to be updated by the capacity scheduler while 
> holding a lock, hence it is better to push events to the scheduler only when 
> there is actually a change in the label mapping for a given node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939309#comment-14939309
 ] 

Vrushali C commented on YARN-3864:
--

cc [~sjlee0]

bq. This issue becomes a major blocker for the web UI POC since we cannot list 
applications within one flowrun. 

[~gtCarrera] Perhaps it would be a good idea if we all knew the scope of the 
UI PoC. When jiras suddenly become blockers, it is hard to plan and prioritize 
work. Also, are the main landing page and flow details page of the UI PoC 
done? I would love to know more about the UI PoC's current status and planned 
work. If there are any jiras that already contain this info, please point me 
to them!

thanks

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3864) Implement support for querying single app and all apps for a flow run

2015-09-30 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939312#comment-14939312
 ] 

Li Lu commented on YARN-3864:
-

Sure, I'll send some recent updates about the UI work soon. Sorry for 
suddenly raising this to a blocker. If there are any bandwidth problems, I can 
take care of this work. 

> Implement support for querying single app and all apps for a flow run
> -
>
> Key: YARN-3864
> URL: https://issues.apache.org/jira/browse/YARN-3864
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
>
> This JIRA will handle support for querying all apps for a flow run in HBase 
> reader implementation.
> And also REST API implementation for single app and multiple apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4140) RM container allocation delayed incase of app submitted to Nodelabel partition

2015-09-30 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4140:
---
Attachment: 0013-YARN-4140.patch

Hi [~leftnoteasy],

Thanks for the review and comments.
I have updated the patch based on the comments.

# Comment 1: Got confused with the resource request asks. Only the label of 
the last ANY request needs to be stored for decreasing usage in
{code}
 if (updatePendingResources) {
   ...
}
{code}
# Comment 2: Updated the variable names as per the comments.
# Comment 3: I wanted to change the request only when a change actually 
happens; that is why I added it that way.
# Comment 4: Earlier I faced Fair and Fifo related testcase failures when the 
request label was null. Currently the same is not happening.



> RM container allocation delayed incase of app submitted to Nodelabel partition
> --
>
> Key: YARN-4140
> URL: https://issues.apache.org/jira/browse/YARN-4140
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4140.patch, 0002-YARN-4140.patch, 
> 0003-YARN-4140.patch, 0004-YARN-4140.patch, 0005-YARN-4140.patch, 
> 0006-YARN-4140.patch, 0007-YARN-4140.patch, 0008-YARN-4140.patch, 
> 0009-YARN-4140.patch, 0010-YARN-4140.patch, 0011-YARN-4140.patch, 
> 0012-YARN-4140.patch, 0013-YARN-4140.patch
>
>
> Trying to run an application on a Nodelabel partition, I found that the 
> application execution time is delayed by 5–10 min for 500 containers. There 
> were 3 machines in total; 2 machines were in the same partition, and the app 
> was submitted to that partition.
> After enabling debug logging I was able to find the below:
> # From the AM the container ask is for OFF-SWITCH.
> # The RM is allocating all containers as NODE_LOCAL, as shown in the logs 
> below.
> # Since I had about 500 containers, it took about 6 minutes to allocate the 
> 1st map after the AM allocation.
> # Tested with about 1K maps using a PI job; it took 17 minutes to allocate 
> the next container after the AM allocation.
> Once the 500 container allocations on NODE_LOCAL are done, the next 
> container allocation is done as OFF_SWITCH.
> {code}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> /default-rack, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: *, Relax 
> Locality: true, Node Label Expression: 3}
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-143, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt:
>  showRequests: application=application_1441791998224_0001 request={Priority: 
> 20, Capability: , # Containers: 500, Location: 
> host-10-19-92-117, Relax Locality: true, Node Label Expression: }
> 2015-09-09 15:21:58,954 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> {code}
>  
> {code}
> 2015-09-09 14:35:45,467 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:45,831 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 
> absoluteUsedCapacity=0.0, numApps=1, numContainers=1 -->  vCores:0>, NODE_LOCAL
> 2015-09-09 14:35:46,469 DEBUG 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
> Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, 
> usedResources=, usedCapacity=0.0, 

[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-30 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937959#comment-14937959
 ] 

zhihai xu commented on YARN-4209:
-

[~rohithsharma], thanks for the review! I uploaded a new patch 
YARN-4209.002.patch, which addressed all your comments. Please review it.

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch, YARN-4209.001.patch, 
> YARN-4209.002.patch
>
>
> RMStateStore's FENCED state doesn’t work due to {{updateFencedState}} being 
> called by {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded 
> in the {{stateMachine.doTransition}} called from a public 
> API (removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for the FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} 
> =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal 
> {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} 
> change state to FENCED => exit external {{stateMachine.doTransition}} change 
> state to ACTIVE.
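
For illustration, a stripped-down runnable toy (not the RM code) of the 
re-entrancy the description walks through; the outer transition computes its 
target before the nested call runs and then clobbers whatever the nested call 
set:

{code}
// Toy only: demonstrates why the nested FENCED transition is lost.
public class FencedDemo {
  enum State { ACTIVE, FENCED }
  static State current = State.ACTIVE;

  static void doTransition(State target, Runnable action) {
    if (action != null) {
      action.run();       // may re-enter doTransition, like updateFencedState
    }
    current = target;     // the external transition overwrites the result
  }

  public static void main(String[] args) {
    // removeRMDelegationToken fails -> notifyStoreOperationFailed ->
    // nested transition to FENCED, while the outer transition targets ACTIVE
    doTransition(State.ACTIVE, new Runnable() {
      @Override
      public void run() {
        doTransition(State.FENCED, null);
      }
    });
    System.out.println(current); // prints ACTIVE, not FENCED
  }
}
{code}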



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-09-30 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3996:

Assignee: Neelesh Srinivas Salian  (was: Anubhav Dhoot)

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with mininumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-09-30 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3619:

Attachment: YARN-3619.001.patch

> ContainerMetrics unregisters during getMetrics and leads to 
> ConcurrentModificationException
> ---
>
> Key: YARN-3619
> URL: https://issues.apache.org/jira/browse/YARN-3619
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: zhihai xu
> Attachments: YARN-3619.000.patch, YARN-3619.001.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, 
> but that method can be called by MetricsSystemImpl.sampleMetrics which is 
> trying to iterate the sources.  This leads to a 
> ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
> impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936922#comment-14936922
 ] 

Jason Lowe commented on YARN-3727:
--

+1 lgtm.  Committing this.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.
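
For illustration, a minimal sketch (hypothetical helper, not the committed 
patch) of the proposed check: skip a leftover directory and pick a fresh one 
so the later rename cannot hit a non-empty destination:

{code}
import java.io.File;

// Sketch only: keep incrementing the cache directory id until we find one
// that does not already exist on disk.
final class LocalCacheDirPicker {
  static File pickFreshDir(File cacheRoot, long startId) {
    long id = startId;
    File dir = new File(cacheRoot, Long.toString(id));
    while (dir.exists()) {        // leftover from a crash or LevelDB skew
      dir = new File(cacheRoot, Long.toString(++id));
    }
    return dir;                   // rename(dst_work, dir) can now succeed
  }
}
{code}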



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937472#comment-14937472
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2407 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2407/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937625#comment-14937625
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1203 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1203/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936978#comment-14936978
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8547 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8547/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937179#comment-14937179
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2379 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2379/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* hadoop-yarn-project/CHANGES.txt


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937444#comment-14937444
 ] 

Jason Lowe commented on YARN-3727:
--

I'll commit to 2.6.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-3727:
--
Target Version/s: 2.6.2
  Labels:   (was: 2.6.2-candidate)

Adding 2.6.2 as the target version. [~zxu] or [~jlowe], does this apply cleanly 
to branch-2.6? Please feel free to cherry-pick the commit to branch-2.6.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3727:
-
Fix Version/s: 2.6.2

Committed to branch-2.6 as well.  Verified TestLocalResourcesTrackerImpl and 
TestResourceLocalizationService passed.


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-09-30 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937022#comment-14937022
 ] 

Jason Lowe commented on YARN-3619:
--

My apologies for the long delay, as this fell off my radar.  The approach seems 
reasonable.

The patch needs to be upmerged to trunk. In addition, I'm wondering about the 
Timer handling. I think the Timer should be a daemon thread (we don't want to 
prolong NM shutdown because of this). Also, it seems wasteful to dedicate a 
separate timer thread to every container that finishes. It would be more 
efficient to share one timer that handles multiple timer tasks rather than 
spawn a thread for every timer task.
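
For illustration, a minimal sketch of the shared daemon Timer suggested above 
(the class name and the unregistration hook are assumptions):

{code}
import java.util.Timer;
import java.util.TimerTask;

// Sketch only: one shared daemon Timer schedules all unregistration tasks,
// instead of spawning a dedicated Timer thread per finished container.
final class ContainerMetricsTimers {
  // daemon=true so a pending unregistration never delays NM shutdown
  private static final Timer UNREGISTER_TIMER =
      new Timer("ContainerMetrics unregistration", true);

  static void scheduleUnregister(final String containerId, long delayMs) {
    UNREGISTER_TIMER.schedule(new TimerTask() {
      @Override
      public void run() {
        // assumed hook: the real patch unregisters the metrics source here
        System.out.println("unregister metrics for " + containerId);
      }
    }, delayMs);
  }
}
{code}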


> ContainerMetrics unregisters during getMetrics and leads to 
> ConcurrentModificationException
> ---
>
> Key: YARN-3619
> URL: https://issues.apache.org/jira/browse/YARN-3619
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: zhihai xu
> Attachments: YARN-3619.000.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, 
> but that method can be called by MetricsSystemImpl.sampleMetrics which is 
> trying to iterate the sources.  This leads to a 
> ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
> impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937034#comment-14937034
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #472 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/472/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: 2.6.2-candidate
> Fix For: 2.7.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after a localization 
> failure, I think it would be better to check whether the directory exists 
> before using it for localization, so the failure can be avoided.
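
As a rough sketch of the proposed existence check (hypothetical class and 
method names; the actual patch, per the commit above, changes 
ResourceLocalizationService and LocalResourcesTrackerImpl):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalizationPathCheck {
  // Sketch: avoid handing out a destination directory that already exists,
  // so the later rename cannot fail with "Rename cannot overwrite non empty
  // destination directory".
  public static Path pickDestination(FileSystem lfs, Path candidate,
      Path alternative) throws IOException {
    if (lfs.exists(candidate)) {
      // Leftover directory from an earlier failed localization; pick a
      // different path instead of failing the whole download.
      return alternative;
    }
    return candidate;
  }
}
{code}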



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937044#comment-14937044
 ] 

Hadoop QA commented on YARN-3619:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733882/YARN-3619.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c17d31 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9311/console |


This message was automatically generated.

> ContainerMetrics unregisters during getMetrics and leads to 
> ConcurrentModificationException
> ---
>
> Key: YARN-3619
> URL: https://issues.apache.org/jira/browse/YARN-3619
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: zhihai xu
> Attachments: YARN-3619.000.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, 
> but that method can be called by MetricsSystemImpl.sampleMetrics while it is 
> iterating over the sources.  This leads to a 
> ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN 
> impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4212) FairScheduler: Parent queues with 'Fair' policy should compute shares of all resources for its children during a recompute

2015-09-30 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-4212:
--
Attachment: YARN-4212.1.patch

Attaching a trivial patch

> FairScheduler: Parent queues with 'Fair' policy should compute shares of all 
> resources for its children during a recompute
> --
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>  Labels: fairscheduler
> Attachments: YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it computes a weighted share for 
> memory only and sets the vcores fair share of its children to 0.
> In a situation with one parent queue with policy 'fair' and multiple leaf 
> queues with policy 'drf', any app submitted to the child queues with a vcore 
> requirement > 1 will always be above its fair share, since during the 
> recomputeShares process the child queues were all assigned 0 for fair-share 
> vcores.
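
A rough sketch of the intended recompute behavior (plain Java with 
hypothetical names; the real change presumably lives in the FairScheduler 
policy code):

{code}
import java.util.Arrays;

// Distribute every resource type (memory *and* vcores) to children in
// proportion to their weights, instead of computing memory only and
// leaving the vcore fair shares at 0.
public final class FairShareSketch {
  static long[] fairShare(long total, double[] weights) {
    double weightSum = 0;
    for (double w : weights) {
      weightSum += w;
    }
    long[] shares = new long[weights.length];
    for (int i = 0; i < weights.length; i++) {
      shares[i] = (long) (total * weights[i] / weightSum);
    }
    return shares;
  }

  public static void main(String[] args) {
    double[] weights = {1.0, 1.0};
    // Two equally weighted children: memory and vcores each split evenly.
    System.out.println(Arrays.toString(fairShare(16384, weights))); // [8192, 8192]
    System.out.println(Arrays.toString(fairShare(16, weights)));    // [8, 8]
  }
}
{code}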



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-3727:
-
Labels: 2.6.2-candidate  (was: )

Pinging [~sjlee0] to see if this is something desired for 2.6.2.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: 2.6.2-candidate
> Fix For: 2.7.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after a localization 
> failure, I think it would be better to check whether the directory exists 
> before using it for localization, so the failure can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4212) FairScheduler: Parent queues with 'Fair' policy should compute shares of all resources for its children during a recompute

2015-09-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937700#comment-14937700
 ] 

Hadoop QA commented on YARN-4212:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 38s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  61m 14s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  99m  6s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764429/YARN-4212.1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 854d25b |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9310/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9310/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9310/console |


This message was automatically generated.

> FairScheduler: Parent queues with 'Fair' policy should compute shares of all 
> resources for its children during a recompute
> --
>
> Key: YARN-4212
> URL: https://issues.apache.org/jira/browse/YARN-4212
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>  Labels: fairscheduler
> Attachments: YARN-4212.1.patch
>
>
> The Fair Scheduler, while performing a {{recomputeShares()}} during an 
> {{update()}} call, uses the parent queue's policy to distribute shares to its 
> children.
> If the parent queue's policy is 'fair', it computes a weighted share for 
> memory only and sets the vcores fair share of its children to 0.
> In a situation with one parent queue with policy 'fair' and multiple leaf 
> queues with policy 'drf', any app submitted to the child queues with a vcore 
> requirement > 1 will always be above its fair share, since during the 
> recomputeShares process the child queues were all assigned 0 for fair-share 
> vcores.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-09-30 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937707#comment-14937707
 ] 

Sunil G commented on YARN-3216:
---

Thank you [~leftnoteasy] for sharing the comments. Yea, :) it seems the 
smileys were given preference.

I will address them in the next patch. 
bq. getAMResourceLimitPerPartition should use partition.totalResource
Here we would like to get the resource used by the partition in a queue and 
then take the AM % of that resource value, correct?

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937717#comment-14937717
 ] 

Hudson commented on YARN-3727:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #464 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/464/])
YARN-3727. For better error recovery, check if the directory exists (jlowe: rev 
854d25b0c30fd40f640c052e79a8747741492042)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalResourcesTrackerImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalResourcesTrackerImpl.java


> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after a localization 
> failure, I think it would be better to check whether the directory exists 
> before using it for localization, so the failure can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3216) Max-AM-Resource-Percentage should respect node labels

2015-09-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937725#comment-14937725
 ] 

Wangda Tan commented on YARN-3216:
--

bq. Here we would like to get the resource used by the partition in a queue 
and then take the AM % of that resource value, correct?
Exactly. When we compute the am-resource-limit, the queue-resource should be 
partition.totalResource * queue.capacity-of-the-partition.
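
A worked example of that computation (all numbers invented for illustration): 
if partition X has 100 GB in total, the queue has 40% capacity on X, and 
max-am-resource-percent is 0.1, the AM limit on X would be 
100 GB * 0.4 * 0.1 = 4 GB. In sketch form, with hypothetical names:

{code}
public final class AmLimitSketch {
  // queue-resource = partition.totalResource * queue capacity on partition;
  // AM limit = queue-resource * max-am-resource-percent.
  static long amResourceLimitMb(long partitionTotalMb,
      double queueCapacityOnPartition, double maxAmResourcePercent) {
    return (long) (partitionTotalMb * queueCapacityOnPartition
        * maxAmResourcePercent);
  }

  public static void main(String[] args) {
    // 102400 MB * 0.4 * 0.1 = 4096 MB (4 GB) available for AMs.
    System.out.println(amResourceLimitMb(102400L, 0.4, 0.1));
  }
}
{code}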

> Max-AM-Resource-Percentage should respect node labels
> -
>
> Key: YARN-3216
> URL: https://issues.apache.org/jira/browse/YARN-3216
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-3216.patch, 0002-YARN-3216.patch
>
>
> Currently, max-am-resource-percentage considers default_partition only. When 
> a queue can access multiple partitions, we should be able to compute 
> max-am-resource-percentage based on that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937728#comment-14937728
 ] 

Sangjin Lee commented on YARN-3727:
---

Thanks Jason!

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.7.2, 2.6.2
>
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure happen due to existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
> files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after a localization 
> failure, I think it would be better to check whether the directory exists 
> before using it for localization, so the failure can be avoided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)