[jira] [Created] (YARN-4210) HBase reader throws NPE if GET returns no rows

2015-09-29 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-4210:
--

 Summary: HBase reader throws NPE if GET returns no rows
 Key: YARN-4210
 URL: https://issues.apache.org/jira/browse/YARN-4210
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: YARN-2928
Reporter: Varun Saxena
Assignee: Varun Saxena


{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4209) RMStateStore FENCED state doesn’t work due to updateFencedState called by stateMachine.doTransition

2015-09-29 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934695#comment-14934695
 ] 

zhihai xu commented on YARN-4209:
-

Thanks for the review [~rohithsharma]! Yes, that is a good point! Using 
MultipleArcTransition will be a better solution. I will implement a new patch 
using MultipleArcTransition.

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---
>
> Key: YARN-4209
> URL: https://issues.apache.org/jira/browse/YARN-4209
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
> Attachments: YARN-4209.000.patch
>
>
> RMStateStore FENCED state doesn’t work because {{updateFencedState}} is 
> called from within {{stateMachine.doTransition}}. The 
> {{stateMachine.doTransition}} call made by {{updateFencedState}} is nested 
> inside the {{stateMachine.doTransition}} call made from a public 
> API (removeRMDelegationToken...) or from {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED, the external state transition changes the state 
> back to ACTIVE. The end result is that RMStateStore is still in the ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only case in 
> which the FENCED state works is when {{notifyStoreOperationFailed}} is called 
> from {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => 
> {{handleStoreEvent}} => enter internal {{stateMachine.doTransition}} => exit 
> internal {{stateMachine.doTransition}}, changing state to FENCED => exit 
> external {{stateMachine.doTransition}}, changing state to ACTIVE.
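
The call sequence described above can be reproduced with a small standalone model. This is only a sketch under stated assumptions: {{NestedTransitionSketch}} and its states, events and flags are hypothetical and do not use the Hadoop {{StateMachineFactory}} or {{MultipleArcTransition}} classes. It assumes, as the description implies, that the outer transition stores its target state after its hook returns.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the problem; not the RMStateStore code. */
final class NestedTransitionSketch {
  enum State { ACTIVE, FENCED }
  enum Event { REMOVE_TOKEN, FENCE }

  private State current = State.ACTIVE;
  private final List<String> trace = new ArrayList<>();

  State getState() { return current; }
  List<String> getTrace() { return trace; }

  /** Runs the hook for the event, then stores the state the hook returned. */
  void doTransition(Event event, boolean storeFails, boolean multipleArc) {
    State next = (event == Event.FENCE)
        ? State.FENCED
        : removeToken(storeFails, multipleArc);
    current = next;
    trace.add(event + " -> " + next);
  }

  private State removeToken(boolean storeFails, boolean multipleArc) {
    if (!storeFails) {
      return State.ACTIVE;
    }
    if (multipleArc) {
      // Multiple-arc style: the hook itself chooses FENCED as the target.
      return State.FENCED;
    }
    // Single-arc style: fence through a nested transition. The outer
    // transition still finishes with its fixed target, ACTIVE.
    doTransition(Event.FENCE, false, false);
    return State.ACTIVE;
  }

  public static void main(String[] args) {
    NestedTransitionSketch nested = new NestedTransitionSketch();
    nested.doTransition(Event.REMOVE_TOKEN, true, false);
    System.out.println(nested.getTrace() + " final=" + nested.getState());

    NestedTransitionSketch fixed = new NestedTransitionSketch();
    fixed.doTransition(Event.REMOVE_TOKEN, true, true);
    System.out.println(fixed.getTrace() + " final=" + fixed.getState());
  }
}
```

In the nested case the model ends in ACTIVE even though the inner transition set FENCED. When the hook returns the target state itself, the model ends in FENCED.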





[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934848#comment-14934848
 ] 

Varun Saxena commented on YARN-4075:


[~gtCarrera9],
bq. A quick question: if I would like to query for a flow run with flow name 
flow_1437599248581_1 and flow run id 1, shall I call the rest api as: 
http://localhost:8188/ws/v2/timeline/flowrun/yarn_cluster/flow_1437599248581_1/1
You need to specify the user as a query param if the user in the request is 
not the same as the user of the record you are seeking; the param is otherwise 
optional. I think that is what is happening for you. The user is part of the 
row key in the flow run table. 
The query should be something like 
http://localhost:8188/ws/v2/timeline/flowrun/yarn_cluster/flow_1437599248581_1/1?userid=some_user
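
For illustration, a client could assemble such a URL as sketched below. The path layout and the {{userid}} param are taken from the URLs in this comment. The class and method names are hypothetical and not part of the timeline reader.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical client-side helper; not part of the timeline service code. */
final class FlowRunUrlSketch {
  /**
   * Builds a flow run query URL. The user is appended as the "userid" query
   * param only when one is given.
   */
  static URI flowRunUri(String host, int port, String cluster, String flow,
      long runId, String user) {
    String path = "/ws/v2/timeline/flowrun/" + cluster + "/" + flow + "/" + runId;
    String query = (user == null) ? null : "userid=" + user;
    try {
      return new URI("http", null, host, port, path, query, null);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot build flow run URL", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(flowRunUri("localhost", 8188, "yarn_cluster",
        "flow_1437599248581_1", 1L, "some_user"));
    System.out.println(flowRunUri("localhost", 8188, "yarn_cluster",
        "flow_1437599248581_1", 1L, null));
  }
}
```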

The NPE occurs because the HBase reader reads the result even though the row 
key does not match.
We should have added a UT case for a row that does not match. All of us missed it.
Will file a separate JIRA for this bug.
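
A minimal sketch of the kind of guard this needs, not the actual patch. {{RowResult}} is a hypothetical stand-in for the HBase {{Result}} class (it mirrors {{Result#isEmpty}}), and the metric name is made up for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of an empty-result guard; not the reader code. */
final class EmptyResultGuardSketch {
  /** Stand-in for an HBase Result; metrics() is null when no row matched. */
  interface RowResult {
    boolean isEmpty();
    Map<String, Long> metrics();
  }

  static RowResult rowOf(final Map<String, Long> metrics) {
    return new RowResult() {
      @Override public boolean isEmpty() {
        return metrics == null || metrics.isEmpty();
      }
      @Override public Map<String, Long> metrics() {
        return metrics;
      }
    };
  }

  /** Reads the metric count only when a row was actually returned. */
  static Optional<Integer> countMetrics(RowResult result) {
    if (result == null || result.isEmpty()) {
      // No row matched the key: skip parsing instead of dereferencing null.
      return Optional.empty();
    }
    return Optional.of(result.metrics().size());
  }

  public static void main(String[] args) {
    System.out.println(countMetrics(rowOf(null)));
    System.out.println(countMetrics(
        rowOf(Collections.singletonMap("hypothetical_metric", 40L))));
  }
}
```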


> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935165#comment-14935165
 ] 

Varun Saxena commented on YARN-4210:


Moreover, there are a couple of other issues I noticed while going through the 
code.
# I cannot see ResultScanner#close being called in the xxxEntityReader classes. 
Depending on the number of queries running in parallel, this may become an 
issue on the server side. The server will expire the scanner after a configured 
lease interval if close is not called explicitly.
# Also, is FlowActivityEntityReader#readEntities required? 
TimelineEntityReader#readEntities should suffice. In 
TimelineEntityReader#readEntities, though, we can check when we reach the limit 
instead of going over the limit and then calling pollLast. In short, follow the 
same approach as FlowActivityEntityReader.

cc [~sjlee0], [~vrushalic], [~jrottinghuis].
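
Both points can be sketched together. This is an illustration under assumptions, not the reader code: {{RowScanner}} is a hypothetical stand-in for the HBase {{ResultScanner}} (which is likewise {{Closeable}} and {{Iterable}}), and rows are plain strings here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: close the scanner and stop once the limit is hit. */
final class ScannerLimitSketch {
  /** Stand-in for a scanner that holds server-side resources until closed. */
  static final class RowScanner implements Iterable<String>, AutoCloseable {
    private final List<String> rows;
    private boolean closed;

    RowScanner(List<String> rows) { this.rows = rows; }
    @Override public Iterator<String> iterator() { return rows.iterator(); }
    @Override public void close() { closed = true; }
    boolean isClosed() { return closed; }
  }

  static List<String> readEntities(RowScanner scanner, int limit) {
    List<String> entities = new ArrayList<>();
    // try-with-resources closes the scanner on normal exit and on exceptions.
    try (RowScanner s = scanner) {
      for (String row : s) {
        if (entities.size() >= limit) {
          break; // stop at the limit instead of trimming afterwards
        }
        entities.add(row);
      }
    }
    return entities;
  }

  public static void main(String[] args) {
    RowScanner scanner = new RowScanner(Arrays.asList("row1", "row2", "row3"));
    System.out.println(readEntities(scanner, 2) + " closed=" + scanner.isClosed());
  }
}
```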

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Updated] (YARN-4210) HBase reader throws NPE if GET returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Description: 
If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}

  was:
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}


> HBase reader throws NPE if GET returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty 

[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934859#comment-14934859
 ] 

Varun Saxena commented on YARN-4075:


Filed YARN-4210 for the same.

> [reader REST API] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4075
> URL: https://issues.apache.org/jira/browse/YARN-4075
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4075-YARN-2928.03.patch, 
> YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, 
> YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch
>
>
> We need to be able to query for flows and flow runs via REST.





[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Description: 
If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty.
Found during web UI poc testing. 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}

  was:
If HBase Get does not fetch any rows for the query, we still try to parse the 
result and read fields. This leads to NPE while reading metrics. We should not 
attempt to read anything if no row is returned i.e. result is empty 
{noformat}
2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
reader.TimelineReaderWebServices 
(TimelineReaderWebServices.java:handleException(199)) - Error while processing 
REST request
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
at 
org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)

{noformat}


> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 

[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Attachment: YARN-4210-YARN-2928.01.patch

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934981#comment-14934981
 ] 

Hadoop QA commented on YARN-4210:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 15s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  3s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 12s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 42s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 17s | Tests passed in hadoop-yarn-server-timelineservice. |
| | |  41m 45s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764215/YARN-4210-YARN-2928.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9292/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9292/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9292/console |


This message was automatically generated.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states

2015-09-29 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934867#comment-14934867
 ] 

Sunil G commented on YARN-4141:
---

Thank you [~jlowe] for the review and commit and thank you [~rohithsharma] for 
the review.

> Runtime Application Priority change should not throw exception for 
> applications at finishing states
> ---
>
> Key: YARN-4141
> URL: https://issues.apache.org/jira/browse/YARN-4141
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch, 
> 0003-YARN-4141.patch, 0004-YARN-4141.patch, 0005-YARN-4141.patch, 
> 0006-YARN-4141.patch, 0007-YARN-4141.patch
>
>
> As suggested by [~jlowe] in 
> [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035]
> , it would be good if YARN could suppress exceptions during change application 
> priority calls for applications in their finishing stages.
> Currently it is difficult for clients to handle this. This would be 
> similar to the kill application behavior.





[jira] [Created] (YARN-4211) Many InvalidStateTransitionException on NM recovering the containers

2015-09-29 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4211:
---

 Summary: Many InvalidStateTransitionException on NM recovering the 
containers
 Key: YARN-4211
 URL: https://issues.apache.org/jira/browse/YARN-4211
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
Reporter: Rohith Sharma K S


Many InvalidStateTransitionExceptions are observed when the NM recovers 
containers.
The scenario: the NM is restarted while containers are running.
{noformat}
2015-09-29 16:40:10,606 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
INIT_CONTAINER at FINISHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1294)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1286)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
2015-09-29 16:53:24,774 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_1443523004643_0001_02_01 transitioned from NEW to DONE
2015-09-29 16:53:24,774 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
APPLICATION_CONTAINER_FINISHED at FINISHED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1294)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1286)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
at java.lang.Thread.run(Thread.java:745)
{noformat}





[jira] [Updated] (YARN-4211) Many InvalidStateTransitionException on NM recovering the containers

2015-09-29 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4211:

Attachment: nodemanager.log

> Many InvalidStateTransitionException on NM recovering the containers
> 
>
> Key: YARN-4211
> URL: https://issues.apache.org/jira/browse/YARN-4211
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Rohith Sharma K S
> Attachments: nodemanager.log
>
>
> Many InvalidStateTransitionExceptions are observed when the NM recovers 
> containers.
> The scenario: the NM is restarted while containers are running.
> {noformat}
> 2015-09-29 16:40:10,606 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> INIT_CONTAINER at FINISHED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1294)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1286)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-29 16:53:24,774 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1443523004643_0001_02_01 transitioned from NEW to 
> DONE
> 2015-09-29 16:53:24,774 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APPLICATION_CONTAINER_FINISHED at FINISHED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1294)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1286)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
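
For readers unfamiliar with the pattern behind these warnings: a table-driven state machine rejects any event that has no transition registered for the current state. The following is a minimal sketch of that idea only. TinyStateMachine, its two states and its two events are hypothetical names chosen for illustration; this is not the StateMachineFactory or ApplicationImpl code, and it throws a plain IllegalStateException rather than InvalidStateTransitionException.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified state machine; not the YARN implementation.
final class TinyStateMachine {
  enum State { RUNNING, FINISHED }
  enum Event { INIT_CONTAINER, FINISH }

  // Transition table: current state -> (event -> next state).
  private static final Map<State, Map<Event, State>> TRANSITIONS =
      new EnumMap<>(State.class);

  static {
    Map<Event, State> running = new EnumMap<>(Event.class);
    running.put(Event.INIT_CONTAINER, State.RUNNING);
    running.put(Event.FINISH, State.FINISHED);
    TRANSITIONS.put(State.RUNNING, running);
    // FINISHED registers no transitions, so every event is invalid there.
    TRANSITIONS.put(State.FINISHED, new EnumMap<>(Event.class));
  }

  private State current = State.RUNNING;

  // Applies the event, or fails if the current state has no transition for it.
  State handle(Event event) {
    State next = TRANSITIONS.get(current).get(event);
    if (next == null) {
      throw new IllegalStateException(
          "Invalid event: " + event + " at " + current);
    }
    current = next;
    return current;
  }
}
```

In this sketch, sending INIT_CONTAINER after FINISH fails with "Invalid event: INIT_CONTAINER at FINISHED", which mirrors the shape of the log message above. Whether the real fix is to register such transitions or to stop sending the events during recovery is not something this sketch decides.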





[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Summary: HBase reader throws NPE if Get returns no rows  (was: HBase reader 
throws NPE if GET returns no rows)

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e. the result is empty.
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
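
The guard described above can be sketched as follows. This is only an illustration under stated assumptions: FakeResult is a hypothetical stand-in for a storage lookup result (it is not the HBase Result class), and EmptyResultGuard.readMetrics is a made-up helper, not the TimelineEntityReader method from the stack trace. The point is simply to return early when no row came back instead of dereferencing the missing data.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a lookup result; null cells model "no row returned".
final class FakeResult {
  private final NavigableMap<String, Long> cells;

  FakeResult(NavigableMap<String, Long> cells) {
    this.cells = cells;
  }

  boolean isEmpty() {
    return cells == null || cells.isEmpty();
  }

  NavigableMap<String, Long> getCells() {
    return cells;
  }
}

// Hypothetical reader helper showing the early return on an empty result.
final class EmptyResultGuard {
  // Returns a copy of the metrics, or an empty map when no row came back.
  static Map<String, Long> readMetrics(FakeResult result) {
    if (result == null || result.isEmpty()) {
      return Collections.emptyMap();
    }
    return new TreeMap<>(result.getCells());
  }
}
```

Whether the caller should then return an empty entity or report "not found" to the REST client is a separate decision that this sketch does not make.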





[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934954#comment-14934954
 ] 

Varun Saxena commented on YARN-4178:


bq. As for the "application" prefix, is there a discussion in the YARN 
community in general of making this prefix customizable? I'm just not familiar 
with the current state of that discussion, although I recall hearing about it 
in the past. That would help us determine whether this is significant enough 
for us to care about it.
I haven't come across any such active discussion recently. [~vinodkv] might 
know whether there is any such plan for the future. 
Even if a prefix is added in the future, we would most probably encapsulate it 
inside the ApplicationId class itself, i.e. the prefix would be an additional 
field instead of a static string (application_). So while checking for 
references to this class, the person changing or reviewing the code would come 
across this code as well and should take the necessary action, i.e. storing the 
prefix in HBase, IMO. 
Thoughts?

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.
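
To illustrate the ordering problem described above, here is a minimal sketch. The ids used with it ("app_1234567890_9" and "app_1234567890_10") are hypothetical values picked for the example, and AppIdRowKey with its two methods is a made-up helper, not the encoding used in the attached patch. It assumes ids of the form prefix_clusterTimestamp_sequence with non-negative numbers, and shows that fixed-width big-endian bytes sort in numeric order where the string form does not.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not the row key encoding from the attached patch.
final class AppIdRowKey {
  // Encodes "<prefix>_<clusterTimestamp>_<sequence>" as two big-endian longs.
  static byte[] encode(String appId) {
    String[] parts = appId.split("_");
    long clusterTs = Long.parseLong(parts[parts.length - 2]);
    long sequence = Long.parseLong(parts[parts.length - 1]);
    // ByteBuffer writes big-endian by default, so byte order follows numeric
    // order for non-negative values.
    return ByteBuffer.allocate(2 * Long.BYTES)
        .putLong(clusterTs)
        .putLong(sequence)
        .array();
  }

  // Unsigned lexicographic comparison, the order a byte-sorted store would use.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }
}
```

With the hypothetical ids, "app_1234567890_9".compareTo("app_1234567890_10") is positive (the string form sorts 10 before 9), while compareBytes(encode("app_1234567890_9"), encode("app_1234567890_10")) is negative, which is the intended order.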





[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934934#comment-14934934
 ] 

Varun Saxena commented on YARN-4178:


Ok...Will change it.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.





[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935314#comment-14935314
 ] 

Hadoop QA commented on YARN-3727:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 17s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:red}-1{color} | javac |   3m 17s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735768/YARN-3727.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 715dbdd |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9294/console |


This message was automatically generated.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.
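
The check proposed above could look roughly like this. It is a minimal sketch under stated assumptions: UniqueLocalDir.nextUnusedDir is a hypothetical helper, it uses plain java.nio.file rather than the NodeManager's own file APIs, and it assumes cache directories are named by an increasing numeric id (as the 21637 in the log suggests).

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not the NodeManager localization code.
final class UniqueLocalDir {
  // Returns the first directory under base, named by an increasing id
  // starting at startId, that does not exist yet.
  static Path nextUnusedDir(Path base, long startId) {
    long id = startId;
    Path candidate = base.resolve(Long.toString(id));
    while (Files.exists(candidate)) {
      // A leftover directory is in the way: skip it instead of failing later
      // when the rename cannot overwrite a non-empty destination.
      id++;
      candidate = base.resolve(Long.toString(id));
    }
    return candidate;
  }
}
```

Note that checking and then using the path is not atomic. If another thread or process can create the same directory in between, the rename can still fail, so the existing error handling remains necessary.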





[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread Chang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935272#comment-14935272
 ] 

Chang Li commented on YARN-3727:


+1 (non binding)

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.





[jira] [Updated] (YARN-4186) Make WebAppUtils a public API for yarn

2015-09-29 Thread Kuhu Shukla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated YARN-4186:
--
Attachment: YARN-4186-v1.patch

Made the private class public. Removed an unused method and made other small changes.

> Make WebAppUtils a public API for yarn
> --
>
> Key: YARN-4186
> URL: https://issues.apache.org/jira/browse/YARN-4186
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: YARN-4186-v1.patch
>
>
> Application types like Tez will want to expose AM container log location to 
> users without having to make a call to the RM.
> Exposing getRunningLogURL as public will provide this functionality.
> Jon





[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935279#comment-14935279
 ] 

Karthik Kambatla commented on YARN-4066:


+1, checking this in. 

> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935313#comment-14935313
 ] 

Varun Saxena commented on YARN-4210:


Point 2 is invalid.
I was under the impression that we would use PageFilter everywhere. 
But I just checked, and we will be fetching all entities in GenericEntityReader. 

That makes sense too, as the entity id can be anything for the generic entity 
table and the lexicographical ordering of row keys may not always match created time.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e. the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935347#comment-14935347
 ] 

Hudson commented on YARN-4066:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #459 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/459/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935341#comment-14935341
 ] 

Hudson commented on YARN-4066:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8540 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8540/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 





[jira] [Updated] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4203:
---
Attachment: YARN-4203-YARN-2928.02.patch

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this jira to 
> add request and response logging and timing for each REST call that comes 
> in. 





[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935330#comment-14935330
 ] 

Jason Lowe commented on YARN-3727:
--

I think the approach is a reasonable workaround to catch cases where we leak a 
local resource and want to avoid failing a new localization.

Could you update the patch to trunk?

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.





[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Attachment: YARN-4210-YARN-2928.02.patch

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e. the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935365#comment-14935365
 ] 

Varun Saxena commented on YARN-4210:


Found that we do not support getting multiple apps from the application table. 
Won't we need to support a query such as listing all apps for a flow run?

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e. the result is empty.
> Found during web UI PoC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4186) Make WebAppUtils a public API for yarn

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935381#comment-14935381
 ] 

Hadoop QA commented on YARN-4186:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 15s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 57s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 23s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 55s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m  0s | Tests passed in 
hadoop-yarn-common. |
| | |  42m 40s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764256/YARN-4186-v1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a0b5a0a |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9296/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9296/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9296/console |


This message was automatically generated.

> Make WebAppUtils a public API for yarn
> --
>
> Key: YARN-4186
> URL: https://issues.apache.org/jira/browse/YARN-4186
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
> Attachments: YARN-4186-v1.patch
>
>
> Application types like Tez will want to expose AM container log location to 
> users without having to make a call to the RM.
> Exposing getRunningLogURL as public will provide this functionality.
> Jon





[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-09-29 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935452#comment-14935452
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
---

Writing up a patch for this.
Questions I had:
1) This would be included for the NMProxy with a new RetryPolicy setting:
exponentialBackoffRetry(5, 1000, TimeUnit.MILLISECONDS)
What would be the value of maxRetries?

I see the value set to 5 for NameNodeProxies. Is that an arbitrarily set value 
or does it need to be taken from the conf?
Thank you.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec, which can be 
> unnecessarily high, especially when NMs can finish a rolling restart within 
> a sec.
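
A minimal sketch of the capped exponential backoff proposed in the description above. It assumes a 1-second initial delay (an assumption, not stated in the issue) that doubles up to the 10-second limit the description mentions. `BackoffSketch` and `delayMillis` are hypothetical names; this is not the NMProxy code and it does not use Hadoop's retry classes.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: capped exponential backoff, not the actual NMProxy retry policy. */
final class BackoffSketch {

  /**
   * Delay before retry number {@code attempt} (0-based): baseMillis * 2^attempt,
   * never exceeding maxMillis.
   */
  static long delayMillis(int attempt, long baseMillis, long maxMillis) {
    if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
      throw new IllegalArgumentException("invalid backoff parameters");
    }
    long delay = baseMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      // Double the delay, clamping to the cap without risking overflow.
      delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
    }
    return delay;
  }

  public static void main(String[] args) {
    long base = TimeUnit.SECONDS.toMillis(1);  // assumed initial delay
    long max = TimeUnit.SECONDS.toMillis(10);  // the fixed limit from the description
    for (int attempt = 0; attempt < 6; attempt++) {
      System.out.println("retry " + attempt + " waits "
          + delayMillis(attempt, base, max) + " ms");
    }
  }
}
```

With these assumed values the first retries wait 1, 2, 4 and 8 seconds before settling at the 10-second cap, so a NodeManager that restarts quickly is picked up early while the worst-case interval stays unchanged.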





[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935834#comment-14935834
 ] 

Hudson commented on YARN-4066:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2375 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2375/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization to 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936045#comment-14936045
 ] 

Li Lu commented on YARN-4210:
-

bq. Can you print what's coming as user in the request i.e. in 
TimelineReaderWebServices ? I suspect this might be a user different from what 
is present in the table.

Verified, and this part LGTM. It would be helpful to report the user in the 
not-found exception, though.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
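
The guard suggested in the description above (do not parse anything when no row came back) can be sketched as follows. This is a self-contained illustration under stated assumptions: `FakeResult` is a hypothetical stand-in for the storage result, and `readMetrics` is not the actual `TimelineEntityReader` method.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative stand-in for a storage lookup; not the HBase client API. */
final class EmptyResultGuardSketch {

  /** Hypothetical row holder: cells is null when the lookup matched no row. */
  static final class FakeResult {
    private final Map<String, Long> cells;

    FakeResult(Map<String, Long> cells) {
      this.cells = cells;
    }

    boolean isEmpty() {
      return cells == null || cells.isEmpty();
    }

    Map<String, Long> cells() {
      return cells;
    }
  }

  /** Returns the metrics of the row, or an empty map when no row came back. */
  static Map<String, Long> readMetrics(FakeResult result) {
    if (result == null || result.isEmpty()) {
      return Collections.emptyMap();  // nothing to parse, so skip all field reads
    }
    return new TreeMap<>(result.cells());
  }

  public static void main(String[] args) {
    System.out.println(readMetrics(new FakeResult(null)));
    Map<String, Long> row = new TreeMap<>();
    row.put("MAP_SLOT_MILLIS", 42L);  // made-up metric name and value
    System.out.println(readMetrics(new FakeResult(row)));
  }
}
```

Checking for emptiness once, before any column is read, keeps the null handling in a single place instead of in every field accessor.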





[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Attachment: YARN-3727.001.patch

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, which happened due to existing 
> cache directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.
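
A minimal sketch of the "check before use" idea from the description above, written against `java.nio.file` rather than the Hadoop `FileContext` API. `FreshDirSketch` and `firstUnusedDir` are hypothetical names, and the numeric directory naming is an assumption taken from the `filecache/21637` path in the log; this is not the NodeManager implementation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative only: pick a cache directory that does not exist yet. */
final class FreshDirSketch {

  /**
   * Returns the first child of {@code parent} named by a number >= startId
   * that does not exist on disk. Hypothetical helper, not NodeManager code.
   */
  static Path firstUnusedDir(Path parent, long startId) {
    long id = startId;
    Path candidate = parent.resolve(Long.toString(id));
    while (Files.exists(candidate)) {
      id++;  // skip directories left over from earlier runs
      candidate = parent.resolve(Long.toString(id));
    }
    return candidate;
  }

  public static void main(String[] args) throws IOException {
    Path parent = Files.createTempDirectory("filecache");
    Files.createDirectory(parent.resolve("21637"));  // simulate a stale cache directory
    System.out.println(firstUnusedDir(parent, 21637).getFileName());
  }
}
```

An existence check like this is not atomic: another process could create the directory between the check and the rename, so the failure path after the rename would still be needed.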





[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Attachment: (was: YARN-3727.001.patch)

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, which happened due to existing 
> cache directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.





[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-09-29 Thread Steve Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936138#comment-14936138
 ] 

Steve Armstrong commented on YARN-3840:
---

In the meantime, if you can't run a patched Hadoop, the following is a 
bookmarklet that will fix the UI:

{code}
javascript:(function(){var myRegex = 
/(<.*>application_[0-9]+_)([0-9]+)(<..>)/;var tmp;for (var i = 0; i < 
appsTableData.length; i++) {  tmp = myRegex.exec(appsTableData[i][0]);  
appsTableData[i][0] = tmp[1] + ("0000000" + parseInt(tmp[2], 10)).substr(-7,7) + 
tmp[3];}appsDataTable.fnClearTable();appsDataTable.fnAddData(appsTableData);})();
{code}
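
The bookmarklet above works by left-padding the sequence part of each application id to a fixed width so that plain string sorting agrees with numeric order. The same idea as a small Java sketch, assuming a 7-digit width as in the bookmarklet; `PaddedIdSketch` and the sample prefix are made up for illustration.

```java
/** Illustrative only: zero-pad the sequence so string sorting matches numeric order. */
final class PaddedIdSketch {

  /** Appends the sequence to the prefix, left-padded with zeros to 7 digits. */
  static String pad(String prefix, int sequence) {
    if (sequence < 0) {
      throw new IllegalArgumentException("sequence must be non-negative");
    }
    return prefix + String.format("%07d", sequence);
  }

  public static void main(String[] args) {
    String prefix = "application_1443500000000_";  // made-up cluster timestamp
    String a = pad(prefix, 9999);
    String b = pad(prefix, 10000);
    System.out.println(a);
    System.out.println(b);
    System.out.println(a.compareTo(b) < 0);  // padded ids compare in numeric order
  }
}
```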

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch
>
>
> On the web UI, the global main view page 
> http://resourcemanager:8088/cluster/apps doesn't display applications over 
> 9999.
> With command line it works (# yarn application -list).
> Regards,
> Alexandre





[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935930#comment-14935930
 ] 

Hudson commented on YARN-4066:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #435 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/435/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization to 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935976#comment-14935976
 ] 

Li Lu commented on YARN-4210:
-

Applied the patch and I can see the NPE problem is gone. Digging a little bit 
more, I think two quick changes would make this a more comprehensive fix for 
the flow run query:
- I'm not sure why I always need to add the {{?userid=}} part to query for the 
flows launched by myself. Are we missing anything there in parseUser? Why can't 
we directly use callerUGI if there's no userid parameter?
- From my end-to-end test I can see there were exceptions converting Integer to 
Long in FlowRunEntityReader#parseEntity. Specifically, I changed the code to 
first convert startTime and endTime to Number and then get their long values 
to get rid of the problem. Not sure why we're getting integers here. 

cc/[~vrushalic][~sjlee0]. 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935814#comment-14935814
 ] 

Wangda Tan commented on YARN-3964:
--

bq. What's your thought about the impact of the response time of 
nodeLabelsMappingProvider.getNodeLabels?
I think it should be fine if we solve the locking issue.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.





[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935922#comment-14935922
 ] 

Hadoop QA commented on YARN-4178:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 48s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 53s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 59s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 51s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 49s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  40m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764315/YARN-4178-YARN-2928.02.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9300/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9300/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9300/console |


This message was automatically generated.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_10000" will be considered *earlier* than 
> "app_1234567890_9999". We should correct this.
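
The ordering problem described above can be reproduced with plain string comparison. The sketch below uses two illustrative ids chosen for this example and sorts them by the numeric suffix instead. `AppIdOrderSketch` is a hypothetical name, and the comparator assumes all ids share the same prefix; it is not the row-key encoding used by the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: string order versus numeric order of the id suffix. */
final class AppIdOrderSketch {

  /** Parses the numeric sequence after the last underscore. */
  static long sequenceOf(String appId) {
    return Long.parseLong(appId.substring(appId.lastIndexOf('_') + 1));
  }

  /** Returns a copy sorted by the numeric suffix instead of by character order. */
  static List<String> sortBySequence(List<String> appIds) {
    List<String> sorted = new ArrayList<>(appIds);
    sorted.sort(Comparator.comparingLong(AppIdOrderSketch::sequenceOf));
    return sorted;
  }

  public static void main(String[] args) {
    String older = "app_1234567890_9999";   // illustrative id
    String newer = "app_1234567890_10000";  // illustrative id
    // As strings, '1' sorts before '9', so the newer id compares as smaller.
    System.out.println(newer.compareTo(older) < 0);
    System.out.println(sortBySequence(Arrays.asList(newer, older)));
  }
}
```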





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936026#comment-14936026
 ] 

Varun Saxena commented on YARN-4210:


[~gtCarrera9],
bq. From my end to end test I can see there were exceptions converting Integer 
to Long in FlowRunEntityReader#parseEntity. Specifically, I changed the code to 
firstly converting startTime and endTime to Number then get their long values 
to get rid of the problem. Not sure why we're getting integers here.
This has to do with GenericObjectMapper, which we use in readResults. It will 
automatically interpret a numeric value as an Integer if it's less than the max 
value of int, i.e. 2,147,483,647.
The current seconds since epoch is a value lower than the max value of int. 
Hence the exception.
As these fields are taken as info, we can simply cast them to Number 
and take the long value to resolve the problem, just as you did.
Will fix it as part of this JIRA.

bq. I'm not sure why I always need to add the ?userid= part to query for the 
flows launched by myself? 
You do not need to if the user in the request is the same as the one in the 
records present in the flow run table.
Can you print what's coming in as the user in the request, i.e. in 
TimelineReaderWebServices? I suspect this might be a user different from what 
is present in the table.
I had tried querying without userid, and the user is taken from the request's 
caller UGI. Hope I am not missing something here. 

bq. Also, w.r.t application queries, let's prioritize it since with the current 
set of REST APIs, we cannot query anything related to applications unless we 
know its id. We need some ways to list all apps in a flow run to finish a full 
story. Hopefully this can happen in parallel with this JIRA to speed the whole 
thing up.
Yes, already started this. You can watch YARN-3864 for updates. Need to update 
the JIRA title though.
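
The Integer-versus-Long issue discussed in this comment can be sketched in plain Java. The example assumes only what the comment states, namely that a small numeric value may arrive boxed as an Integer; `NumberCastSketch` and the sample timestamps are made up for illustration.

```java
/** Illustrative only: reading a numeric info value that may arrive as Integer or Long. */
final class NumberCastSketch {

  /** Widens any boxed numeric value to long; a direct (Long) cast would fail on Integer. */
  static long toLong(Object value) {
    return ((Number) value).longValue();
  }

  public static void main(String[] args) {
    Object small = Integer.valueOf(1443558152);   // fits in an int, so it may come back as Integer
    Object large = Long.valueOf(1443558152027L);  // too large for an int, so it stays a Long
    System.out.println(toLong(small));
    System.out.println(toLong(large));
    try {
      long broken = (Long) small;                 // Integer cannot be cast to Long
      System.out.println(broken);
    } catch (ClassCastException e) {
      System.out.println("direct cast failed with ClassCastException");
    }
  }
}
```

Going through `Number` works for either boxed type, so the reader does not depend on how the value was deserialized.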

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Updated] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4210:
---
Attachment: YARN-4210-YARN-2928.03.patch

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935870#comment-14935870
 ] 

Sangjin Lee commented on YARN-4210:
---

No worries. We can modify the description of YARN-3864 and do it there. Let me 
know if you need my help.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936078#comment-14936078
 ] 

Xuan Gong commented on YARN-1897:
-

[~mingma] Could you check whether the testcase failures are related?

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first, as 
> they are needed by other sub-tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way for the application to specify a 
> "reason" for diagnosis. SignalContainerResponse might be empty.
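
As a rough illustration of the request shape described above, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the class, enum, and field names are hypothetical and are not taken from any of the attached patches, and the Unix signal mapping is just one plausible translation of OS-independent commands.

```java
import java.util.Objects;

// Hypothetical sketch; names are illustrative and not taken from the YARN-1897 patches.
final class SignalRequestSketch {

  // OS-independent commands; the platform-specific layer decides how to deliver them.
  enum Command { THREAD_DUMP, GRACEFUL_SHUTDOWN, FORCEFUL_SHUTDOWN }

  // Immutable request carrying the target container, the command, and a free-form reason.
  static final class Request {
    final String containerId;
    final Command command;
    final String reason;

    Request(String containerId, Command command, String reason) {
      this.containerId = Objects.requireNonNull(containerId, "containerId");
      this.command = Objects.requireNonNull(command, "command");
      this.reason = Objects.requireNonNull(reason, "reason");
    }
  }

  // One possible translation on a Unix-like node (assumed mapping).
  // SIGQUIT makes a HotSpot JVM print a thread dump.
  static String toUnixSignal(Command command) {
    switch (command) {
      case THREAD_DUMP:
        return "SIGQUIT";
      case GRACEFUL_SHUTDOWN:
        return "SIGTERM";
      case FORCEFUL_SHUTDOWN:
        return "SIGKILL";
      default:
        throw new IllegalArgumentException("Unknown command: " + command);
    }
  }
}
```

Keeping the command as an enum means the caller never names a signal directly, so a non-Unix node can map the same commands to whatever mechanism it supports.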





[jira] [Updated] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4178:
---
Attachment: YARN-4178-YARN-2928.02.patch

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch, 
> YARN-4178-YARN-2928.02.patch
>
>
> The app id is currently used in various places as part of row keys, but it 
> is treated as a string. This causes a problem with ordering when the id 
> portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.
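
The ordering problem can be reproduced with a small Java sketch. The ids below are made up for illustration, the helper names are hypothetical, and the numeric comparison is only one way to get the intended order; it is not necessarily the row key encoding used in the attached patches.

```java
import java.util.Arrays;

// Hypothetical illustration; ids and helper names are not from the YARN-4178 patches.
final class AppIdOrderingSketch {

  // Parses the cluster timestamp from an id shaped like "app_<clusterTimestamp>_<sequence>".
  static long clusterTimestampOf(String appId) {
    int first = appId.indexOf('_');
    int last = appId.lastIndexOf('_');
    return Long.parseLong(appId.substring(first + 1, last));
  }

  // Parses the trailing sequence number from the same id shape.
  static long sequenceOf(String appId) {
    return Long.parseLong(appId.substring(appId.lastIndexOf('_') + 1));
  }

  // Orders by cluster timestamp, then by sequence number, both compared as numbers.
  static int compareNumerically(String a, String b) {
    int byTimestamp = Long.compare(clusterTimestampOf(a), clusterTimestampOf(b));
    return byTimestamp != 0 ? byTimestamp : Long.compare(sequenceOf(a), sequenceOf(b));
  }

  public static void main(String[] args) {
    String[] ids = {"app_1234567890_10", "app_1234567890_9"};

    String[] asStrings = ids.clone();
    Arrays.sort(asStrings); // lexicographic: '1' sorts before '9'

    String[] asNumbers = ids.clone();
    Arrays.sort(asNumbers, AppIdOrderingSketch::compareNumerically);

    System.out.println("string order:  " + Arrays.toString(asStrings));
    System.out.println("numeric order: " + Arrays.toString(asNumbers));
  }
}
```

With string comparison the id ending in 10 sorts before the one ending in 9, because only the first differing character is compared; comparing the parsed numbers avoids that.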





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935989#comment-14935989
 ] 

Li Lu commented on YARN-4210:
-

Also, w.r.t. application queries, let's prioritize them: with the current set 
of REST APIs, we cannot query anything related to an application unless we know 
its id. We need some way to list all apps in a flow run to complete the 
story. Hopefully this can happen in parallel with this JIRA to speed the whole 
thing up. 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
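
The guard proposed in the description can be sketched in Java as follows. This is only an illustration under stated assumptions: a plain Map stands in for the fetched row so the example runs without HBase, and the class and method names are hypothetical. With the real HBase client, the equivalent check would presumably be made on the Result object (for example via Result#isEmpty()) before any column is read.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical stand-in; these are not the timeline service's reader classes.
final class EmptyResultGuardSketch {

  // A fetched row is modelled as column name -> value; null or empty means "no row".
  // Returns Optional.empty() instead of attempting to parse a missing row.
  static Optional<Map<String, Long>> parseMetrics(Map<String, Long> row) {
    if (row == null || row.isEmpty()) {
      return Optional.empty();
    }
    // Only reached when there is something to read.
    return Optional.of(new TreeMap<>(row));
  }
}
```

A caller can then turn the empty case into a "not found" response rather than letting a NullPointerException surface from the parsing code.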





[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936077#comment-14936077
 ] 

Xuan Gong commented on YARN-1897:
-

bq. The patch already prints all signals supported if you don't specify any 
parameter. Do you want an explicit option called "-all"?

Okay, I think that is good enough for now.

bq. That also brings up the issue where RM and NM continue to use 
NodeHeartbeatResponse's ContainersToCleanup to kill containers due to 
preemption. Should we migrate it to ContainersToSignalList? But that could be a 
separate jira.

Yes, this is the issue. Let us do it separately. Let us focus on the public API 
here.

bq. Regarding the diagnosis, do you want to allow the end user to specify the 
reason from CLI/YarnClient? If it is generated only by YARN components, we can 
also use enum similar to CMgrCompletedContainersEvent's reason.

For example, when we check the status of all containers in the RM web UI/ATS 
UI, it is better to show more detail, such as "killed by RM because of 
preemption" or "killed by the user for testing", instead of simply showing 
"KILL BY RESOURCEMANAGER". This would probably help users better understand 
their applications. But for now, I think it is fine to skip this. We can do it 
separately if needed.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as 
> they are needed by other sub tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way to application to specify "reason" 
> for diagnosis. SignalContainerResponse might be empty.





[jira] [Commented] (YARN-1897) CLI and core support for signal container functionality

2015-09-29 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936110#comment-14936110
 ] 

Xuan Gong commented on YARN-1897:
-

[~mingma] Also, could you please rebase the patch?

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as 
> they are needed by other sub tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way to application to specify "reason" 
> for diagnosis. SignalContainerResponse might be empty.





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936113#comment-14936113
 ] 

Hadoop QA commented on YARN-4210:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 23s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 20s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  0s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 30s | The applied patch generated  1 
new checkstyle issues (total was 33, now 34). |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 52s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   2m 54s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  45m 44s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764335/YARN-4210-YARN-2928.03.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/9301/artifact/patchprocess/diffcheckstylehadoop-yarn-server-timelineservice.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9301/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9301/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9301/console |


This message was automatically generated.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936200#comment-14936200
 ] 

Hadoop QA commented on YARN-3840:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12743548/YARN-3840-6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 39285e6 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9303/console |


This message was automatically generated.

> Resource Manager web ui issue when sorting application by id (with 
> application having id > 9999)
> 
>
> Key: YARN-3840
> URL: https://issues.apache.org/jira/browse/YARN-3840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: LINTE
>Assignee: Mohammad Shahid Khan
> Attachments: RMApps.png, YARN-3840-1.patch, YARN-3840-2.patch, 
> YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, YARN-3840-6.patch
>
>
> On the web UI, the global main view page 
> (http://resourcemanager:8088/cluster/apps) doesn't display applications over 
> 9999.
> With the command line it works (# yarn application -list).
> Regards,
> Alexandre





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936251#comment-14936251
 ] 

Sangjin Lee commented on YARN-4210:
---

Regarding the flow run start/end time, yes, it is because 
{{GenericObjectMapper}} converts numbers within the integer range into an 
{{Integer}}. What Varun did in the v.3 patch is the right thing to do.

Apparently we missed it in our unit tests because we happened to use 
artificial start/end times that were larger than the integer max.
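
To make the pitfall concrete, here is a small Java sketch. It is an assumption-laden illustration, not a description of what the v.3 patch does: the helper name is hypothetical, and it only shows one defensive way to read a value that may come back as either an {{Integer}} or a {{Long}}.

```java
// Hypothetical helper; not taken from the YARN-4210 patches.
final class NumericReadSketch {

  // Accepts any boxed number (Integer, Long, ...) and widens it to long.
  // A direct cast such as (Long) value would throw ClassCastException for an Integer.
  static long toLong(Object value) {
    if (!(value instanceof Number)) {
      throw new IllegalArgumentException("Expected a number but got: " + value);
    }
    return ((Number) value).longValue();
  }

  public static void main(String[] args) {
    Object small = Integer.valueOf(42);          // fits in the integer range
    Object large = Long.valueOf(1443558152027L); // larger than Integer.MAX_VALUE
    System.out.println(toLong(small));
    System.out.println(toLong(large));
  }
}
```

A unit test that only uses timestamps above the integer max never exercises the {{Integer}} branch, which is consistent with how the problem went unnoticed.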

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936336#comment-14936336
 ] 

Hadoop QA commented on YARN-3727:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  4s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 14s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 40s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   8m 26s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| | |  47m 51s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764366/YARN-3727.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 071733d |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9304/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9304/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9304/console |


This message was automatically generated.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936258#comment-14936258
 ] 

Sangjin Lee commented on YARN-4210:
---

I took a quick look at the patch, and the changes LGTM for the most part, with 
the understanding that the query that returns all apps for a given flow run 
will be done in YARN-3864.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936328#comment-14936328
 ] 

Li Lu commented on YARN-4210:
-

Yes, LGTM overall. My only reservation is about the logging message. 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch, YARN-4210-YARN-2928.03.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}





[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936222#comment-14936222
 ] 

Hadoop QA commented on YARN-3727:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 59s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 18s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 20s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   2m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 18s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   8m 29s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  48m 35s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764347/YARN-3727.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6f335e4 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9302/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9302/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9302/console |


This message was automatically generated.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.





[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Attachment: YARN-3727.001.patch

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.
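A minimal sketch of the check proposed above, assuming the local cache directories are named by an increasing counter (as the `filecache/21637` path in the log suggests). The class name, method names, and use of `java.nio.file` are hypothetical and are not taken from the NodeManager code or from the attached patches.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not the NodeManager's actual class.
class UniqueLocalDirSketch {
  private final AtomicLong uniqueNumber;

  UniqueLocalDirSketch(long start) {
    this.uniqueNumber = new AtomicLong(start);
  }

  // Returns the first numbered child of cacheRoot that does not exist yet,
  // skipping any stale directory left behind by an earlier failure.
  Path nextFreeDir(Path cacheRoot) {
    Path candidate;
    do {
      candidate = cacheRoot.resolve(Long.toString(uniqueNumber.incrementAndGet()));
    } while (Files.exists(candidate));
    return candidate;
  }

  // Demo: a stale directory "11" already exists, so the helper moves past it.
  static String demo() {
    try {
      Path root = Files.createTempDirectory("filecache");
      Files.createDirectory(root.resolve("11"));
      return new UniqueLocalDirSketch(10).nextFreeDir(root).getFileName().toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Note that an existence check followed by later use is not atomic, so a real implementation would still need the rename failure path shown in the quoted code.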



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Attachment: (was: YARN-3727.001.patch)

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1897) CLI and core support for signal container functionality

2015-09-29 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-1897:
--
Attachment: YARN-1897-8.patch

Thanks [~xgong]. Here is the rebase. The failed unit tests aren't related.

> CLI and core support for signal container functionality
> ---
>
> Key: YARN-1897
> URL: https://issues.apache.org/jira/browse/YARN-1897
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: YARN-1897-2.patch, YARN-1897-3.patch, YARN-1897-4.patch, 
> YARN-1897-5.patch, YARN-1897-6.patch, YARN-1897-7.patch, YARN-1897-8.patch, 
> YARN-1897.1.patch
>
>
> We need to define SignalContainerRequest and SignalContainerResponse first as 
> they are needed by other sub tasks. SignalContainerRequest should use 
> OS-independent commands and provide a way for applications to specify a "reason" 
> for diagnosis. SignalContainerResponse might be empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-09-29 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936319#comment-14936319
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

The idea here is as follows.
Scenario:
1) A ResourceRequest may ask for an amount of memory that lies between 
RM_SCHEDULER_MINIMUM_ALLOCATION_MB and RM_SCHEDULER_INCREMENT_ALLOCATION_MB. 
For example, the minimum is set to zero, the increment is 512MB and the 
request is 256MB.
In that case, normalizeRequest() normalizes the request down to the minimum 
(zero) instead of rounding it up to the increment (512MB), which would 
fulfil the request.

a. I think we may have to change 
Resource normalize(
  ResourceCalculator calculator, Resource lhs, Resource min,
  Resource max, Resource increment)
to add a check for zero requests.

But that would be more of a core change, and I am not sure about making it 
in case it breaks anything else.

b. The other option would be to check for zero requests in the Fair and 
Capacity scheduler code prior to calling SchedulerUtils.normalizeRequests()

Thoughts?

Thank you.
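The scenario in this comment can be reproduced with a simplified model that uses plain integers (MB) rather than Hadoop's Resource and ResourceCalculator types. The class name, the 8192MB maximum, and the rule that a zero step rounds to zero are assumptions chosen to match the behaviour the issue describes; this is a sketch, not the scheduler's code.

```java
// Simplified model using plain ints (MB); not the actual Hadoop normalize code.
class NormalizeSketch {
  // Rounds value up to the nearest multiple of step.
  // Assumption: a zero step yields 0, matching the symptom reported in the issue.
  static int roundUp(int value, int step) {
    if (step == 0) {
      return 0;
    }
    return ((value + step - 1) / step) * step;
  }

  // Clamp to at least min, round up to a multiple of increment, cap at max.
  static int normalize(int requested, int min, int max, int increment) {
    int atLeastMin = Math.max(requested, min);
    return Math.min(roundUp(atLeastMin, increment), max);
  }

  public static void main(String[] args) {
    int min = 0, max = 8192, increment = 512, requested = 256;
    // The minimum is passed where the increment belongs: the request collapses to zero.
    System.out.println(normalize(requested, min, max, min));
    // The real increment is passed: the request is rounded up to 512.
    System.out.println(normalize(requested, min, max, increment));
  }
}
```

In this model, option (a) would amount to changing the zero-step behaviour inside the rounding helper, while option (b) would leave the helper alone and guard the call site.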

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-29 Thread Dian Fu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu updated YARN-3964:
--
Attachment: YARN-3964.013.patch

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4211) Many InvalidStateTransitionException on NM recovering the containers

2015-09-29 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936240#comment-14936240
 ] 

Rohith Sharma K S commented on YARN-4211:
-

This can be reproduced most of the time. The observation is that this issue 
occurs only when the NodeManager is killed using *kill *. If the NM is 
killed using *kill -9 *, everything works fine as of now. 

> Many InvalidStateTransitionException on NM recovering the containers
> 
>
> Key: YARN-4211
> URL: https://issues.apache.org/jira/browse/YARN-4211
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Rohith Sharma K S
> Attachments: nodemanager.log
>
>
> Many InvalidStateTransitionException are observed while the NM is recovering 
> containers.
> Scenario: the NM is restarted while containers are running.
> {noformat}
> 2015-09-29 16:40:10,606 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> INIT_CONTAINER at FINISHED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1294)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1286)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-09-29 16:53:24,774 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1443523004643_0001_02_01 transitioned from NEW to 
> DONE
> 2015-09-29 16:53:24,774 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> APPLICATION_CONTAINER_FINISHED at FINISHED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:458)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:63)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1294)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1286)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side

2015-09-29 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936344#comment-14936344
 ] 

Dian Fu commented on YARN-3964:
---

Hi [~leftnoteasy], [~Naganarasimha] and [~sunilg],
Thanks for your inputs. Updated the patch to eliminate the impact of the 
response time of {{nodeLabelsMappingProvider.getNodeLabels}}.

> Support NodeLabelsProvider at Resource Manager side
> ---
>
> Key: YARN-3964
> URL: https://issues.apache.org/jira/browse/YARN-3964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Dian Fu
>Assignee: Dian Fu
> Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, 
> YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, 
> YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, 
> YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, 
> YARN-3964.011.patch, YARN-3964.012.patch, YARN-3964.013.patch, 
> YARN-3964.1.patch
>
>
> Currently, CLI/REST API is provided in Resource Manager to allow users to 
> specify labels for nodes. For labels which may change over time, users will 
> have to start a cron job to update the labels. This has the following 
> limitations:
> - The cron job needs to be run as the YARN admin user.
> - This makes it a little complicated to maintain, as users will have to make 
> sure this service/daemon is alive.
> Adding a Node Labels Provider in Resource Manager will give users more 
> flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS

2015-09-29 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935549#comment-14935549
 ] 

Li Lu commented on YARN-3942:
-

Hi folks, I'm trying to figure out our next plan on this JIRA. Are we planning 
to make this fix and commit it to trunk soon? I'm asking this because I'm 
planning to start the next phase of this fix, which is to reduce the cache 
granularity to reduce refresh latency. If we're putting this fix back soon or 
the current patches are close, I can start on top of the existing patches. 
Otherwise maybe we'd like to improve the existing patches? Thanks for the info! 

> Timeline store to read events from HDFS
> ---
>
> Key: YARN-3942
> URL: https://issues.apache.org/jira/browse/YARN-3942
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: timelineserver
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-3942-leveldb.001.patch, 
> YARN-3942-leveldb.002.patch, YARN-3942.001.patch
>
>
> This adds a new timeline store plugin that is intended as a stop-gap measure 
> to mitigate some of the issues we've seen with ATS v1 while waiting for ATS 
> v2.  The intent of this plugin is to provide a workable solution for running 
> the Tez UI against the timeline server on large-scale clusters running many 
> thousands of jobs per day.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-29 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935562#comment-14935562
 ] 

Vrushali C commented on YARN-4203:
--

Looks good, thanks [~varun_saxena]! Will commit this today unless anyone 
objects. 

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 
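As a rough illustration of per-call logging and timing, the sketch below wraps a handler so that the request is logged on entry and the latency on exit. The class name, the log messages, and the example path are hypothetical; the attached patches may structure this quite differently.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper; not the timeline reader's actual code.
class TimedCallSketch {
  private static final Logger LOG = Logger.getLogger(TimedCallSketch.class.getName());

  // Runs the handler, logging the request on entry and the elapsed time on exit,
  // including when the handler throws.
  static <T> T timed(String requestDescription, Supplier<T> handler) {
    long startNanos = System.nanoTime(); // monotonic clock, suitable for durations
    LOG.info("Received request: " + requestDescription);
    try {
      return handler.get();
    } finally {
      long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
      LOG.info("Processed request: " + requestDescription + " (took " + elapsedMs + " ms)");
    }
  }

  public static void main(String[] args) {
    String body = timed("GET /example/path", () -> "{}");
    System.out.println(body);
  }
}
```

Putting the timing in a `finally` block means failed requests are timed too, which is usually what one wants when diagnosing slow or erroring endpoints.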



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935489#comment-14935489
 ] 

Hudson commented on YARN-4066:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #467 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/467/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> configuring a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935530#comment-14935530
 ] 

Varun Saxena commented on YARN-4178:


That's a good point.
Yes, we would need to handle it. 
As you said, it should be doable, i.e. records with the app id stored as 12 bytes 
can be considered old records and those with more than 12 bytes, new ones.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.
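The ordering problem can be seen with two illustrative ids that differ only in how many digits the sequence number has. The sketch below compares them as strings, then as a fixed-width big-endian encoding (8-byte cluster timestamp plus 4-byte sequence number, 12 bytes in total). The ids, class name, and encoding are assumptions for illustration and are not taken from the attached patch.

```java
import java.nio.ByteBuffer;

// Illustration only; the ids and the encoding are assumptions, not the patch's format.
class AppIdOrderingSketch {
  // Big-endian 8-byte cluster timestamp followed by a 4-byte sequence number.
  static byte[] encode(long clusterTimestamp, int sequence) {
    return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
        .putLong(clusterTimestamp)
        .putInt(sequence)
        .array();
  }

  // Lexicographic comparison treating each byte as unsigned.
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    String older = "application_1443523004643_9999";
    String newer = "application_1443523004643_10000";
    // As strings, '1' sorts before '9', so the newer id is ordered first.
    System.out.println(newer.compareTo(older) < 0);
    // As fixed-width bytes, numeric order is preserved for non-negative values.
    System.out.println(compareBytes(
        encode(1443523004643L, 9999), encode(1443523004643L, 10000)) < 0);
  }
}
```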



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935531#comment-14935531
 ] 

Varun Saxena commented on YARN-4178:


That's a good point.
Yes, we would need to handle it. 
As you said, it should be doable, i.e. records with the app id stored as 12 bytes 
can be considered old records and those with more than 12 bytes, new ones.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> currently they are treated as strings. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935534#comment-14935534
 ] 

Varun Saxena commented on YARN-4203:


Updated patch as per [~vrushalic]'s suggestion

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this jira to 
> add in request & response logging and timing for each REST call that comes 
> in. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1510) Make NMClient support change container resources

2015-09-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935584#comment-14935584
 ] 

Wangda Tan commented on YARN-1510:
--

The patch generally LGTM. My only suggestion is to make CallbackHandler an 
abstract class that implements the newly added methods; that way, existing 
projects using NMClient/AMRMClient will not be affected.
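One way to read this suggestion is sketched below: the existing callback interface stays untouched, and a new abstract class supplies a no-op default for the added callback, so only code that cares about resource changes overrides it. Every type and method name here is a hypothetical stand-in, not the real NMClient API.

```java
// All names here are hypothetical stand-ins, not the real NMClient API.
interface LegacyCallbackHandler {
  void onContainerStarted(String containerId);
}

// The added callback gets a no-op default, so subclasses override it only when needed.
abstract class AbstractCallbackHandlerSketch implements LegacyCallbackHandler {
  public void onContainerResourceIncreased(String containerId, int newMemoryMb) {
    // no-op by default
  }
}

class CallbackSketchDemo {
  static String run() {
    StringBuilder log = new StringBuilder();
    AbstractCallbackHandlerSketch handler = new AbstractCallbackHandlerSketch() {
      @Override
      public void onContainerStarted(String containerId) {
        log.append("started ").append(containerId);
      }
    };
    handler.onContainerStarted("container_example");
    // Inherited no-op: compiles and runs without the subclass implementing it.
    handler.onContainerResourceIncreased("container_example", 2048);
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

Code written against the legacy interface keeps compiling because that interface gains no new abstract methods.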

> Make NMClient support change container resources
> 
>
> Key: YARN-1510
> URL: https://issues.apache.org/jira/browse/YARN-1510
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1510-YARN-1197.1.patch, 
> YARN-1510-YARN-1197.2.patch, YARN-1510.3.patch
>
>
> As described in YARN-1197 and YARN-1449, we need to add an API in NMClient to support
> 1) sending requests to increase/decrease container resource limits
> 2) getting the succeeded/failed changed-containers response from the NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935496#comment-14935496
 ] 

Hudson commented on YARN-4066:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1198 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1198/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> configuring a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935500#comment-14935500
 ] 

Hadoop QA commented on YARN-4210:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 14s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m 11s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 17s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 42s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 56s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 22s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  42m 10s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764263/YARN-4210-YARN-2928.02.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9297/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9297/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9297/console |


This message was automatically generated.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to NPE while reading metrics. We should 
> not attempt to read anything if no row is returned i.e. result is empty.
> Found during web UI poc testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
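The guard described in this issue can be sketched without an HBase dependency by using a small stand-in for the client's Result type. The stand-in returns a null map when it holds no cells, which is believed to mirror the real `Result.getMap()` but should be treated as an assumption here. `FakeResult`, `readMetrics`, and the metric name are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for an HBase Result; assumed to return a null map when empty.
class FakeResult {
  private final Map<String, Long> cells;

  FakeResult(Map<String, Long> cells) {
    this.cells = cells;
  }

  boolean isEmpty() {
    return cells == null || cells.isEmpty();
  }

  Map<String, Long> getMap() {
    return isEmpty() ? null : cells;
  }
}

class EmptyResultGuardSketch {
  // Returns the metric values, or an empty map when the Get matched no row.
  static Map<String, Long> readMetrics(FakeResult result) {
    if (result == null || result.isEmpty()) {
      return Collections.emptyMap(); // nothing to parse, so skip field reads entirely
    }
    return new TreeMap<>(result.getMap());
  }

  public static void main(String[] args) {
    System.out.println(readMetrics(new FakeResult(null)).size());
    Map<String, Long> row = new TreeMap<>();
    row.put("metric_example", 42L);
    System.out.println(readMetrics(new FakeResult(row)));
  }
}
```

Checking for emptiness once, before any parsing, avoids repeating null checks in each column helper further down the call chain shown in the stack trace.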



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4203) Add request/response logging & timing for each REST endpoint call

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935558#comment-14935558
 ] 

Hadoop QA commented on YARN-4203:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  17m 12s | Findbugs (version ) appears to 
be broken on YARN-2928. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 48s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 19s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 52s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 46s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m  9s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   3m 21s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  44m 43s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12764275/YARN-4203-YARN-2928.02.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / def22b9 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/9298/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/9298/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/9298/console |


This message was automatically generated.

> Add request/response logging & timing for each REST endpoint call
> -
>
> Key: YARN-4203
> URL: https://issues.apache.org/jira/browse/YARN-4203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Vrushali C
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-4203-YARN-2928.01.patch, 
> YARN-4203-YARN-2928.02.patch
>
>
> The REST endpoints are being added as part of YARN-4075. Filing this JIRA to 
> add request and response logging and timing for each REST call that comes 
> in. 
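One way to picture the per-call logging and timing requested here is a small wrapper around each handler. This is only a sketch under assumptions, not the attached patch: `TimedCall`, `timed`, and the log line format are hypothetical names chosen for illustration, and a real endpoint would use the project's logger rather than `System.out`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

class TimedCall {
  // Runs the given call, then logs its label, outcome, and elapsed time.
  // The label would typically be the request method and path (assumption).
  static <T> T timed(String label, Callable<T> call) throws Exception {
    long start = System.nanoTime();
    boolean succeeded = false;
    try {
      T result = call.call();
      succeeded = true;
      return result;
    } finally {
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      System.out.println(label + " " + (succeeded ? "succeeded" : "failed")
          + " in " + elapsedMs + " ms");
    }
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical endpoint label; the lambda stands in for the real handler body.
    String body = timed("GET /flowrun", () -> "{}");
    System.out.println(body);
  }
}
```

Putting the log statement in `finally` means failed requests are timed as well, which is usually where the timing information matters most.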



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4178) [storage implementation] app id as string in row keys can cause incorrect ordering

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935493#comment-14935493
 ] 

Sangjin Lee commented on YARN-4178:
---

I think the only (minor) concern is this: suppose we do not store the prefix 
("application") today, and YARN later introduces a custom prefix. The timeline 
service would then need to start storing the new prefix, and at the very least 
we would need to recognize and handle old records that do not store any 
prefix. I suspect that's doable; I am just clarifying the implication for the 
data schema.

> [storage implementation] app id as string in row keys can cause incorrect 
> ordering
> --
>
> Key: YARN-4178
> URL: https://issues.apache.org/jira/browse/YARN-4178
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-4178-YARN-2928.01.patch
>
>
> Currently the app id is used in various places as part of row keys. However, 
> it is treated as a string there. This will cause a problem with 
> ordering when the id portion of the app id rolls over to the next digit.
> For example, "app_1234567890_1" will be considered *earlier* than 
> "app_1234567890_". We should correct this.
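The ordering problem can be reproduced with plain string comparison. The sketch below is illustrative only: the two ids are made-up values in the `app_<timestamp>_<sequence>` shape used above, and `AppIdOrdering`, `parse`, and `NUMERIC` are hypothetical names, not the row key encoding chosen in the patch.

```java
import java.util.Comparator;

class AppIdOrdering {
  // Splits an id of the assumed form "app_<timestamp>_<sequence>" into its
  // two numeric parts.
  static long[] parse(String appId) {
    String[] parts = appId.split("_");
    return new long[] {Long.parseLong(parts[1]), Long.parseLong(parts[2])};
  }

  // Orders ids by timestamp first, then by sequence number, both as numbers.
  static final Comparator<String> NUMERIC = (a, b) -> {
    long[] x = parse(a);
    long[] y = parse(b);
    int byTimestamp = Long.compare(x[0], y[0]);
    return byTimestamp != 0 ? byTimestamp : Long.compare(x[1], y[1]);
  };

  public static void main(String[] args) {
    String earlier = "app_1234567890_9999";
    String later = "app_1234567890_10000";
    // Lexicographic comparison puts the later app first, because '9' > '1'.
    System.out.println("string order wrong: " + (earlier.compareTo(later) > 0));
    // Numeric comparison keeps submission order.
    System.out.println("numeric order right: " + (NUMERIC.compare(earlier, later) < 0));
  }
}
```

In a row key the same effect would come from storing the numeric parts in a fixed-width binary form rather than as text, so that byte order matches numeric order.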





[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED

2015-09-29 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935556#comment-14935556
 ] 

Tsuyoshi Ozawa commented on YARN-3798:
--

[~sjlee0] I think it's ready to merge.

> ZKRMStateStore shouldn't create new session without occurrence of 
> SESSIONEXPIRED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
> YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
> YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
> YARN-3798-branch-2.7.patch
>
>
> The RM goes down with a NoNode exception while creating the znode for an appattempt.
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)

[jira] [Updated] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3727:

Attachment: YARN-3727.001.patch

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, which was caused by existing cache 
> directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause for this failure may be disk failure, LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}} or 
> others.
> I wonder whether we can add error recovery code to avoid the localization 
> failure by not using the existing cache directories for localization.
> The exception happened at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>  .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after localization 
> failure,
> I think it will be better to check if the directory exists before using it 
> for localization to avoid the localization failure.
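The proposed check can be sketched with the standard `java.nio.file` API. This is an illustration under assumptions rather than the attached patch: the NodeManager works with Hadoop's own filesystem classes, and `LocalCacheDirPicker`, `nextUnusedDir`, and the numeric directory naming are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LocalCacheDirPicker {
  // Returns the first numbered directory under base, starting at startId,
  // that does not exist yet, so a stale directory is skipped instead of
  // causing the later rename to fail.
  static Path nextUnusedDir(Path base, long startId, int maxAttempts)
      throws IOException {
    long id = startId;
    for (int i = 0; i < maxAttempts; i++, id++) {
      Path candidate = base.resolve(Long.toString(id));
      if (!Files.exists(candidate)) {
        return candidate;
      }
    }
    throw new IOException("No unused directory found under " + base);
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("filecache");
    // Simulate a leftover cache directory from an earlier failure.
    Files.createDirectory(base.resolve("21637"));
    System.out.println(nextUnusedDir(base, 21637, 10).getFileName());
  }
}
```

The check narrows the window for a conflict but does not remove it, since another process could still create the directory between the existence check and the rename.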





[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935634#comment-14935634
 ] 

zhihai xu commented on YARN-3727:
-

[~lichangleo], [~jlowe], thanks for the review! Yes, I uploaded a new patch 
YARN-3727.001.patch based on the latest code at trunk.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>


[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-09-29 Thread MENG DING (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935681#comment-14935681
 ] 

MENG DING commented on YARN-1509:
-

Thanks for the review [~leftnoteasy]!

bq. I think we can simply add decreaseList to decrease and increaseList to 
increase.

In most cases, the current logic effectively adds decreaseList to decrease map, 
and increaseList to increase map. But since the allocate call 
{{allocateResponse = allocate(progressIndicator)}} is not synchronized, during 
the allocation, new increase/decrease requests may have been added to the 
increase/decrease table, which IMO should not be overwritten by the old 
requests cached in increaseList and decreaseList. This is similar to the new 
container requests logic when allocation fails. Let me know if you think 
otherwise.
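The merge-back rule described above can be shown in isolation. The sketch below uses plain maps and a hypothetical `Request` class in place of `ContainerResourceChangeRequest`; `ChangeRequestMerge` and `restore` are illustrative names, and container ids and memory sizes are reduced to strings and integers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ChangeRequestMerge {
  // Hypothetical stand-in for ContainerResourceChangeRequest.
  static final class Request {
    final String containerId;
    final int memoryMb;

    Request(String containerId, int memoryMb) {
      this.containerId = containerId;
      this.memoryMb = memoryMb;
    }
  }

  // Puts cached requests back into sameKind after a failed allocate call,
  // but only for containers that have no newer request in either map.
  static void restore(List<Request> cached, Map<String, Integer> sameKind,
      Map<String, Integer> otherKind) {
    for (Request old : cached) {
      if (!sameKind.containsKey(old.containerId)
          && !otherKind.containsKey(old.containerId)) {
        sameKind.put(old.containerId, old.memoryMb);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> increase = new HashMap<>();
    Map<String, Integer> decrease = new HashMap<>();
    increase.put("c1", 4096); // newer increase added during the allocate call
    decrease.put("c2", 512);  // newer decrease that cancels an old increase

    List<Request> cachedIncreases = Arrays.asList(
        new Request("c1", 2048), new Request("c2", 2048), new Request("c3", 2048));
    restore(cachedIncreases, increase, decrease);

    // c1 keeps the newer value, c2 is not re-added, c3 is restored.
    System.out.println(increase.get("c1") + " " + increase.containsKey("c2")
        + " " + increase.get("c3"));
  }
}
```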

bq. if request matches, we can print some logs to show this

Will do.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch
>
>
> As described in YARN-1197, we need to add an API in AMRMClient to support:
> 1) Adding an increase request
> 2) Getting successfully increased/decreased containers from the RM





[jira] [Commented] (YARN-1509) Make AMRMClient support send increase container request and get increased/decreased containers

2015-09-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935618#comment-14935618
 ] 

Wangda Tan commented on YARN-1509:
--

Thanks [~mding],

I think the patch generally looks good. One query:
{code}
  // Increase/decrease requests could have been added during the
  // allocate call. Those are the newest requests, which take precedence
  // over requests cached in the increaseList and decreaseList.
  //
  // Only insert entries from the cached increaseList and decreaseList
  // that do not exist in either of the current decrease and increase maps:
  // 1. If the cached increaseList contains the same container as that
  //    in the new increase map, then there is nothing to do, as the
  //    request in the new increase map has the latest value.
  // 2. If the cached increaseList contains the same container as that
  //    in the new decrease map, then there is nothing to do either, as
  //    the request in the new decrease map is newer and should cancel
  //    the old increase request.
  // 3. The above also applies to the decreaseList.
  for (ContainerResourceChangeRequest oldIncrease : increaseList) {
    ContainerId oldContainerId = oldIncrease.getContainerId();
    if (increase.get(oldContainerId) == null
        && decrease.get(oldContainerId) == null) {
      increase.put(oldContainerId, oldIncrease.getCapability());
    }
  }
  for (ContainerResourceChangeRequest oldDecrease : decreaseList) {
    ContainerId oldContainerId = oldDecrease.getContainerId();
    if (decrease.get(oldContainerId) == null
        && increase.get(oldContainerId) == null) {
      decrease.put(oldContainerId, oldDecrease.getCapability());
    }
  }
{code}

I think we can simply add decreaseList to decrease and increaseList to 
increase. If AllocateResponse == null, we assume allocation fails, and 
scheduler's increase/decrease table isn't updated. In this case, I think we 
should simply revert changes to increase/decrease table. Thoughts?

And I think we can add some debug/info message as well. For example, at 
{{removePendingChangeRequests}}, if request matches, we can print some logs to 
show this.

> Make AMRMClient support send increase container request and get 
> increased/decreased containers
> --
>
> Key: YARN-1509
> URL: https://issues.apache.org/jira/browse/YARN-1509
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan (No longer used)
>Assignee: MENG DING
> Attachments: YARN-1509.1.patch, YARN-1509.2.patch
>
>


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935700#comment-14935700
 ] 

Sangjin Lee commented on YARN-4210:
---

That's right. {{FlowActivityEntityReader}} does it a little differently because 
it uses {{PageFilter}}, and can rely on the strict ordering of results.

Good find on {{ResultScanner}} not being closed! They should be closed once the 
query is done.
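Closing the scanner reliably is usually done with try-with-resources, since HBase's `ResultScanner` is `Closeable`. The sketch below is self-contained and therefore uses a hypothetical `FakeScanner` in place of the real scanner; `ScannerCloseSketch` and `readAll` are illustrative names, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ScannerCloseSketch {
  // Hypothetical stand-in for an HBase ResultScanner: iterable and closeable.
  static final class FakeScanner implements AutoCloseable, Iterable<String> {
    private final List<String> rows;
    boolean closed = false;

    FakeScanner(List<String> rows) {
      this.rows = rows;
    }

    @Override
    public Iterator<String> iterator() {
      return rows.iterator();
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Reads every row, closing the scanner even if iteration throws.
  static List<String> readAll(FakeScanner scanner) {
    List<String> out = new ArrayList<>();
    try (FakeScanner s = scanner) {
      for (String row : s) {
        out.add(row);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    FakeScanner scanner = new FakeScanner(Arrays.asList("row1", "row2"));
    System.out.println(readAll(scanner) + " closed=" + scanner.closed);
  }
}
```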

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If HBase Get does not fetch any rows for the query, we still try to parse the 
> result and read fields. This leads to an NPE while reading metrics. We should 
> not attempt to read anything if no row is returned, i.e. the result is empty.
> Found during web UI POC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
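The guard described in this issue can be sketched as follows. This is not the attached patch: `Row` is a hypothetical stand-in for HBase's `Result` (which exposes an `isEmpty()` method), and `EmptyResultGuard` and `parseMetrics` are illustrative names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class EmptyResultGuard {
  // Hypothetical stand-in for an HBase Result; cells is null when the Get
  // matched no row.
  static final class Row {
    final Map<String, Long> cells;

    Row(Map<String, Long> cells) {
      this.cells = cells;
    }

    boolean isEmpty() {
      return cells == null || cells.isEmpty();
    }
  }

  // Returns the metrics for a row, or an empty Optional when there is
  // nothing to parse, instead of dereferencing a missing cell map.
  static Optional<Map<String, Long>> parseMetrics(Row row) {
    if (row == null || row.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(new TreeMap<>(row.cells));
  }

  public static void main(String[] args) {
    System.out.println(parseMetrics(new Row(null)).isPresent());
    System.out.println(parseMetrics(new Row(Collections.singletonMap("cpu", 7L))).isPresent());
  }
}
```

With a guard like this, the caller can report "no entity found" for an empty result rather than surfacing a server error for what is really a missing row.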





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935704#comment-14935704
 ] 

Sangjin Lee commented on YARN-4210:
---

That's the current behavior, but I can see that it would be useful if we can 
query multiple applications for a given flow run. Would you like to take care 
of that as part of this JIRA? If not, I can take a look at it as a separate 
JIRA. Let me know what you want to do. Note that the validation also needs to 
change to support it.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935706#comment-14935706
 ] 

Sangjin Lee commented on YARN-4210:
---

Thanks for the patch [~varun_saxena]. I'll take a look at the current patch, 
and will get back to you with comments.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>


[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935724#comment-14935724
 ] 

Varun Saxena commented on YARN-4210:


[~sjlee0], will do this as part of YARN-3864. That has been earmarked for 
the app-side query interface.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>


[jira] [Commented] (YARN-4066) Large number of queues choke fair scheduler

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935728#comment-14935728
 ] 

Hudson commented on YARN-4066:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2403 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2403/])
YARN-4066. Large number of queues choke fair scheduler. (Johan Gustavsson via 
kasha) (kasha: rev a0b5a0a419dfc07b7ac45c06b11b4c8dc7e79958)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


> Large number of queues choke fair scheduler
> ---
>
> Key: YARN-4066
> URL: https://issues.apache.org/jira/browse/YARN-4066
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.1
>Reporter: Johan Gustavsson
>Assignee: Johan Gustavsson
> Fix For: 2.8.0
>
> Attachments: YARN-4066-2.patch, YARN-4066-3.patch, yarn-4066-1.patch
>
>
> Due to synchronization and all the loops performed during queue creation, 
> setting a large number of queues (12000+) will completely choke the 
> scheduler. To deal with this, some optimization of 
> "QueueManager.updateAllocationConfiguration(AllocationConfiguration 
> queueConf)" should be done to reduce the number of unnecessary loops. The 
> attached patch has been tested to work with at least 96000 queues. 





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935737#comment-14935737
 ] 

Sangjin Lee commented on YARN-4210:
---

But this has nothing to do with aggregation, right? This is to support queries 
on the application table.

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>


[jira] [Commented] (YARN-3727) For better error recovery, check if the directory exists before using it for localization.

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935742#comment-14935742
 ] 

Hadoop QA commented on YARN-3727:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 59s | Findbugs (version ) appears to be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |  10m 19s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 19s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install |   1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   8m 43s | Tests passed in hadoop-yarn-server-nodemanager. |
| | |  47m  9s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764298/YARN-3727.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 80d33b5 |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9299/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9299/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9299/console |


This message was automatically generated.

> For better error recovery, check if the directory exists before using it for 
> localization.
> --
>
> Key: YARN-3727
> URL: https://issues.apache.org/jira/browse/YARN-3727
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-3727.000.patch, YARN-3727.001.patch
>
>
> For better error recovery, check if the directory exists before using it for 
> localization.
> We saw the following localization failure, which was caused by existing 
> cache directories.
> {code}
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { hdfs:///X/libjars/1234.jar, 1431395961545, FILE, 
> null }, Rename cannot overwrite non empty destination directory 
> //8/yarn/nm/usercache//filecache/21637
> 2015-05-11 18:59:59,756 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs:///X/libjars/1234.jar(->//8/yarn/nm/usercache//filecache/21637/1234.jar)
>  transitioned from DOWNLOADING to FAILED
> {code}
> The real cause of this failure may be a disk failure, a LevelDB operation 
> failure for {{startResourceLocalization}}/{{finishResourceLocalization}}, or 
> something else.
> I wonder whether we can add error recovery code that avoids the localization 
> failure by not reusing existing cache directories for localization.
> The exception was thrown at {{files.rename(dst_work, destDirPath, 
> Rename.OVERWRITE)}} in FSDownload#call. Based on the following code, after 
> the exception, the existing cache directory used by {{LocalizedResource}} 
> will be deleted.
> {code}
> try {
>   .
>   files.rename(dst_work, destDirPath, Rename.OVERWRITE);
> } catch (Exception e) {
>   try {
>     files.delete(destDirPath, true);
>   } catch (IOException ignore) {
>   }
>   throw e;
> } finally {
> {code}
> Since the conflicting local directory will be deleted after a localization 
> failure, I think it is better to check whether the directory exists before 
> using it for localization, so that the localization failure is avoided.
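
The check proposed above could look roughly like the following. This is only a minimal sketch and not the code from the attached patches: it uses plain {{java.nio.file}} instead of Hadoop's FileContext so that it stands alone, and the class name {{LocalCacheDirPicker}}, the method {{pickUnusedDir}} and the idea of probing consecutive numeric ids are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of the YARN-3727 patch. */
final class LocalCacheDirPicker {
  private LocalCacheDirPicker() {}

  /**
   * Returns the first directory named startId, startId + 1, ... under cacheRoot
   * that does not exist yet, trying at most maxAttempts candidates.
   */
  static Path pickUnusedDir(Path cacheRoot, long startId, int maxAttempts)
      throws IOException {
    for (int i = 0; i < maxAttempts; i++) {
      Path candidate = cacheRoot.resolve(Long.toString(startId + i));
      // Skip any directory left behind by an earlier, possibly failed, localization.
      if (Files.notExists(candidate)) {
        return candidate;
      }
    }
    throw new IOException("No unused cache directory under " + cacheRoot
        + " after " + maxAttempts + " attempts");
  }
}
```

An existence check like this is inherently racy (another thread could create the directory between the check and the rename), so it would complement the existing exception handling around the rename rather than replace it.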





[jira] [Commented] (YARN-4210) HBase reader throws NPE if Get returns no rows

2015-09-29 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935747#comment-14935747
 ] 

Varun Saxena commented on YARN-4210:


[~sjlee0], at the time, by "aggregated entities" we meant querying flows, apps, 
etc. The name is a little generic. I will change the title and do the part about 
querying apps there, i.e. everything from REST to the changes in the HBase reader.
If you prefer, I can raise a separate JIRA and leave 3864 as it is. 

> HBase reader throws NPE if Get returns no rows
> --
>
> Key: YARN-4210
> URL: https://issues.apache.org/jira/browse/YARN-4210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4210-YARN-2928.01.patch, 
> YARN-4210-YARN-2928.02.patch
>
>
> If an HBase Get does not fetch any rows for the query, we still try to parse 
> the result and read fields. This leads to an NPE while reading metrics. We 
> should not attempt to read anything if no row is returned, i.e. the result is 
> empty.
> Found during web UI POC testing. 
> {noformat}
> 2015-09-29 20:22:32,027 ERROR [95336304@qtp-1814206058-0] 
> reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:handleException(199)) - Error while 
> processing REST request
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnHelper.readResultsWithTimestamps(ColumnHelper.java:176)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.flow.FlowRunColumnPrefix.readResultsWithTimestamps(FlowRunColumnPrefix.java:182)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readMetrics(TimelineEntityReader.java:212)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.FlowRunEntityReader.parseEntity(FlowRunEntityReader.java:136)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineEntityReader.readEntity(TimelineEntityReader.java:137)
> at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl.getEntity(HBaseTimelineReaderImpl.java:72)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderManager.getEntity(TimelineReaderManager.java:93)
> at 
> org.apache.hadoop.yarn.server.timelineservice.reader.TimelineReaderWebServices.getFlowRun(TimelineReaderWebServices.java:403)
> {noformat}
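
The guard described in the summary can be sketched as below. This is an illustration under stated assumptions, not the code from the attached patches: a plain map stands in for the HBase result so that the example is self-contained, a null or empty map models "Get returned no rows", and the names {{EmptyResultGuard}} and {{readMetrics}} are hypothetical. With the real HBase client, the corresponding check would presumably be {{Result.isEmpty()}} before any column is read.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in for the reader; a null or empty cell map models "no row returned". */
final class EmptyResultGuard {
  private EmptyResultGuard() {}

  /**
   * Copies metric cells (column name -> timestamp -> value) out of a read result.
   * Returns an empty map instead of dereferencing a missing result.
   */
  static Map<String, NavigableMap<Long, Long>> readMetrics(
      Map<String, NavigableMap<Long, Long>> cells) {
    // Guard first: nothing is parsed when the query matched no row.
    if (cells == null || cells.isEmpty()) {
      return Collections.emptyMap();
    }
    Map<String, NavigableMap<Long, Long>> metrics = new TreeMap<>();
    for (Map.Entry<String, NavigableMap<Long, Long>> e : cells.entrySet()) {
      metrics.put(e.getKey(), new TreeMap<>(e.getValue()));
    }
    return metrics;
  }
}
```

Returning an empty collection is only one option; the caller could equally return no entity at all so that the REST layer can report "not found" instead of an internal error.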


