[jira] [Commented] (YARN-5236) FlowRunCoprocessor brings down HBase RegionServer

2016-06-12 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326608#comment-15326608
 ] 

Haibo Chen commented on YARN-5236:
--

Thanks very much for pointing that out, [~vrushalic]. I completely missed that 
jira and didn't find relevant information in the documentation.

> FlowRunCoprocessor brings down HBase RegionServer
> -
>
> Key: YARN-5236
> URL: https://issues.apache.org/jira/browse/YARN-5236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Haibo Chen
>
> The FlowRunCoprocessor, when loaded in HBase, will bring down the region 
> server with the following exception:
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment.getRegion()
> I am running it with HBase 1.2.1 in pseudo-distributed mode to try out ATS v2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5236) FlowRunCoprocessor brings down HBase RegionServer

2016-06-10 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-5236:


 Summary: FlowRunCoprocessor brings down HBase RegionServer
 Key: YARN-5236
 URL: https://issues.apache.org/jira/browse/YARN-5236
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Haibo Chen


The FlowRunCoprocessor, when loaded in HBase, will bring down the region server 
with the following exception:

java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment.getRegion()

I am running it with HBase 1.2.1 in pseudo-distributed mode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5236) FlowRunCoprocessor brings down HBase RegionServer

2016-06-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5236:
-
Description: 
The FlowRunCoprocessor, when loaded in HBase, will bring down the region server 
with the following exception:

java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment.getRegion()

I am running it with HBase 1.2.1 in pseudo-distributed mode to try out ATS v2

  was:
The FlowRunCoprocessor, when loaded in HBase, will bring down the region server 
with the following exception:

java.lang.NoSuchMethodError: 
org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment.getRegion()

I am running it with HBase 1.2.1 in pseudo-distributed mode


> FlowRunCoprocessor brings down HBase RegionServer
> -
>
> Key: YARN-5236
> URL: https://issues.apache.org/jira/browse/YARN-5236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Haibo Chen
>
> The FlowRunCoprocessor, when loaded in HBase, will bring down the region 
> server with the following exception:
> java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment.getRegion()
> I am running it with HBase 1.2.1 in pseudo-distributed mode to try out ATS v2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159168#comment-15159168
 ] 

Haibo Chen commented on YARN-4701:
--

The ASF warning is unrelated to the patch.

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-02-24 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4732:
-
Assignee: Haibo Chen

> *ProcessTree classes have too many whitespace issues
> 
>
> Key: YARN-4732
> URL: https://issues.apache.org/jira/browse/YARN-4732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: Haibo Chen
>Priority: Trivial
>  Labels: newbie
>
> *ProcessTree classes have too many whitespace issues - extra newlines between 
> methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-23 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4697:
-
Attachment: yarn4697.004.patch

Added new unit tests for invalid values. Other comments have been addressed as well.

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch, yarn4697.002.patch, 
> yarn4697.003.patch, yarn4697.004.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-23 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Attachment: yarn4701.004.patch

Added a unit test to verify that the correct HTTP port is displayed when log 
aggregation is not available.

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch, yarn4701.004.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Attachment: (was: yarn4701.003.patch)

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Attachment: yarn4701.003.patch

Fixed the test failures and checkstyle warnings. The rest are unrelated.

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Attachment: yarn4701.003.patch

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch, 
> yarn4701.003.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-16 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4697:


 Summary: NM aggregation thread pool is not bound by limits
 Key: YARN-4697
 URL: https://issues.apache.org/jira/browse/YARN-4697
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


In the LogAggregationService.java we create a threadpool to upload logs from 
the nodemanager to HDFS if log aggregation is turned on. This is a cached 
threadpool which, based on the javadoc, is an unlimited pool of threads.
In the case that we have had a problem with log aggregation this could cause a 
problem on restart. The number of threads created at that point could be huge 
and will put a large load on the NameNode and in the worst case could even bring it 
down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-16 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4697:
-
Attachment: yarn4697.001.patch

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-16 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149568#comment-15149568
 ] 

Haibo Chen commented on YARN-4697:
--

The default value is 100, as defined in YarnConfiguration.

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-17 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150864#comment-15150864
 ] 

Haibo Chen commented on YARN-4697:
--

Hi Naganarasimha G R, 

Thanks very much for your comments. I have addressed the threadPool 
accessibility issue and also modified yarn-default.xml to match 
YarnConfiguration. To answer your other comments:

1. Yes, 50 should be safe (the default I set is 100). But in some cases even 50 
threads dedicated just to log aggregation may be too much, and some users may want 
more than 50 if they have powerful machines and many YARN applications. Making it 
configurable lets users decide for themselves.

2. The purpose of the semaphore is to block the threads in the thread pool, because 
the main thread always acquires the semaphore first. Since I set the thread pool 
size to 1, once that single thread tries to acquire the semaphore while executing 
either of the two runnables, it blocks, and the other runnable will not be executed 
if the pool can indeed create only one thread. (If another thread were available in 
the pool, it would also block on the semaphore, failing the test.) The immediate 
release after the acquire in the runnable is just to release the resource safely. 
I'll try to add comments in the test code.


> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4697:
-
Attachment: yarn4697.002.patch

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch, yarn4697.002.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-17 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4701:


 Summary: When task logs are not available, port 8041 is referenced 
instead of port 8042
 Key: YARN-4701
 URL: https://issues.apache.org/jira/browse/YARN-4701
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Haibo Chen
Assignee: Haibo Chen


Accessing logs for an Oozie task attempt in the workflow tool in Hue shows 
"Logs not available for attempt_1433822010707_0001_m_00_0. Aggregation may 
not be complete, Check back later or try the nodemanager at 
quickstart.cloudera:8041"
However the nodemanager http port is 8042, not 8041. Accessing port 8041 shows 
"It looks like you are making an HTTP request to a Hadoop IPC port. This is not 
the correct port for the web interface on this daemon."

To users of Hue this is not particularly helpful without an HTTP link. Can we 
provide a link to the task logs in 
"http://node_manager_host_address:8042/node/application/" here as well as 
the current message to assist users in Hue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Attachment: yarn4701.001.patch

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch
>
>
> Accessing logs for an Oozie task attempt in the workflow tool in Hue shows 
> "Logs not available for attempt_1433822010707_0001_m_00_0. Aggregation 
> may not be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041"
> However the nodemanager http port is 8042, not 8041. Accessing port 8041 
> shows "It looks like you are making an HTTP request to a Hadoop IPC port. 
> This is not the correct port for the web interface on this daemon."
> To users of Hue this is not particularly helpful without an HTTP link. Can we 
> provide a link to the task logs in 
> "http://node_manager_host_address:8042/node/application/" here as well 
> as the current message to assist users in Hue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Description: 
Accessing logs for a task attempt in the workflow tool in Hue shows "Logs not 
available for attempt_1433822010707_0001_m_00_0. Aggregation may not be 
complete, Check back later or try the nodemanager at quickstart.cloudera:8041" 

If the user follows that link, he/she will get "It looks like you are making an 
HTTP request to a Hadoop IPC port. This is not the correct port for the web 
interface on this daemon." 

We should update the message to use the correct HTTP port. We could also make it 
more convenient by linking to the application's specific page on the NM rather than 
just the NM's main page.

  was:
Accessing logs for an Oozie task attempt in the workflow tool in Hue shows 
"Logs not available for attempt_1433822010707_0001_m_00_0. Aggregation may 
not be complete, Check back later or try the nodemanager at 
quickstart.cloudera:8041"
However the nodemanager http port is 8042, not 8041. Accessing port 8041 shows 
"It looks like you are making an HTTP request to a Hadoop IPC port. This is not 
the correct port for the web interface on this daemon."

To users of Hue this is not particularly helpful without an HTTP link. Can we 
provide a link to the task logs in 
"http://node_manager_host_address:8042/node/application/" here as well as 
the current message to assist users in Hue?


> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4701) When task logs are not available, port 8041 is referenced instead of port 8042

2016-02-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4701:
-
Attachment: yarn4701.002.patch

Thanks for the comments. Addressed.

> When task logs are not available, port 8041 is referenced instead of port 8042
> --
>
> Key: YARN-4701
> URL: https://issues.apache.org/jira/browse/YARN-4701
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4701.001.patch, yarn4701.002.patch
>
>
> Accessing logs for a task attempt in the workflow tool in Hue shows "Logs 
> not available for attempt_1433822010707_0001_m_00_0. Aggregation may not 
> be complete, Check back later or try the nodemanager at 
> quickstart.cloudera:8041" 
> If the user follows that link, he/she will get "It looks like you are making 
> an HTTP request to a Hadoop IPC port. This is not the correct port for the 
> web interface on this daemon." 
> We should update the message to use the correct HTTP port. We could also make 
> it more convenient by providing the application's specific page at NM as well 
> instead of just NM's main page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-02-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4697:
-
Attachment: yarn4697.003.patch

Thanks very much for your comments. I have updated the patch accordingly.

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4697.001.patch, yarn4697.002.patch, 
> yarn4697.003.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200655#comment-15200655
 ] 

Haibo Chen commented on YARN-4766:
--

Fixed the license and checkstyle issues. The unit test failure is unrelated to 
the patch. I have created another JIRA to fix the test failure: 
https://issues.apache.org/jira/browse/YARN-4838

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: (was: yarn4766.001.patch)

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: (was: yarn4766.001.patch)

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.001.patch

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: (was: yarn4766.001.patch)

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.002.patch

Addressed checkstyle issues.

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-03-08 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4732:
-
Assignee: Gabor Liptak  (was: Haibo Chen)

> *ProcessTree classes have too many whitespace issues
> 
>
> Key: YARN-4732
> URL: https://issues.apache.org/jira/browse/YARN-4732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: Gabor Liptak
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4732.1.patch
>
>
> *ProcessTree classes have too many whitespace issues - extra newlines between 
> methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed

2016-03-19 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4838:


 Summary: TestLogAggregationService. 
testLocalFileDeletionOnDiskFull failed
 Key: YARN-4838
 URL: https://issues.apache.org/jira/browse/YARN-4838
 Project: Hadoop YARN
  Issue Type: Test
  Components: log-aggregation
Reporter: Haibo Chen


org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
testLocalFileDeletionOnDiskFull failed

java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288)

The failure is caused by a timing issue with DeletionService. DeletionService 
runs its own thread pool to delete files. When the verifyLocalFileDeletion() 
method checks file existence, it races with the FileDeletionTask being executed 
by the thread pool in DeletionService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed

2016-03-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned YARN-4838:


Assignee: Haibo Chen

> TestLogAggregationService. testLocalFileDeletionOnDiskFull failed
> -
>
> Key: YARN-4838
> URL: https://issues.apache.org/jira/browse/YARN-4838
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: log-aggregation
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
> testLocalFileDeletionOnDiskFull failed
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288)
> The failure is caused by a timing issue with DeletionService. DeletionService 
> runs its own thread pool to delete files. When the verifyLocalFileDeletion() 
> method checks file existence, it races with the FileDeletionTask being 
> executed by the thread pool in DeletionService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.003.patch

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-04 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Issue Type: Improvement  (was: Bug)

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-04 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-4766:


 Summary: NM should not aggregate logs older than the retention 
policy
 Key: YARN-4766
 URL: https://issues.apache.org/jira/browse/YARN-4766
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Reporter: Haibo Chen
Assignee: Haibo Chen


When log aggregation fails on the NM, the information for the attempt is 
kept in the recovery DB. Log aggregation can fail for multiple reasons which 
are often related to HDFS space or permissions.

On restart the recovery DB is read and if an application attempt needs its logs 
aggregated, the files are scheduled for aggregation without any checks. The log 
files could be older than the retention limit in which case we should not 
aggregate them but immediately mark them for deletion from the local file 
system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-04-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.004.patch

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-04-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254624#comment-15254624
 ] 

Haibo Chen commented on YARN-4766:
--

@Robert Kanter, thanks very much for your review. I have addressed all the issues 
in the latest patch. For #6, I didn't follow your comments exactly; instead, I 
added a new method that takes configs and expected files. 
testAggregatorWithRetentionPolicyDisabled_shouldUploadAllFiles and 
testAggregatorWhenNoFileOlderThanRetentionPolicy_ShouldUploadAll are still very 
much alike, but most of the code duplication is removed. 

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5001) Aggregated Logs root directory is created with wrong group if nonexistent

2016-04-26 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5001:
-
Attachment: yarn5001.001.patch

> Aggregated Logs root directory is created with wrong group if nonexistent 
> --
>
> Key: YARN-5001
> URL: https://issues.apache.org/jira/browse/YARN-5001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn5001.001.patch
>
>
> Usually, the group owner for /tmp/logs, where the aggregated logs go, is 
> "hadoop". Under that dir, you then have 
> <user>/logs/<application id>/ with group being "hadoop" 
> all the way down. 
> If you delete the /tmp/logs dir (when you want to clean up all the logs), the 
> directory will be created with a different group, "supergroup". The JHS runs as 
> the mapred user, who is a member of the hadoop group. With the new group, the 
> JHS doesn't have permission to read the logs any more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5001) Aggregated Logs root directory is created with wrong group if nonexistent

2016-04-26 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5001:
-
Summary: Aggregated Logs root directory is created with wrong group if 
nonexistent   (was: Aggregated Logs root directory is created with wrong owner 
and group if nonexistent )

> Aggregated Logs root directory is created with wrong group if nonexistent 
> --
>
> Key: YARN-5001
> URL: https://issues.apache.org/jira/browse/YARN-5001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> Usually, the group owner for /tmp/logs, where the aggregated logs go, is 
> "hadoop". Under that dir, you then have 
> <user>/logs/<application id>/ with group being "hadoop" 
> all the way down. 
> If you delete the /tmp/logs dir (when you want to clean up all the logs), the 
> directory will be created with a different group, "supergroup". The JHS runs as 
> the mapred user, who is a member of the hadoop group. With the new group, the 
> JHS doesn't have permission to read the logs any more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-5001) Aggregated Logs root directory is created with wrong owner and group if nonexistent

2016-04-26 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-5001:


 Summary: Aggregated Logs root directory is created with wrong 
owner and group if nonexistent 
 Key: YARN-5001
 URL: https://issues.apache.org/jira/browse/YARN-5001
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: Haibo Chen
Assignee: Haibo Chen


Usually, the owner and group for /tmp/logs, where the aggregated logs go, are:
[root@nightly57-1 ~]# hadoop fs -ls /tmp/ | grep logs
drwxrwxrwt   - mapred hadoop          0 2016-04-14 15:46 /tmp/logs
Under that dir, you then have <user>/logs/<application id>/. The group should be 
hadoop all the way down, while the owner should be mapred at the top and <user> 
starting with the <user> dir and below.
If you delete the /tmp/logs dir (when you want to clean up all the logs):
[root@nightly57-1 ~]# sudo -u hdfs hadoop fs -rmr /tmp/logs
And then run an MR job:
[root@nightly57-1 ~]# hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 2
…
The directory will be created with a different owner and group:
[root@nightly57-1 ~]# hadoop fs -ls /tmp/ | grep logs
drwxrwxrwt   - yarn  supergroup  0 2016-04-14 18:12 /tmp/logs

The owner being yarn might be okay, though this is inconsistent with the 
original owner, mapred. However, the real problem is the group now being 
supergroup instead of hadoop. The JHS runs as the mapred user, who is a member 
of the hadoop group. With the new owner and group, the JHS doesn't have 
permission to read the logs any more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-5001) Aggregated Logs root directory is created with wrong owner and group if nonexistent

2016-04-26 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5001:
-
Description: 
Usually, the group owner for /tmp/logs, where the aggregated logs go, is 
"hadoop". Under that dir, you then have <user>/logs/<application id>/ with the 
group being "hadoop" all the way down.

If you delete the /tmp/logs dir (when you want to clean up all the logs), the 
directory will be created with a different group, "supergroup". The JHS runs as 
the mapred user, who is a member of the hadoop group. With the new group, the 
JHS doesn't have permission to read the logs any more.

  was:
Usually, the owner and group for /tmp/logs, where the aggregated logs go, are:
[root@nightly57-1 ~]# hadoop fs -ls /tmp/ | grep logs
drwxrwxrwt   - mapred hadoop          0 2016-04-14 15:46 /tmp/logs
Under that dir, you then have <user>/logs/<application id>/. The group should be 
hadoop all the way down, while the owner should be mapred at the top and <user> 
starting with the <user> dir and below.
If you delete the /tmp/logs dir (when you want to clean up all the logs):
[root@nightly57-1 ~]# sudo -u hdfs hadoop fs -rmr /tmp/logs
And then run an MR job:
[root@nightly57-1 ~]# hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 2
…
The directory will be created with a different owner and group:
[root@nightly57-1 ~]# hadoop fs -ls /tmp/ | grep logs
drwxrwxrwt   - yarn  supergroup  0 2016-04-14 18:12 /tmp/logs

The owner being yarn might be okay, though this is inconsistent with the 
original owner, mapred. However, the real problem is the group now being 
supergroup instead of hadoop. The JHS runs as the mapred user, who is a member 
of the hadoop group. With the new owner and group, the JHS doesn't have 
permission to read the logs any more.


> Aggregated Logs root directory is created with wrong owner and group if 
> nonexistent 
> 
>
> Key: YARN-5001
> URL: https://issues.apache.org/jira/browse/YARN-5001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> Usually, the group owner for /tmp/logs, where the aggregated logs go, is 
> "hadoop". Under that dir, you then have 
> <user>/logs/<application id>/ with group being "hadoop" 
> all the way down. 
> If you delete the /tmp/logs dir (when you want to clean up all the logs), the 
> directory will be created with a different group, "supergroup". The JHS runs as 
> the mapred user, who is a member of the hadoop group. With the new group, the 
> JHS doesn't have permission to read the logs any more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-19 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292217#comment-15292217
 ] 

Haibo Chen commented on YARN-4766:
--

Thanks, Ray, for your comment; the name you suggested is much better. I made the 
change and uploaded a new patch to verify it still works, since it has been 
sitting for a while.

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.004.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.005.patch

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.004.patch, yarn4766.005.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296653#comment-15296653
 ] 

Haibo Chen commented on YARN-4766:
--

The checkstyle warnings are against the protobuf-generated file. I'll try to 
clean up some of the javadoc warnings.

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.004.patch, yarn4766.005.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-23 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: (was: yarn4766.004.patch)

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.005.patch, yarn4766.006.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-23 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.006.patch

Rebased the patch on trunk and addressed some of the checkstyle issues. Many of 
them are related to accessibility of the fields. The fields are all final and 
the class to which the fields belong is private, so IMHO, the warnings are not 
necessarily valid. The javadoc issue is on a protobuf-generated file.

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.004.patch, 
> yarn4766.005.patch, yarn4766.006.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5041) application master log can not be available when clicking jobhistory's am logs link

2016-05-12 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5041:
-
Attachment: yarn5041.001.patch

The patch is tested locally. IMO, the unit test is too difficult to create 
given that the render() of HsJobBlock calls a lot of methods that call the 
injector. Any suggestions on how to unit test this would be appreciated. 

> application master log can not be available when clicking jobhistory's am 
> logs link
> ---
>
> Key: YARN-5041
> URL: https://issues.apache.org/jira/browse/YARN-5041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: chenyukang
>Assignee: Haibo Chen
>Priority: Critical
> Attachments: yarn5041.001.patch
>
>
> In the history server webapp, the application master logs link is wrong. It shows "No 
> logs available for container container_1462419429440_0003_01_01".  It 
> directs to the wrong nodemanager HTTP port instead of the node manager's container 
> management port. I think YARN-4701 introduced this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5041) application master log can not be available when clicking jobhistory's am logs link

2016-05-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5041:
-
Attachment: yarn5041.002.patch

Thanks for letting me know. The test failures were indeed caused by my change.  
Updated the patch to fix the failures.

> application master log can not be available when clicking jobhistory's am 
> logs link
> ---
>
> Key: YARN-5041
> URL: https://issues.apache.org/jira/browse/YARN-5041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.9.0
>Reporter: chenyukang
>Assignee: Haibo Chen
>Priority: Critical
> Attachments: yarn5041.001.patch, yarn5041.002.patch
>
>
> In the history server webapp, the application master logs link is wrong. It shows "No 
> logs available for container container_1462419429440_0003_01_01".  It 
> directs to the wrong nodemanager HTTP port instead of the node manager's container 
> management port. I think YARN-4701 introduced this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5001) Aggregated Logs root directory is created with wrong group if nonexistent

2016-05-05 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15272908#comment-15272908
 ] 

Haibo Chen commented on YARN-5001:
--

The fix will explicitly set the group of /tmp/logs to the group of the user 
that the Node Manager is running as. This will not work if the group of the 
user that JHS is running as is not the same as that of the Node Manager. 
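
A rough sketch of that approach using the standard Hadoop FileSystem API is below; it is not the attached yarn5001 patch, and the /tmp/logs path and class name are illustrative assumptions.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class LogRootDirGroupSketch {
  /** Create the aggregated-log root dir if it is missing and give it the NM user's group. */
  public static void ensureLogRootDir(Configuration conf) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    Path logRoot = new Path("/tmp/logs");   // assumed remote log root directory
    if (!fs.exists(logRoot)) {
      fs.mkdirs(logRoot);
      String[] groups = UserGroupInformation.getCurrentUser().getGroupNames();
      if (groups.length > 0) {
        // Keep the owner (null means unchanged) and override only the group so
        // that the JHS user, which is expected to share this group, can read the logs.
        fs.setOwner(logRoot, null, groups[0]);
      }
    }
  }
}
{code}

As the comment above notes, this only helps when the JHS user actually shares a group with the NM user.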

> Aggregated Logs root directory is created with wrong group if nonexistent 
> --
>
> Key: YARN-5001
> URL: https://issues.apache.org/jira/browse/YARN-5001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn5001.001.patch
>
>
> The directory /tmp/logs, where the aggregated logs go, is supposed to be read 
> by JHS. But if it is not created beforehand, it will be created by the 
> NodeManager with its group set to the HDFS superuser group. Files created 
> under this directory will then inherit the supergroup as their group. This 
> causes JHS to fail to read the container log files under that directory if 
> JHS is not running as a user that belongs to the superuser group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5001) Aggregated Logs root directory is created with wrong group if nonexistent

2016-05-05 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5001:
-
Description: The directory /tmp/logs, where the aggregated logs go, is 
supposed to be read by JHS. But if it is not created beforehand, it will be 
created by the NodeManager with its group set to the HDFS superuser group. Files 
created under this directory will then inherit the supergroup as their group. 
This causes JHS to fail to read the container log files under that directory 
if JHS is not running as a user that belongs to the superuser group.  (was: The directory 
/tmp/logs, where the aggregated logs go, is supposed to be read by JHS. But if 
it is not created beforehand, it will be created by the NodeManager with its group 
set to the HDFS superuser group. This causes JHS to fail to read the 
container log files under that directory if JHS is not running as a user that 
belongs to the superuser group.)

> Aggregated Logs root directory is created with wrong group if nonexistent 
> --
>
> Key: YARN-5001
> URL: https://issues.apache.org/jira/browse/YARN-5001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn5001.001.patch
>
>
> The directory /tmp/logs, where the aggregated logs go, is supposed to be read 
> by JHS. But if it is not created beforehand, it will be created by the 
> NodeManager with its group set to the HDFS superuser group. Files created 
> under this directory will then inherit the supergroup as their group. This 
> causes JHS to fail to read the container log files under that directory if 
> JHS is not running as a user that belongs to the superuser group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5001) Aggregated Logs root directory is created with wrong group if nonexistent

2016-05-05 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5001:
-
Description: The directory /tmp/logs, where the aggregated logs go, is 
supposed to be read by JHS. But if it is not created beforehand, it will be 
created by the NodeManager with its group set to the HDFS superuser group. This 
causes JHS to fail to read the container log files under that directory if JHS is 
not running as a user that belongs to the superuser group.  (was: Usually, the group 
owner for /tmp/logs, where the aggregated logs go, is "hadoop". Under that dir, 
you then have /logs// with group 
being "hadoop" all the way down. 

If you delete the /tmp/logs dir (when you want to clean up all the logs), the 
directory will be created with a different group "superuser". The JHS runs as 
the mapred user, who is a member of the hadoop group. With the new group, the 
JHS doesn't have permission to read the logs any more.)

> Aggregated Logs root directory is created with wrong group if nonexistent 
> --
>
> Key: YARN-5001
> URL: https://issues.apache.org/jira/browse/YARN-5001
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn5001.001.patch
>
>
> The directory /tmp/logs, where the aggregated logs go, is supposed to be read 
> by JHS. But if it is not created beforehand, it will be created by the 
> NodeManager with its group set to the HDFS superuser group. This causes JHS to 
> fail to read the container log files under that directory if JHS is not 
> running as a user that belongs to the superuser group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-05 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.004.patch

Rebased the patch on the latest trunk and fixed the compilation issue.

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.004.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits

2016-04-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251022#comment-15251022
 ] 

Haibo Chen commented on YARN-4697:
--

[~vinodkv] Upon NM restart, NM will try to recover all applications and submit 
a log aggregation task to the thread pool for each application recovered.  
Therefore, a large number of recovered applications plus concurrent 
applications can cause the thread pool to grow without bound.

> NM aggregation thread pool is not bound by limits
> -
>
> Key: YARN-4697
> URL: https://issues.apache.org/jira/browse/YARN-4697
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: yarn4697.001.patch, yarn4697.002.patch, 
> yarn4697.003.patch, yarn4697.004.patch
>
>
> In the LogAggregationService.java we create a threadpool to upload logs from 
> the nodemanager to HDFS if log aggregation is turned on. This is a cached 
> threadpool, which, based on the javadoc, is an unlimited pool of threads.
> In the case that we have had a problem with log aggregation this could cause 
> a problem on restart. The number of threads created at that point could be 
> huge and will put a large load on the NameNode and in the worst case could even 
> bring it down due to file descriptor issues.
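
For illustration only (not the committed YARN-4697 patch), a bounded pool in place of the cached one could be sketched as below; the limit of 100 threads is an arbitrary example value, not the real configuration default.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedAggregationPoolSketch {
  /** A pool that never runs more than maxThreads uploads; extra tasks queue up. */
  static ThreadPoolExecutor createBoundedPool(int maxThreads) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads,            // core == max: hard upper bound
        60L, TimeUnit.SECONDS,             // idle threads die after a minute
        new LinkedBlockingQueue<Runnable>());
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = createBoundedPool(100);
    pool.execute(() -> System.out.println("upload one app's logs to HDFS here"));
    pool.shutdown();
  }
}
{code}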



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-05-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276598#comment-15276598
 ] 

Haibo Chen commented on YARN-4766:
--

The findbugs issue is unrelated to my patch.

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch, yarn4766.004.patch, yarn4766.004.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons which 
> are often related to HDFS space or permissions.
> On restart the recovery DB is read and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit in which case we should 
> not aggregate them but immediately mark them for deletion from the local file 
> system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5373) NPE listing wildcard directory in containerLaunch

2016-07-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378287#comment-15378287
 ] 

Haibo Chen commented on YARN-5373:
--

Created a separate jira to fix this issue for Windows, as I don't have 
access to a Windows OS.

> NPE listing wildcard directory in containerLaunch
> -
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Blocker
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE listing wildcard directory in containerLaunch

2016-07-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null (only happens in a secure cluster), NPE 
will cause the container to fail to launch.
Hive, Oozie jobs fail as a result.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null (only happens in a secure cluster), NPE 
will cause the container to fail to launch.


> NPE listing wildcard directory in containerLaunch
> -
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Blocker
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.
> Hive, Oozie jobs fail as a result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE listing wildcard directory in containerLaunch

2016-07-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Priority: Blocker  (was: Critical)

> NPE listing wildcard directory in containerLaunch
> -
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Blocker
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5381) fix NPE listing wildcard directory on Windows

2016-07-14 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-5381:


 Summary: fix NPE listing wildcard directory on Windows
 Key: YARN-5381
 URL: https://issues.apache.org/jira/browse/YARN-5381
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Haibo Chen
Priority: Blocker


An NPE can be thrown when a wildcard is used in the libjar option and the cluster is 
secure. The root cause is that the NM can be running as a user that does not have 
access to resource files that are downloaded by remote users. YARN-5373 only 
fixes the issue on Linux. This jira implements the fix for Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)
Haibo Chen created YARN-5373:


 Summary: NPE introduced by YARN-4958 (The file localization 
process should allow...)
 Key: YARN-5373
 URL: https://issues.apache.org/jira/browse/YARN-5373
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.9.0
Reporter: Haibo Chen
Assignee: Haibo Chen


YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{{code}}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{{code}}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()), new 
Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
>   for (File wildLink : directory.listFiles()) {
>   sb.symlink(new Path(wildLink.toString()), new 
> Path(wildLink.getName()));
>   }
> {code}
> When directory.listFiles returns null, NPE will cause the container to fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
>   for (File wildLink : directory.listFiles()) {
>   sb.symlink(new Path(wildLink.toString()),
>   new Path(wildLink.getName()));
>   }
> {code}
> When directory.listFiles returns null, NPE will cause the container to fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{{code}}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()),
  new Path(wildLink.getName()));
}
{{code}}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
>   for (File wildLink : directory.listFiles()) {
>   sb.symlink(new Path(wildLink.toString()),
>   new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null, NPE will cause the container to fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
  for (File wildLink : directory.listFiles()) {
  sb.symlink(new Path(wildLink.toString()), new 
Path(wildLink.getName()));
  }
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null, NPE will cause the container to fail to 
> launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE listing wildcard directory in containerLaunch

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Summary: NPE listing wildcard directory in containerLaunch  (was: NPE 
introduced by YARN-4958 (The file localization process should allow...))

> NPE listing wildcard directory in containerLaunch
> -
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Critical
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375988#comment-15375988
 ] 

Haibo Chen commented on YARN-5373:
--

As per offline discussion with Daniel, the cause is that in a secure cluster, 
the node manager that executes container launch code runs as a user that has no 
permission to read/execute the local wildcard directory that is downloaded as a 
resource by the remote user. Thus, directory.listFiles() returns null.
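
One straightforward way to avoid the NPE is to guard the listing and fail with a clear error instead of iterating over null. The sketch below is only an illustration under that assumption; the fix that was eventually committed may handle this differently.

{code:java}
import java.io.File;
import java.io.IOException;

final class WildcardListingSketch {
  /** Lists the wildcard directory, turning a null result into a clear IOException. */
  static File[] listWildcardDir(File directory) throws IOException {
    File[] entries = directory.listFiles();
    if (entries == null) {
      // listFiles() returns null when the path is not a readable directory,
      // e.g. when the NM user lacks permission on a secure cluster.
      throw new IOException("Could not list wildcard directory " + directory);
    }
    return entries;
  }
}
{code}

The loop quoted in the description would then iterate over the result of listWildcardDir(directory) instead of calling directory.listFiles() directly.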

> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Priority: Critical  (was: Major)

> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Critical
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE introduced by YARN-4958 (The file localization process should allow...)

2016-07-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Description: 
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null (only happens in a secure cluster), NPE 
will cause the container to fail to launch.

  was:
YARN-4958 added support for wildcards in file localization. It introduces a NPE 
at 
{code:java}
for (File wildLink : directory.listFiles()) {
sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
}
{code}
When directory.listFiles returns null, NPE will cause the container to fail to 
launch.


> NPE introduced by YARN-4958 (The file localization process should allow...)
> ---
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5381) fix NPE listing wildcard directory on Windows

2016-07-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386865#comment-15386865
 ] 

Haibo Chen commented on YARN-5381:
--

Per [~templedf]'s comment in YARN-5373, a Windows implementation is no longer 
needed as PrivilegedOperation is used. Closing this.

> fix NPE listing wildcard directory on Windows
> -
>
> Key: YARN-5381
> URL: https://issues.apache.org/jira/browse/YARN-5381
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Haibo Chen
>Priority: Blocker
>
> An NPE can be thrown when a wildcard is used in the libjar option and the cluster 
> is secure. The root cause is that the NM can be running as a user that does not 
> have access to resource files that are downloaded by remote users. YARN-5373 
> only fixes the issue on Linux. This jira implements the fix for Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5381) fix NPE listing wildcard directory on Windows

2016-07-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-5381.
--
Resolution: Workaround

> fix NPE listing wildcard directory on Windows
> -
>
> Key: YARN-5381
> URL: https://issues.apache.org/jira/browse/YARN-5381
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Haibo Chen
>Priority: Blocker
>
> An NPE can be thrown when a wildcard is used in the libjar option and the cluster 
> is secure. The root cause is that the NM can be running as a user that does not 
> have access to resource files that are downloaded by remote users. YARN-5373 
> only fixes the issue on Linux. This jira implements the fix for Windows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5373) NPE listing wildcard directory in containerLaunch

2016-07-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5373:
-
Assignee: Daniel Templeton  (was: Haibo Chen)

> NPE listing wildcard directory in containerLaunch
> -
>
> Key: YARN-5373
> URL: https://issues.apache.org/jira/browse/YARN-5373
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Haibo Chen
>Assignee: Daniel Templeton
>Priority: Blocker
>
> YARN-4958 added support for wildcards in file localization. It introduces a 
> NPE 
> at 
> {code:java}
> for (File wildLink : directory.listFiles()) {
> sb.symlink(new Path(wildLink.toString()), new Path(wildLink.getName()));
> }
> {code}
> When directory.listFiles returns null (only happens in a secure cluster), NPE 
> will cause the container to fail to launch.
> Hive, Oozie jobs fail as a result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-08 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4985:
-
Attachment: YARN-4985-YARN-5355.prelim.patch

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-08 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858743#comment-15858743
 ] 

Haibo Chen commented on YARN-4985:
--

Uploading a patch for preliminary review. The hbase-backend module has been 
split into two separate modules based on the versions of 
hadoop-common/hdfs/auth that they need to run under. Coprocessor code and code 
in hbase-tests are now in the same module, that is, 
*timelineservice-hbase-server, which depends on *timelineservice-hbase-client 
that includes all table schema and hbase code executed in YARN trunk. YARN-6094 
has enabled dynamic loading of the coprocessor from HDFS. Now, if the coprocessor 
code lives in an hbase-server jar that depends on the hbase-client jar, I don't think the 
dynamic loading from HDFS will work unless HBase somehow knows how to pull in 
the dependent jars. Thoughts on workaround, [~sjlee0]? In the meantime, I will 
continue to verify maven dependencies to make sure they are clean.

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1593) support out-of-proc AuxiliaryServices

2017-02-06 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854528#comment-15854528
 ] 

Haibo Chen commented on YARN-1593:
--

[~vvasudev] Want to check to see if you have some time to address some of the 
comments above?

> support out-of-proc AuxiliaryServices
> -
>
> Key: YARN-1593
> URL: https://issues.apache.org/jira/browse/YARN-1593
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, rolling upgrade
>Reporter: Ming Ma
>Assignee: Varun Vasudev
> Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as 
> NM. There are some benefits to host them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN , NM restart will force the 
> ShuffleHandler restart. If ShuffleHandler runs as a separate process, 
> ShuffleHandler can continue to run during NM restart. NM can reconnect the 
> the running ShuffleHandler after restart.
> 2. Resource management. It is possible another type of AuxiliaryServices will 
> be implemented. AuxiliaryServices are considered YARN application specific 
> and could consume lots of resources. Running AuxiliaryServices in separate 
> processes allow easier resource management. NM could potentially stop a 
> specific AuxiliaryServices process from running if it consumes resource way 
> above its allocation.
> Here are some high level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. Existing 
> AuxiliaryService API doesn't change.
> 2. The hosting process provides RPC server for AuxiliaryService proxy object 
> inside NM to connect to.
> 3. When we rolling restart NM, the existing AuxiliaryService processes will 
> continue to run. NM could reconnect to the running AuxiliaryService processes 
> upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have 
> immediate need for this. AuxiliaryService could run inside a container and 
> its resource utilization could be taken into account by RM and RM could 
> consider a specific type of applications overutilize cluster resource.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-01-25 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned YARN-4985:


Assignee: Haibo Chen  (was: Vrushali C)

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-01-25 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838583#comment-15838583
 ] 

Haibo Chen commented on YARN-4985:
--

[~sjlee0] Can you elaborate a little more on why the schema creator can also be 
"client"? My understanding is that since we run "bin/hbase 
org.apache.hadoop.yarn.server.timelineservice.storage.TimelineSchemaCreator", 
the hadoop dependencies will be provided by the hbase cluster and therefore 
their versions will be ${hbase-compatible-hadoop.version}.

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-21 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876620#comment-15876620
 ] 

Haibo Chen commented on YARN-4985:
--

Apologies for my delayed response! I did not have answers to your questions, so 
I took some time to actually try it out. Attached is my POC patch.

I managed to extract a common module that both client and server code depend 
on. The number of dependencies of the new schema module is much smaller, but 
still includes hadoop-yarn-api (for use of ApplicationId) and 
hadoop-yarn-server-applicationhistoryservice (for use of GenericObjectMapper), 
as I think ValueConverters belong to schema. With this new module, the 
undesirable dependency of hbase-server module on hbase-client is no longer 
necessary.

I was not able, however, to redistribute tests in hbase-tests into client and 
server modules. The reason is that all tests, regardless of whether they are 
server or client tests, depend on HBaseTestingUtility, which only works with 
hadoop-common-2.5.1. Therefore, in that sense, I think both tests for 
hbase-client and tests for hbase-server should still reside in the same module.

With this new module, we do still have the coprocessor installation issue. I am 
totally speculating here. Is it possible to configure maven so that it will 
combine hbase-schema and hbase-server into one jar, as a workaround?

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878866#comment-15878866
 ] 

Haibo Chen commented on YARN-4985:
--

Sorry for missing your question, [~sjlee0]. On one hand, FlowScanner needs the 
byte representation of FlowRunColumn and FlowRunColumnPrefix and their 
ValueConverters. Because these getter methods are defined in the 
superclass/interface, changing them means changing all their sibling classes. 
On the other hand, FlowScanner also depends on AggregationOperation and 
AggregationCompactionDimension, which are used extensively by hbase-client side 
code as well. Based on the two observations, I think having a new common module 
is probably easier than trying to move coprocessor dependencies into 
hbase-server module.

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.poc.patch, 
> YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879219#comment-15879219
 ] 

Haibo Chen commented on YARN-4985:
--

Let's leave it as is then. Should we close this as won't fix?

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.poc.patch, 
> YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6218) TestAMRMClient fails with fair scheduler

2017-02-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881606#comment-15881606
 ] 

Haibo Chen commented on YARN-6218:
--

Thanks for the patch [~miklos.szeg...@cloudera.com]! Took a cursory look at the 
patch.
1) In places where sleep() is replaced with waitForNMHeartbeat(), we may want 
to update the comments as well, since "sleep to let NM's heartbeat to RM and 
trigger allocations" no longer makes sense (a generic sketch of such a polling 
wait is included below).
2) At line 1015, the code is checking container status from NMs, so it does not 
need to wait for NM heartbeats to RM.
3) bq. // Wait for fair scheduler update thread
   We are synchronously calling update() in the test thread, not waiting for 
the update thread, so maybe we can say "triggering fair scheduler update"?
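
For point 1 above: a waitForNMHeartbeat() style utility is essentially a poll-with-timeout. The sketch below is a generic illustration of that pattern only; it is not code from the YARN-6218 patch, and the class name is made up.

{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class WaitUtilSketch {
  /** Polls the condition until it holds or the timeout elapses. */
  static void waitFor(BooleanSupplier condition, long checkEveryMillis,
      long timeoutMillis) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }
}
{code}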



> TestAMRMClient fails with fair scheduler
> 
>
> Key: YARN-6218
> URL: https://issues.apache.org/jira/browse/YARN-6218
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-6218.000.patch
>
>
> We ran into this issue on v2. Allocation does not happen in the specified 
> amount of time.
> Error Message
> expected:<2> but was:<0>
> Stacktrace
> java.lang.AssertionError: expected:<2> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientMatchStorage(TestAMRMClient.java:495)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1728) History server doesn't understand percent encoded paths

2017-02-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881397#comment-15881397
 ] 

Haibo Chen commented on YARN-1728:
--

Thanks [~yuanbo] for your update! A few more nits.
1) bq. // Guice-3.0 doesn't support encoded path info,
"Support" seems a little vague. We can be more concrete by saying something 
like: Guice-3.0 does not decode paths that are percent-encoded.
2) Can we rename testEncodedText to testEncodedUrl?
3) app.stop() is called in the finally block in all other test methods; we 
should follow that practice as well (see the skeleton sketched below).
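
As a rough skeleton of nits 2) and 3), not the actual patch code (the WebApp 
type and the startTestWebApp() helper are assumptions based on how the 
surrounding tests are described):

{code}
// Hypothetical skeleton only -- mirrors the try/finally pattern used by the
// other test methods in the class.
@Test
public void testEncodedUrl() throws Exception {
  WebApp app = startTestWebApp();  // assumed helper that starts the test webapp
  try {
    // hit a percent-encoded path such as .../logs/localhost%3A8041/... and
    // assert that the handler sees the decoded nodeId "localhost:8041"
  } finally {
    app.stop();  // always stop the webapp, as the other tests do
  }
}
{code}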

> History server doesn't understand percent encoded paths
> ---
>
> Key: YARN-1728
> URL: https://issues.apache.org/jira/browse/YARN-1728
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abraham Elmahrek
>Assignee: Yuanbo Liu
> Attachments: YARN-1728-branch-2.001.patch, 
> YARN-1728-branch-2.002.patch
>
>
> For example, going to the job history server page 
> http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
>  results in the following error:
> {code}
> Cannot get container logs. Invalid nodeId: 
> test-cdh5-hue.ent.cloudera.com%3A8041
> {code}
> Where the url decoded version works:
> http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
> It seems like both should be supported as the former is simply percent 
> encoding.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1728) History server doesn't understand percent encoded paths

2017-02-22 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-1728:
-
Target Version/s: 2.7.4, 2.8.1, 2.6.6

> History server doesn't understand percent encoded paths
> ---
>
> Key: YARN-1728
> URL: https://issues.apache.org/jira/browse/YARN-1728
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abraham Elmahrek
>Assignee: Yuanbo Liu
> Attachments: YARN-1728-branch-2.001.patch
>
>
> For example, going to the job history server page 
> http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
>  results in the following error:
> {code}
> Cannot get container logs. Invalid nodeId: 
> test-cdh5-hue.ent.cloudera.com%3A8041
> {code}
> Where the url decoded version works:
> http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
> It seems like both should be supported as the former is simply percent 
> encoding.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1728) History server doesn't understand percent encoded paths

2017-02-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879322#comment-15879322
 ] 

Haibo Chen commented on YARN-1728:
--

Thanks [~yuanbo] for the patch! The patch looks good to me except for two nits.
1) bq. // Make sure path is decoded.
This is confusing if we just look at the javadoc of 
HttpServletRequest.getPathInfo(), because based on the method signature it 
already decodes the URL. Maybe we should explicitly mention Guice 3.0 here to 
make it clear (a rough sketch of the decoding follows below).
2) The unit test is added to an existing test method for custom routes. I think 
we should create a new test method specifically for the decoding issue here.
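
For context, the kind of decoding the patch has to do could look roughly like 
the sketch below, under the assumption that Guice 3.0 hands the servlet a 
still-encoded path; this is an illustration, not the actual patch code.

{code}
// Sketch only: if the path info arrives still percent-encoded
// (e.g. "localhost%3A8041"), decode it before parsing the nodeId.
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public final class PathDecodingExample {
  static String decodePathInfo(String rawPathInfo) {
    try {
      // "localhost%3A8041" -> "localhost:8041"
      return URLDecoder.decode(rawPathInfo, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
{code}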

> History server doesn't understand percent encoded paths
> ---
>
> Key: YARN-1728
> URL: https://issues.apache.org/jira/browse/YARN-1728
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abraham Elmahrek
>Assignee: Yuanbo Liu
> Attachments: YARN-1728-branch-2.001.patch
>
>
> For example, going to the job history server page 
> http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
>  results in the following error:
> {code}
> Cannot get container logs. Invalid nodeId: 
> test-cdh5-hue.ent.cloudera.com%3A8041
> {code}
> Where the url decoded version works:
> http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
> It seems like both should be supported as the former is simply percent 
> encoding.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6218) TestAMRMClient fails with fair scheduler

2017-02-24 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883194#comment-15883194
 ] 

Haibo Chen commented on YARN-6218:
--

Thanks for the update, [~miklos.szeg...@cloudera.com]. One nit: the sleep(100) 
right after nmClient.getContainerStatus() is now changed to sleep(10). We 
should probably leave it as is.

> TestAMRMClient fails with fair scheduler
> 
>
> Key: YARN-6218
> URL: https://issues.apache.org/jira/browse/YARN-6218
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-6218.000.patch, YARN-6218.001.patch
>
>
> We ran into this issue on v2. Allocation does not happen in the specified 
> amount of time.
> Error Message
> expected:<2> but was:<0>
> Stacktrace
> java.lang.AssertionError: expected:<2> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientMatchStorage(TestAMRMClient.java:495)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1728) History server doesn't understand percent encoded paths

2017-02-24 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883115#comment-15883115
 ] 

Haibo Chen commented on YARN-1728:
--

Latest patch looks good to me. non-binding +1

> History server doesn't understand percent encoded paths
> ---
>
> Key: YARN-1728
> URL: https://issues.apache.org/jira/browse/YARN-1728
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Abraham Elmahrek
>Assignee: Yuanbo Liu
> Attachments: YARN-1728-branch-2.001.patch, 
> YARN-1728-branch-2.002.patch, YARN-1728-branch-2.003.patch
>
>
> For example, going to the job history server page 
> http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
>  results in the following error:
> {code}
> Cannot get container logs. Invalid nodeId: 
> test-cdh5-hue.ent.cloudera.com%3A8041
> {code}
> Where the url decoded version works:
> http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr
> It seems like both should be supported as the former is simply percent 
> encoding.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl

2017-02-24 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883759#comment-15883759
 ] 

Haibo Chen commented on YARN-6030:
--

This issue seems to be no longer valid after YARN-4675, [~gtCarrera9]?

> Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
> --
>
> Key: YARN-6030
> URL: https://issues.apache.org/jira/browse/YARN-6030
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Li Lu
>Priority: Minor
>
> I just discovered that we're still using a boolean flag {{timelineServiceV2}} 
> after we introduced {{timelineServiceVersion}}. This sounds a little bit 
> error-prone. After the discussion I think we should only use and trust 
> {{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client 
> creation. Instead of creating a v2 client and set this flag, maybe we'd like 
> to do some sanity check and make sure the creation call is consistent with 
> the configuration? 
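
For reference, the kind of sanity check the description suggests could look 
roughly like the sketch below; the property name and the client-creation flag 
are assumptions for illustration, not TimelineClientImpl code.

{code}
// Hypothetical sketch: trust only the configured timeline service version and
// verify that the client creation call is consistent with it, instead of
// carrying a separate timelineServiceV2 boolean.
import org.apache.hadoop.conf.Configuration;

public final class TimelineVersionCheckSketch {
  static void checkV2Creation(Configuration conf, boolean creatingV2Client) {
    // Property name is an assumption for this sketch.
    float version = conf.getFloat("yarn.timeline-service.version", 1.0f);
    if (creatingV2Client && version < 2.0f) {
      throw new IllegalArgumentException(
          "A v2 TimelineClient was requested, but the configured "
              + "timeline service version is " + version);
    }
  }
}
{code}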



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6030) Eliminate timelineServiceV2 boolean flag in TimelineClientImpl

2017-02-24 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-6030.
--
Resolution: Not A Problem

> Eliminate timelineServiceV2 boolean flag in TimelineClientImpl
> --
>
> Key: YARN-6030
> URL: https://issues.apache.org/jira/browse/YARN-6030
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-5355
>Reporter: Li Lu
>Priority: Minor
>
> I just discovered that we're still using a boolean flag {{timelineServiceV2}} 
> after we introduced {{timelineServiceVersion}}. This sounds a little bit 
> error-prone. After the discussion I think we should only use and trust 
> {{timelineServiceVersion}}. {{timelineServiceV2}} is set upon client 
> creation. Instead of creating a v2 client and set this flag, maybe we'd like 
> to do some sanity check and make sure the creation call is consistent with 
> the configuration? 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-22 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879030#comment-15879030
 ] 

Haibo Chen commented on YARN-4985:
--

bq. What is the implication of not doing this refactoring? Is the original 
problem of being able to build Hadoop and HBase from source resolved without 
this, or does it depend on this?
The problem of building Hadoop and HBase 2.0 has been resolved in YARN-5667. 
This is more of an effort to ease HBase 2.0 & Hadoop 3 integration when HBase 
2.0 changes, and to improve code organization by making explicit, in the form 
of modules, that we are dealing with two hadoop-common versions. I'd agree with 
you that if it is too much work, given that the benefits are very limited, we 
should not do the refactoring.


> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.poc.patch, 
> YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-21 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4985:
-
Attachment: YARN-4985-YARN-5355.poc.patch

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.poc.patch, 
> YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6146) Add Builder methods for TimelineEntityFilters

2017-02-21 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned YARN-6146:


Assignee: Haibo Chen

> Add Builder methods for TimelineEntityFilters
> -
>
> Key: YARN-6146
> URL: https://issues.apache.org/jira/browse/YARN-6146
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Rohith Sharma K S
>Assignee: Haibo Chen
>
> The timeline filters are evolving and can be add more and more filters. It is 
> better to start using Builder methods rather than changing constructor every 
> time for adding new filters. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages

2017-02-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859912#comment-15859912
 ] 

Haibo Chen commented on YARN-4985:
--

The dependency is not mainly because all tests are in the hbase-server module 
as they are now, but because FlowScanner needs to reference the FlowRun table 
schema classes. If we were to extract another module that only includes 
table/column definitions, we could avoid the dependency of hbase-server on 
hbase-client. But the hbase-client code seems quite coupled to the table schema 
code in the current implementation (the read & write methods are defined 
within the Table/Column classes). I will try to break the current hbase-client 
module in my patch into hbase-schema and hbase-client. That way, we should not 
have the undesirable dependency any more (an illustrative sketch follows below).

That said, I think my previous concern about loading the coprocessor from HDFS 
is still valid, since hbase-server will still depend on hbase-schema.
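
As an illustration of the proposed split, a schema-only class could look 
roughly like the sketch below; the class, package, family, and qualifier names 
are hypothetical, not taken from the patch.

{code}
// Hypothetical sketch of what a "hbase-schema" module could hold: pure byte[]
// definitions with no read/write logic, so both the coprocessor (hbase-server
// side) and the reader/writer (hbase-client side) can depend on it without
// depending on each other.
import java.nio.charset.StandardCharsets;

public final class FlowRunColumnSchema {
  /** Column family used by the flow run table. */
  public static final byte[] INFO_FAMILY = "i".getBytes(StandardCharsets.UTF_8);
  /** Qualifier for the min start time column. */
  public static final byte[] MIN_START_TIME =
      "min_start_time".getBytes(StandardCharsets.UTF_8);

  private FlowRunColumnSchema() {
  }
}
{code}

The read & write helpers that wrap these definitions would stay in the 
client-side module, which is where the coupling mentioned above currently lives.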

> Refactor the coprocessor code & other definition classes into independent 
> packages
> --
>
> Key: YARN-4985
> URL: https://issues.apache.org/jira/browse/YARN-4985
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Haibo Chen
>  Labels: YARN-5355
> Attachments: YARN-4985-YARN-5355.prelim.patch
>
>
> As part of the coprocessor deployment, we have realized that it will be much 
> cleaner to have the coprocessor code sit in a package which does not depend 
> on hadoop-yarn-server classes. It only needs hbase and other util classes.
> These util classes and tag definition related classes can be refactored into 
> their own independent "definition" class package so that making changes to 
> coprocessor code, upgrading hbase, deploying hbase on a different hadoop 
> version cluster etc all becomes operationally much easier and less error 
> prone to having different library jars etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829025#comment-15829025
 ] 

Haibo Chen commented on YARN-5928:
--

Not sure what is causing all the mvn red flags. The unit test failures are the 
same as stated in the previous comment.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928.06.patch, YARN-5928-YARN-5355.02.patch, 
> YARN-5928-YARN-5355.03.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.05.patch, 
> YARN-5928-YARN-5355.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5928:
-
Attachment: YARN-5928-YARN-5355.05.patch

A newly rebased patch.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928-YARN-5355.02.patch, YARN-5928-YARN-5355.03.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-17 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827098#comment-15827098
 ] 

Haibo Chen edited comment on YARN-5928 at 1/18/17 12:06 AM:


A newly rebased patch for branch YARN-5355 only.


was (Author: haibochen):
A newly rebased patch.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928-YARN-5355.02.patch, YARN-5928-YARN-5355.03.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-17 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827173#comment-15827173
 ] 

Haibo Chen commented on YARN-5928:
--

Yes, I believe so. I have tried the standalone timelineservice-hbase jar in an 
hbase cluster and ran an MR pi job with the hbase cluster as the backend. 
Things worked for me.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928-YARN-5355.02.patch, YARN-5928-YARN-5355.03.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828607#comment-15828607
 ] 

Haibo Chen commented on YARN-5928:
--

Thanks a lot for your reviews, [~sjlee0]! I do see those hadoop* dependencies 
listed under the "unused declared dependencies" section when I ran mvn 
dependency:analyze on my machine. I am using maven 3.3.9; not sure what the 
cause is. In any case, I will remove those "mvn fails to detect" comments. The 
three failed unit tests are all unrelated: 
TestMiniYarnClusterNodeUtilization.testUpdateNodeUtilization is YARN-5728, 
TestAMRestart.testRMAppAttemptFailuresValidityInterval seems to be YARN-5043, 
and TestContainerManagerSecurity fails even without the patch. Will address the 
rest of your comments along with the license warnings in a new patch.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928-YARN-5355.02.patch, YARN-5928-YARN-5355.03.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5928:
-
Attachment: YARN-5928-YARN-5355.06.patch

A new patch with comments addressed and a lower similarity

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928-YARN-5355.02.patch, YARN-5928-YARN-5355.03.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.05.patch, YARN-5928-YARN-5355.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-18 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828686#comment-15828686
 ] 

Haibo Chen commented on YARN-5928:
--

I will upload a patch for trunk shortly.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928-YARN-5355.02.patch, YARN-5928-YARN-5355.03.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.05.patch, YARN-5928-YARN-5355.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-18 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5928:
-
Attachment: YARN-5928.06.patch

Uploading a patch for branch trunk.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928.06.patch, YARN-5928-YARN-5355.02.patch, 
> YARN-5928-YARN-5355.03.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.05.patch, 
> YARN-5928-YARN-5355.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-19 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830265#comment-15830265
 ] 

Haibo Chen commented on YARN-5928:
--

As before, the unit test failures are not related to the patch.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928.06.patch, YARN-5928-YARN-5355.02.patch, 
> YARN-5928-YARN-5355.03.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.05.patch, 
> YARN-5928-YARN-5355.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-5928:
-
Attachment: YARN-5928-YARN-5355.07.patch

Uploading a new patch for branch YARN-5355 to resolve the documentation 
conflict with YARN-6094. The trunk patch is not impacted.

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928.06.patch, YARN-5928-YARN-5355.02.patch, 
> YARN-5928-YARN-5355.03.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.05.patch, 
> YARN-5928-YARN-5355.06.patch, YARN-5928-YARN-5355.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5667) Move HBase backend code in ATS v2 into its separate module

2017-01-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved YARN-5667.
--
Resolution: Done

> Move HBase backend code in ATS v2  into its separate module
> ---
>
> Key: YARN-5667
> URL: https://issues.apache.org/jira/browse/YARN-5667
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Blocker
> Attachments: New module structure.png, part1.yarn5667.prelim.patch, 
> part2.yarn5667.prelim.patch, part3.yarn5667.prelim.patch, 
> part4.yarn5667.prelim.patch, part5.yarn5667.prelim.patch, 
> pt1.yarn5667.001.patch, pt2.yarn5667.001.patch, pt3.yarn5667.001.patch, 
> pt4.yarn5667.001.patch, pt5.yarn5667.001.patch, pt6.yarn5667.001.patch, 
> pt9.yarn5667.001.patch, yarn5667-001.tar.gz
>
>
> The HBase backend code currently lives along with the core ATS v2 code in 
> hadoop-yarn-server-timelineservice module. Because Resource Manager depends 
> on hadoop-yarn-server-timelineservice, an unnecessary dependency of the RM 
> module on HBase modules is introduced (HBase backend is pluggable, so we do 
> not need to directly pull in HBase jars). 
> In our internal effort to try ATS v2 with HBase 2.0 which depends on Hadoop 
> 3, we encountered a circular dependency during our builds between HBase2.0 
> and Hadoop3 artifacts.
> {code}
> hadoop-mapreduce-client-common, hadoop-yarn-client, 
> hadoop-yarn-server-resourcemanager, hadoop-yarn-server-timelineservice, 
> hbase-server, hbase-prefix-tree, hbase-hadoop2-compat, 
> hadoop-mapreduce-client-jobclient, hadoop-mapreduce-client-common]
> {code}
> This jira proposes we move all HBase-backend-related code from 
> hadoop-yarn-server-timelineservice into its own module (possible name is 
> yarn-server-timelineservice-storage) so that core RM modules do not depend on 
> HBase modules any more.
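
To illustrate why the backend can stay out of the compile-time dependency 
graph, a sketch of the pluggable-writer pattern follows; the property name and 
the untyped return value are assumptions for illustration only.

{code}
// Sketch only: the storage backend is chosen by configuration and instantiated
// reflectively, so yarn-server code needs no compile-time reference to HBase
// classes; the HBase jars only have to be on the classpath at runtime.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public final class WriterLoaderSketch {
  static Object loadWriter(Configuration conf) {
    // Property name is an assumption for this sketch.
    Class<?> writerClass = conf.getClass(
        "yarn.timeline-service.writer.class", Object.class);
    return ReflectionUtils.newInstance(writerClass, conf);
  }
}
{code}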



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5928) Move ATSv2 HBase backend code into a new module that is only dependent at runtime by yarn servers

2017-01-20 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832181#comment-15832181
 ] 

Haibo Chen commented on YARN-5928:
--

Thanks [~sjlee0] for your reviews!

> Move ATSv2 HBase backend code into a new module that is only dependent at 
> runtime by yarn servers
> -
>
> Key: YARN-5928
> URL: https://issues.apache.org/jira/browse/YARN-5928
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 3.0.0-alpha2, YARN-5355
>
> Attachments: YARN-5928.01.patch, YARN-5928.02.patch, 
> YARN-5928.06.patch, YARN-5928-YARN-5355.02.patch, 
> YARN-5928-YARN-5355.03.patch, YARN-5928-YARN-5355.04.patch, 
> YARN-5928-YARN-5355.04.patch, YARN-5928-YARN-5355.05.patch, 
> YARN-5928-YARN-5355.06.patch, YARN-5928-YARN-5355.07.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4821) Have a separate NM timeline publishing-interval

2017-02-28 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888479#comment-15888479
 ] 

Haibo Chen commented on YARN-4821:
--

[~Naganarasimha], have you got cycles to continue on this jira?

> Have a separate NM timeline publishing-interval
> ---
>
> Key: YARN-4821
> URL: https://issues.apache.org/jira/browse/YARN-4821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
>  Labels: YARN-5355
> Attachments: YARN-4821-YARN-2928.v1.001.patch
>
>
> Currently the interval with which NM publishes container CPU and memory 
> metrics is tied to {{yarn.nodemanager.resource-monitor.interval-ms}} whose 
> default is 3 seconds. This is too aggressive.
> There should be a separate configuration that controls how often 
> {{NMTimelinePublisher}} publishes container metrics.
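
A sketch of what a dedicated knob could look like is below; the new property 
name is hypothetical, and falling back to the existing resource-monitor 
interval is just one possible choice.

{code}
// Sketch only: give NMTimelinePublisher its own publishing interval, falling
// back to the resource monitor interval when the new (hypothetical) property
// is not set.
import org.apache.hadoop.conf.Configuration;

public final class PublishIntervalSketch {
  static long getPublishIntervalMs(Configuration conf) {
    long monitorIntervalMs = conf.getLong(
        "yarn.nodemanager.resource-monitor.interval-ms", 3000L);
    // Hypothetical property name for the dedicated publishing interval.
    return conf.getLong(
        "yarn.nodemanager.timeline-publisher.interval-ms", monitorIntervalMs);
  }
}
{code}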



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



  1   2   3   4   5   6   7   8   9   10   >