[jira] [Commented] (YARN-11010) YARN ui2 hangs on the Queues page when the scheduler response contains NaN values

2024-05-11 Thread Xie YiFan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845558#comment-17845558
 ] 

Xie YiFan commented on YARN-11010:
--

[~tdomok] [~bkosztolnik] Hi, I tried to reproduce this issue on the trunk branch 
and found that it is already impossible to use percentage mode on a leaf queue 
when the parent is set to absolute mode. However, I can reproduce this issue in 
another, simpler way.

Reproduction steps:
 # Configure the DominantResourceCalculator
 # Set one child queue to [memory=0,vcores=0] in absolute mode

The root cause is dividing zero by zero, which returns NaN. YARN-9019 fixed the 
NaN/Infinity issue in the ratio function of DefaultResourceCalculator and 
DominantResourceCalculator. DefaultResourceCalculator.divide is implemented via 
the ratio function, but DominantResourceCalculator.divide is not.

So the DominantResourceCalculator.divide function may still return a NaN result.
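To illustrate, here is a minimal, hypothetical sketch (not the Hadoop source) of 
how a 0/0 division yields NaN and how a ratio-style guard of the kind YARN-9019 
introduced avoids it:
{code:java}
// Hypothetical sketch, not the Hadoop implementation: shows how 0/0 produces
// NaN and how a guarded division keeps the result finite.
public final class DivideSketch {

  // Unguarded division, similar in spirit to DominantResourceCalculator.divide.
  static float divide(long numerator, long denominator) {
    return (float) numerator / denominator; // 0/0 -> NaN, x/0 -> Infinity
  }

  // Guarded division, similar in spirit to the ratio() fix from YARN-9019.
  static float safeDivide(long numerator, long denominator) {
    if (denominator == 0) {
      return 0.0f; // avoid NaN/Infinity so the scheduler response stays valid JSON
    }
    return (float) numerator / denominator;
  }

  public static void main(String[] args) {
    System.out.println(divide(0, 0));      // NaN
    System.out.println(safeDivide(0, 0));  // 0.0
  }
}
{code}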

> YARN ui2 hangs on the Queues page when the scheduler response contains NaN 
> values
> -
>
> Key: YARN-11010
> URL: https://issues.apache.org/jira/browse/YARN-11010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Xie YiFan
>Priority: Major
> Attachments: capacity-scheduler.xml, shresponse.json
>
>
> When the scheduler response contains NaN values for capacity and maxCapacity 
> the UI2 hangs on the Queues page. The console log shows the following error:
> {code:java}
> SyntaxError: Unexpected token N in JSON at position 666 {code}
> The scheduler response:
> {code:java}
> "maxCapacity": NaN,
> "absoluteMaxCapacity": NaN, {code}
> NaN, Infinity, and -Infinity are not valid in JSON syntax: 
> https://www.json.org/json-en.html
> This might be related as well: YARN-10452
>  
> I managed to reproduce this with AQCv1, where I set the parent queue's 
> capacity in absolute mode, then I used percentage mode on the 
> leaf-queue-template. I'm not sure if this is a valid configuration, however 
> there is no error or warning in RM logs about any configuration error. To 
> trigger the issue the DominantResourceCalculator must be used. (When using 
> absolute mode on the leaf-queue-template this issue is not reproducible; 
> further details in YARN-10922.)
>  
> Reproduction steps:
>  # Start the cluster with the attached configuration
>  # Check the Queues page on UI2 (it should work at this point)
>  # Send an example job (yarn jar hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar 
> pi 1 10)
>  # Check the Queues page on UI2 (it should not be working at this point)






[jira] [Commented] (YARN-11010) YARN ui2 hangs on the Queues page when the scheduler response contains NaN values

2024-05-06 Thread Xie YiFan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-11010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844087#comment-17844087
 ] 

Xie YiFan commented on YARN-11010:
--

[~tdomok] Hi, are you working on this bug? If not, would you mind if I take it over?

> YARN ui2 hangs on the Queues page when the scheduler response contains NaN 
> values
> -
>
> Key: YARN-11010
> URL: https://issues.apache.org/jira/browse/YARN-11010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
> Attachments: capacity-scheduler.xml, shresponse.json
>
>
> When the scheduler response contains NaN values for capacity and maxCapacity 
> the UI2 hangs on the Queues page. The console log shows the following error:
> {code:java}
> SyntaxError: Unexpected token N in JSON at position 666 {code}
> The scheduler response:
> {code:java}
> "maxCapacity": NaN,
> "absoluteMaxCapacity": NaN, {code}
> NaN, Infinity, and -Infinity are not valid in JSON syntax: 
> https://www.json.org/json-en.html
> This might be related as well: YARN-10452
>  
> I managed to reproduce this with AQCv1, where I set the parent queue's 
> capacity in absolute mode, then I used percentage mode on the 
> leaf-queue-template. I'm not sure if this is a valid configuration, however 
> there is no error or warning in RM logs about any configuration error. To 
> trigger the issue the DominantResourceCalculator must be used. (When using 
> absolute mode on the leaf-queue-template this issue is not reproducible; 
> further details in YARN-10922.)
>  
> Reproduction steps:
>  # Start the cluster with the attached configuration
>  # Check the Queues page on UI2 (it should work at this point)
>  # Send an example job (yarn jar hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar 
> pi 1 10)
>  # Check the Queues page on UI2 (it should not be working at this point)






[jira] [Updated] (YARN-11644) LogAggregationService can't upload log in time when application finished

2024-01-09 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-11644:
-
Affects Version/s: 3.3.6

> LogAggregationService can't upload log in time when application finished
> 
>
> Key: YARN-11644
> URL: https://issues.apache.org/jira/browse/YARN-11644
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 3.3.6
>Reporter: Xie YiFan
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: image-2024-01-10-11-03-57-553.png
>
>
> LogAggregationService is responsible for uploading logs to HDFS. It uses a 
> thread pool to execute the upload tasks.
> The workflow for uploading logs is as follows:
>  # The NM constructs an Application object when the first container of an 
> application launches, then notifies LogAggregationService to initialize an 
> AppLogAggregationImpl.
>  # LogAggregationService submits the AppLogAggregationImpl to the task queue.
>  # An idle worker of the thread pool pulls the AppLogAggregationImpl from the 
> task queue.
>  # The AppLogAggregationImpl runs a while loop to check the application state 
> and uploads the logs once the application has finished.
> Suppose the following scenario:
>  * LogAggregationService initializes its thread pool with 4 threads.
>  * 4 long-running applications start on this NM, so all threads are occupied 
> by their aggregators.
>  * The next, short-lived application starts on this NM and quickly finishes, 
> but there is no idle thread left to upload its logs.
> As a result, subsequent applications have to wait for the previous 
> applications to finish before their logs can be uploaded.
> !image-2024-01-10-11-03-57-553.png|width=599,height=195!
> h4. Solution
> Change the spin behavior of AppLogAggregationImpl: if the application has not 
> finished, return immediately to yield the current thread and resubmit itself 
> to the executor service. This lets LogAggregationService roll through the 
> task queue, so the logs of finished applications can be uploaded immediately.






[jira] [Updated] (YARN-11644) LogAggregationService can't upload log in time when application finished

2024-01-09 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-11644:
-
Description: 
LogAggregationService is responsible for uploading logs to HDFS. It uses a 
thread pool to execute the upload tasks.

The workflow for uploading logs is as follows:
 # The NM constructs an Application object when the first container of an 
application launches, then notifies LogAggregationService to initialize an 
AppLogAggregationImpl.
 # LogAggregationService submits the AppLogAggregationImpl to the task queue.
 # An idle worker of the thread pool pulls the AppLogAggregationImpl from the 
task queue.
 # The AppLogAggregationImpl runs a while loop to check the application state 
and uploads the logs once the application has finished.

Suppose the following scenario:
 * LogAggregationService initializes its thread pool with 4 threads.
 * 4 long-running applications start on this NM, so all threads are occupied by 
their aggregators.
 * The next, short-lived application starts on this NM and quickly finishes, 
but there is no idle thread left to upload its logs.

As a result, subsequent applications have to wait for the previous applications 
to finish before their logs can be uploaded.

!image-2024-01-10-11-03-57-553.png|width=599,height=195!
h4. Solution

Change the spin behavior of AppLogAggregationImpl: if the application has not 
finished, return immediately to yield the current thread and resubmit itself to 
the executor service. This lets LogAggregationService roll through the task 
queue, so the logs of finished applications can be uploaded immediately.

  was:
LogAggregationService is responsible for uploading log to HDFS. It applies 
thread pool to execute upload task.

The workflow of upload log as follow:
 # NM construct Applicaiton object when first container of a certain 
application launch, then notify LogAggregationService to init 
AppLogAggregationImpl.
 # LogAggregationService submit AppLogAggregationImpl to task queue.

 # The idle worker of thread pool pulls AppLogAggregationImpl from task queue.

 # AppLogAggregationImpl do while loop to check the application state, do 
upload when application finished.

Suppose the following scenario:
 * LogAggregationService initialize thread pool with 4 threads.

 * 4 long running applications start on this NM, so all threads are occupied by 
aggregator.

 * The next short application starts on this NM and quickly finish, but no idle 
thread for this app to upload log.

as a result, the following applications have to wait the previous applications 
finish before uploading their logs.

!image-2024-01-10-11-03-57-553.png|width=599,height=195!
h4. Solution

Change the spin behavior of AppLogAggregationImpl. If application has not 
finished, just return to yield current thread and resubmit itself to executor 
service. So the LogAggregationService can roll the task queue and the logs of 
finished application can be uploaded immediately.


> LogAggregationService can't upload log in time when application finished
> 
>
> Key: YARN-11644
> URL: https://issues.apache.org/jira/browse/YARN-11644
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Reporter: Xie YiFan
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: image-2024-01-10-11-03-57-553.png
>
>
> LogAggregationService is responsible for uploading log to HDFS. It applies 
> thread pool to execute upload task.
> The workflow of upload log as follow:
>  # NM construct Applicaiton object when first container of a certain 
> application launch, then notify LogAggregationService to init 
> AppLogAggregationImpl.
>  # LogAggregationService submit AppLogAggregationImpl to task queue
>  # The idle worker of thread pool pulls AppLogAggregationImpl from task queue.
>  # AppLogAggregationImpl do while loop to check the application state, do 
> upload when application finished.
> Suppose the following scenario:
>  * LogAggregationService initialize thread pool with 4 threads.
>  * 4 long running applications start on this NM, so all threads are occupied 
> by aggregator.
>  * The next short application starts on this NM and quickly finish, but no 
> idle thread for this app to upload log.
> as a result, the following applications have to wait the previous 
> applications finish before uploading their logs.
> !image-2024-01-10-11-03-57-553.png|width=599,height=195!
> h4. Solution
> Change the spin behavior of AppLogAggregationImpl. If application has not 
> finished, just return to yield current thread and resubmit itself to executor 
> service. So the LogAggregationService can roll the task queue and the logs of 
> finished application can be uploaded immediately.




[jira] [Created] (YARN-11644) LogAggregationService can't upload log in time when application finished

2024-01-09 Thread Xie YiFan (Jira)
Xie YiFan created YARN-11644:


 Summary: LogAggregationService can't upload log in time when 
application finished
 Key: YARN-11644
 URL: https://issues.apache.org/jira/browse/YARN-11644
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Reporter: Xie YiFan
Assignee: Xie YiFan
 Attachments: image-2024-01-10-11-03-57-553.png

LogAggregationService is responsible for uploading logs to HDFS. It uses a 
thread pool to execute the upload tasks.

The workflow for uploading logs is as follows:
 # The NM constructs an Application object when the first container of an 
application launches, then notifies LogAggregationService to initialize an 
AppLogAggregationImpl.
 # LogAggregationService submits the AppLogAggregationImpl to the task queue.
 # An idle worker of the thread pool pulls the AppLogAggregationImpl from the 
task queue.
 # The AppLogAggregationImpl runs a while loop to check the application state 
and uploads the logs once the application has finished.

Suppose the following scenario:
 * LogAggregationService initializes its thread pool with 4 threads.
 * 4 long-running applications start on this NM, so all threads are occupied by 
their aggregators.
 * The next, short-lived application starts on this NM and quickly finishes, 
but there is no idle thread left to upload its logs.

As a result, subsequent applications have to wait for the previous applications 
to finish before their logs can be uploaded.

!image-2024-01-10-11-03-57-553.png|width=599,height=195!
h4. Solution

Change the spin behavior of AppLogAggregationImpl: if the application has not 
finished, return immediately to yield the current thread and resubmit itself to 
the executor service. This lets LogAggregationService roll through the task 
queue, so the logs of finished applications can be uploaded immediately.
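Below is a minimal, self-contained sketch of the proposed resubmission pattern. 
The class, field, and method names are illustrative only, not the actual Hadoop 
code.
{code:java}
// Illustrative sketch of the idea: instead of spinning inside a worker thread
// until the application finishes, the task yields by resubmitting itself to
// the executor, so aggregators of finished applications get a turn.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ResubmitSketch implements Runnable {
  private final ExecutorService pool;
  private final String appId;
  private int remainingPolls; // stands in for "the application is still running"

  ResubmitSketch(ExecutorService pool, String appId, int remainingPolls) {
    this.pool = pool;
    this.appId = appId;
    this.remainingPolls = remainingPolls;
  }

  @Override
  public void run() {
    if (remainingPolls-- > 0) {
      // Not finished yet: yield this worker thread and go back to the task
      // queue instead of looping here.
      pool.submit(this);
      return;
    }
    System.out.println("uploading logs for " + appId);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    pool.submit(new ResubmitSketch(pool, "application_0000000000000_0001", 3));
    Thread.sleep(500); // give the resubmitted task time to finish
    pool.shutdown();
  }
}
{code}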






[jira] [Created] (YARN-11643) Skip unnecessary pre-check in Multi Node Placement

2024-01-08 Thread Xie YiFan (Jira)
Xie YiFan created YARN-11643:


 Summary: Skip unnecessary pre-check in Multi Node Placement
 Key: YARN-11643
 URL: https://issues.apache.org/jira/browse/YARN-11643
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Xie YiFan
Assignee: Xie YiFan


When Multi Node Placement is enabled, RegularContainerAllocator runs a while 
loop to find one node from the candidate set to allocate to for a given 
scheduler key. Before allocating, a pre-check is called to verify that the 
current node satisfies the checks. If the node does not pass all checks, the 
loop simply continues to the next node.
{code:java}
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(node,
  schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
continue;
  }
} {code}
However, some of these checks are related to the scheduler key or the 
application and return PRIORITY_SKIPPED or APP_SKIPPED. This means that if the 
first node does not pass such a check, the following nodes will not pass it 
either.
If the cluster has 5000 nodes in the default partition, the scheduler wastes 
5000 loop iterations for just one scheduler key.
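A hedged sketch of the intended optimization, extending the snippet above; it 
assumes the pre-check result exposes its AllocationState, and the final patch 
may differ:
{code:java}
// Illustrative sketch, not the final patch: when the pre-check fails for a
// reason tied to the scheduler key or the application (rather than to this
// particular node), stop iterating instead of re-running the same check on
// every remaining candidate node.
if (reservedContainer == null) {
  result = preCheckForNodeCandidateSet(node,
      schedulingMode, resourceLimits, schedulerKey);
  if (null != result) {
    if (result.getAllocationState() == AllocationState.PRIORITY_SKIPPED
        || result.getAllocationState() == AllocationState.APP_SKIPPED) {
      break; // the same result would repeat for every remaining node
    }
    continue; // node-specific failure: try the next candidate
  }
}
{code}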

 






[jira] [Commented] (YARN-2082) Support for alternative log aggregation mechanism

2023-12-13 Thread Xie YiFan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796555#comment-17796555
 ] 

Xie YiFan commented on YARN-2082:
-

Hi, this is a long-standing ticket. We are now suffering from the small-files 
problem: we have 200,000+ jobs per day on one cluster.
Suppose a job runs on 25 NodeManagers on average. Then the number of files is 
200,000 * 25 = 5,000,000 per day for just one cluster, and HDFS can't handle 
that many small files. When this ticket was created, Timeline Service v2, which 
uses HBase as its backend storage, had not yet been introduced. Timeline v2 now 
has good scalability and usability, so I think we could use HBase to store the 
log files.

[~slfan1989] [~inigoiri] What do you think about this?

> Support for alternative log aggregation mechanism
> -
>
> Key: YARN-2082
> URL: https://issues.apache.org/jira/browse/YARN-2082
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Ming Ma
>Priority: Major
>
> I will post a more detailed design later. Here is a brief summary, and I would 
> like to get early feedback.
> Problem Statement:
> The current implementation of log aggregation creates one HDFS file for each 
> {application, nodemanager}. These files are relatively small, in the range of 
> 1-2 MB. In a large cluster with lots of applications and many nodemanagers, it 
> ends up creating lots of small files in HDFS. This puts pressure on the HDFS 
> NN in the following ways.
> 1. It increases NN memory size. This is mitigated by having the history server 
> delete old log files in HDFS.
> 2. Runtime RPC hit on HDFS. Each log aggregation file introduces several NN 
> RPCs such as create, getAdditionalBlock, complete, and rename. When the cluster 
> is busy, this RPC load has an impact on NN performance.
> In addition, to support non-MR applications on YARN, we might need to support 
> aggregation for long running applications.
> Design choices:
> 1. Don't aggregate all the logs, as in YARN-221.
> 2. Create a dedicated HDFS namespace used only for log aggregation.
> 3. Write logs to some key-value store like HBase. HBase's RPC load on the NN 
> will be much lower.
> 4. Decentralize the application-level log aggregation to NMs. All logs for a 
> given application are aggregated first by a dedicated NM before they are 
> pushed to HDFS.
> 5. Have the NM aggregate logs on a regular basis; each of these log files will 
> have data from different applications, and there needs to be some index for 
> quick lookup.
> Proposal:
> 1. Make YARN log aggregation pluggable for both the read and write paths. Note 
> that Hadoop FileSystem provides an abstraction and we could ask an alternative 
> log aggregator to implement a compatible FileSystem, but that seems to be 
> overkill.
> 2. Provide a log aggregation plugin that writes to HBase. The schema design 
> needs to support efficient reads on a per-application as well as a 
> per-application+container basis; in addition, it shouldn't create hotspots in 
> a cluster where certain users might create more jobs than others. For example, 
> we can use hash($user + $applicationId) + containerId as the row key.






[jira] [Created] (YARN-10537) Change type of LogAggregationService threadPool

2020-12-17 Thread Xie YiFan (Jira)
Xie YiFan created YARN-10537:


 Summary: Change type of LogAggregationService threadPool
 Key: YARN-10537
 URL: https://issues.apache.org/jira/browse/YARN-10537
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Xie YiFan


Currently, the LogAggregationService thread pool is a FixedThreadPool whose 
default size is 100. LogAggregationService constructs an AppLogAggregator for 
each newly arrived application and submits it to the thread pool. The 
AppLogAggregator runs a while loop until the application finishes. Some 
applications may run for a very long time, for example because there are not 
enough resources, and as a result each of them occupies one thread of the pool. 
When the number of such applications exceeds the pool size, later short-lived 
applications can't upload their logs until the previous long-lived applications 
finish. So I think we should replace the FixedThreadPool with a 
CachedThreadPool.
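For illustration, a minimal sketch of the difference using plain 
java.util.concurrent; this is not the actual Hadoop patch, and the thread name 
is illustrative:
{code:java}
// Illustrative sketch: a fixed pool caps concurrent aggregators at its size,
// while a cached pool grows on demand, so short-lived applications are not
// starved behind long-running ones.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public final class PoolChoiceSketch {
  public static void main(String[] args) {
    ThreadFactory factory = r -> new Thread(r, "LogAggregationService");

    // Current behavior: at most 100 AppLogAggregators run concurrently.
    ExecutorService fixed = Executors.newFixedThreadPool(100, factory);

    // Proposed behavior: create threads as needed and reuse idle ones.
    ExecutorService cached = Executors.newCachedThreadPool(factory);

    fixed.shutdown();
    cached.shutdown();
  }
}
{code}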






[jira] [Updated] (YARN-10537) Change type of LogAggregationService threadPool

2020-12-17 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-10537:
-
Priority: Minor  (was: Major)

> Change type of LogAggregationService threadPool
> ---
>
> Key: YARN-10537
> URL: https://issues.apache.org/jira/browse/YARN-10537
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Xie YiFan
>Priority: Minor
>
> Currently, the LogAggregationService thread pool is a FixedThreadPool whose 
> default size is 100. LogAggregationService constructs an AppLogAggregator for 
> each newly arrived application and submits it to the thread pool. The 
> AppLogAggregator runs a while loop until the application finishes. Some 
> applications may run for a very long time, for example because there are not 
> enough resources, and as a result each of them occupies one thread of the 
> pool. When the number of such applications exceeds the pool size, later 
> short-lived applications can't upload their logs until the previous 
> long-lived applications finish. So I think we should replace the 
> FixedThreadPool with a CachedThreadPool.






[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2020-08-24 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539.008.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, 
> YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, 
> YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539.008.patch, 
> YARN-6539_3.patch, YARN-6539_4.patch
>
>







[jira] [Assigned] (YARN-10315) Avoid sending RMNodeResoureupdate event if resource is same

2020-06-14 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan reassigned YARN-10315:


Assignee: (was: Xie YiFan)

> Avoid sending RMNodeResoureupdate event if resource is same
> ---
>
> Key: YARN-10315
> URL: https://issues.apache.org/jira/browse/YARN-10315
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Priority: Major
>
> When the node is in the DECOMMISSIONING state, an RMNodeResourceUpdateEvent is 
> sent for every heartbeat, which results in a scheduler resource update.
> Avoid sending the same event.
>  The scheduler node resource update iterates through all the queues for the 
> resource update, which is costly.






[jira] [Assigned] (YARN-10315) Avoid sending RMNodeResoureupdate event if resource is same

2020-06-14 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan reassigned YARN-10315:


Assignee: Xie YiFan  (was: Sushil Ks)

> Avoid sending RMNodeResoureupdate event if resource is same
> ---
>
> Key: YARN-10315
> URL: https://issues.apache.org/jira/browse/YARN-10315
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Xie YiFan
>Priority: Major
>
> When the node is in the DECOMMISSIONING state, an RMNodeResourceUpdateEvent is 
> sent for every heartbeat, which results in a scheduler resource update.
> Avoid sending the same event.
>  The scheduler node resource update iterates through all the queues for the 
> resource update, which is costly.






[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2020-06-12 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539.007.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, 
> YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, 
> YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539_3.patch, YARN-6539_4.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2020-06-12 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539.006.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, 
> YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, 
> YARN-6539.006.patch, YARN-6539_3.patch, YARN-6539_4.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2020-06-12 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539-branch-3.1.0.005.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, 
> YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, 
> YARN-6539_3.patch, YARN-6539_4.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2020-06-12 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539-branch-3.1.0.004.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, 
> YARN-6539-branch-3.1.0.004.patch, YARN-6539_3.patch, YARN-6539_4.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2020-06-12 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539_4.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch, 
> YARN-6539_4.patch
>
>







[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2020-05-21 Thread Xie YiFan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113693#comment-17113693
 ] 

Xie YiFan commented on YARN-6539:
-

[~BilwaST] ok. I will try it.

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Created] (YARN-9811) FederationInterceptor fails to recover in Kerberos environment

2019-09-04 Thread Xie YiFan (Jira)
Xie YiFan created YARN-9811:
---

 Summary: FederationInterceptor fails to recover in Kerberos 
environment
 Key: YARN-9811
 URL: https://issues.apache.org/jira/browse/YARN-9811
 Project: Hadoop YARN
  Issue Type: Bug
  Components: amrmproxy
Reporter: Xie YiFan
Assignee: Xie YiFan


*Scenario*:
 Start up the cluster in a Kerberos environment with recovery and AMRMProxy 
enabled in the NM. Submit one application to the cluster, and restart the NM 
that holds the master container. The NM will block during FederationInterceptor 
recovery.

*LOG*
{code:java}
INFO org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor: 
Recovering data for FederationInterceptor
INFO org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor: 
Found 0 existing UAMs for application application_1561534175896_4102 in 
NMStateStore
INFO org.apache.hadoop.yarn.server.utils.AMRMClientUtils: Creating RMProxy to 
RM online-bx for protocol ApplicationClientProtocol for user recommend 
(auth:SIMPLE)
INFO 
org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider:
 Initialized Federation proxy for user: recommend
INFO 
org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider:
 Failing over to the ResourceManager for SubClusterId: online-bx
INFO 
org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider:
 Connecting to /10.88.86.142:8032 subClusterId online-bx with protocol 
ApplicationClientProtocol as user recommend (auth:SIMPLE)
WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to 
the server : org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
INFO 
org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider:
 Failing over to the ResourceManager for SubClusterId: online-bx
INFO org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade: 
Flushing subClusters from cache and rehydrating from store, most likely on 
account of RM failover.
INFO 
org.apache.hadoop.yarn.server.federation.failover.FederationRMFailoverProxyProvider:
 Connecting to /10.88.86.142:8032 subClusterId online-bx with protocol 
ApplicationClientProtocol as user recommend (auth:SIMPLE)
WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to 
the server : org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
INFO org.apache.hadoop.io.retry.RetryInvocationHandler: java.io.IOException: 
DestHost:destPort hadoop1684.bx.momo.com:8032 , LocalHost:localPort 
hadoop999.bx.momo.com/10.88.64.186:0. Failed on local exception: 
java.io.IOException: org.apache.hadoop.security.AccessControlException: Client 
cannot authenticate via:[TOKEN, KERBEROS], while invoking 
ApplicationClientProtocolPBClientImpl.getContainers over online-bx after 1 
failover attempts. Trying to failover after sleeping for 3244ms.{code}
*Analysis*

rmClient.getContainers is called, but the AuthMethod of appSubmitter is SIMPLE. 
We should use createProxyUser instead of createRemoteUser in a secure 
environment.
{code:java}
UserGroupInformation appSubmitter = UserGroupInformation
    .createRemoteUser(getApplicationContext().getUser());
ApplicationClientProtocol rmClient =
    createHomeRMProxy(getApplicationContext(),
        ApplicationClientProtocol.class, appSubmitter);
GetContainersResponse response = rmClient
    .getContainers(GetContainersRequest.newInstance(this.attemptId));
{code}
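A hedged sketch of the suggested direction (not the committed patch), assuming 
the NM is already logged in from its Kerberos keytab:
{code:java}
// Sketch only: build the submitter UGI as a proxy user of the NM's Kerberos
// login user, instead of createRemoteUser(), which yields auth:SIMPLE and
// fails with "Client cannot authenticate via:[TOKEN, KERBEROS]".
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public final class ProxyUserSketch {
  public static UserGroupInformation submitterUgi(String appUser)
      throws IOException {
    return UserGroupInformation.createProxyUser(
        appUser, UserGroupInformation.getLoginUser());
  }
}
{code}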






[jira] [Updated] (YARN-9803) NPE while accessing Scheduler UI

2019-08-29 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-9803:

Attachment: YARN-9803-branch-3.1.1.001.patch

> NPE while accessing Scheduler UI
> 
>
> Key: YARN-9803
> URL: https://issues.apache.org/jira/browse/YARN-9803
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Xie YiFan
>Assignee: Xie YiFan
>Priority: Major
> Attachments: YARN-9803-branch-3.1.1.001.patch
>
>
> The same as what is described in YARN-4624.
> Scenario:
>  ===
> If not every queue's capacity is configured for the node label (even when the 
> value is 0), start the cluster and access the capacityscheduler page.
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:342)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:513)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86)
>  






[jira] [Commented] (YARN-9803) NPE while accessing Scheduler UI

2019-08-29 Thread Xie YiFan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919146#comment-16919146
 ] 

Xie YiFan commented on YARN-9803:
-

Because the variable configuredMinResource is not initialized in 
PartitionQueueCapacitiesInfo.

> NPE while accessing Scheduler UI
> 
>
> Key: YARN-9803
> URL: https://issues.apache.org/jira/browse/YARN-9803
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Xie YiFan
>Assignee: Xie YiFan
>Priority: Major
>
> The same as what is described in YARN-4624.
> Scenario:
>  ===
> If not every queue's capacity is configured for the node label (even when the 
> value is 0), start the cluster and access the capacityscheduler page.
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:342)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:513)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86)
>  






[jira] [Updated] (YARN-9803) NPE while accessing Scheduler UI

2019-08-29 Thread Xie YiFan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-9803:

Description: 
The same as what is described in YARN-4624.

Scenario:
 ===

If not every queue's capacity is configured for the node label (even when the 
value is 0), start the cluster and access the capacityscheduler page.

Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueueBlock.render(CapacitySchedulerPage.java:342)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$LI.__(Hamlet.java:7709)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:513)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:216)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86)
 

  was:
The same with what described in YARN-4624

Scenario:
 ===

if not configure all queue's capacity to nodelabel even the value is 0, start 
cluster and access capacityscheduler page.

org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceF

[jira] [Created] (YARN-9803) NPE while accessing Scheduler UI

2019-08-29 Thread Xie YiFan (Jira)
Xie YiFan created YARN-9803:
---

 Summary: NPE while accessing Scheduler UI
 Key: YARN-9803
 URL: https://issues.apache.org/jira/browse/YARN-9803
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Xie YiFan
Assignee: Xie YiFan


The same as what is described in YARN-4624.

Scenario:
 ===

If not every queue's capacity is configured for the node label (even when the 
value is 0), start the cluster and access the capacityscheduler page.

org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor124.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:178)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
at 
com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:110)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
at 
org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.security.http.MOMOHttpAuthenticationFilter.doFilter(MOMOHttpAuthenticationFilter.java:160)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1613)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.S

[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2019-08-16 Thread Xie YiFan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908885#comment-16908885
 ] 

Xie YiFan commented on YARN-6539:
-

Hi [~yzzjjyy], you should set hadoop.security.authorization to false in 
core-site.xml.

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2019-08-12 Thread Xie YiFan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904992#comment-16904992
 ] 

Xie YiFan commented on YARN-6539:
-

[~subru], I can't find any existing test related to RM and NM secure login. I 
also think it's hard to add a test, because testing this requires a Kerberos 
environment.

My implementation:

1. Call SecurityUtil#login in secureLogin to enable the Router to log in with 
Kerberos, like the RM and NM do.

2. RouterClientRMService receives the request from the YarnClient and creates 
the FederationClientInterceptor, which initializes a UGI based on the user. 
Next, FederationClientInterceptor forwards the request to the RM: it constructs 
a clientRMProxy to send RPC requests to the RM using the previously initialized 
UGI. AbstractClientRequestInterceptor calls 
UserGroupInformation#createProxyUser to construct the UGI in setupUser; in 
other words, it uses the Router's Kerberos identity to proxy the current user.
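A minimal sketch of the two pieces described above, assuming Router 
keytab/principal configuration keys analogous to the RM/NM ones; the key names 
below are illustrative only, not the ones in the attached patch.
{code:java}
// Illustrative sketch, not the attached patch: (1) log the Router in from its
// keytab like the RM/NM do, (2) use the Router's Kerberos identity to proxy
// the calling user when forwarding requests to the RM.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;

public final class RouterSecureLoginSketch {

  public static void secureLogin(Configuration conf, String hostname)
      throws IOException {
    SecurityUtil.login(conf,
        "yarn.router.keytab.file",        // illustrative config key
        "yarn.router.kerberos.principal", // illustrative config key
        hostname);
  }

  public static UserGroupInformation setupUser(String user) throws IOException {
    return UserGroupInformation.createProxyUser(
        user, UserGroupInformation.getLoginUser());
  }
}
{code}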

 

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2019-08-06 Thread Xie YiFan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901638#comment-16901638
 ] 

Xie YiFan commented on YARN-6539:
-

[~subru] Could you review this patch for me?

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Xie YiFan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539_3.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Xie YiFan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6359_2.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch
>
>







[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2019-07-28 Thread Xie YiFan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894859#comment-16894859
 ] 

Xie YiFan commented on YARN-6539:
-

This patch doesn't work. I have completed this functionality. [~shenyinjie], 
would you mind letting me take over this issue?

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Shen Yinjie
>Priority: Minor
> Attachments: YARN-6359_1.patch
>
>







[jira] [Created] (YARN-9708) Add Yarnclient#getDelegationToken API implementation and SecureLogin in router

2019-07-26 Thread Xie YiFan (JIRA)
Xie YiFan created YARN-9708:
---

 Summary: Add Yarnclient#getDelegationToken API implementation and 
SecureLogin in router
 Key: YARN-9708
 URL: https://issues.apache.org/jira/browse/YARN-9708
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: router
Affects Versions: 3.1.1
Reporter: Xie YiFan
 Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch

1. We use the Router as a proxy to manage multiple clusters that are 
independent of each other, in order to provide a unified client. Thus, we 
implemented a customized AMRMProxyPolicy that doesn't broadcast 
ResourceRequests to the other clusters.

2. Our production environment needs Kerberos, but the Router doesn't support 
SecureLogin for now. 
The patch on https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so 
we improved it.

3. Some frameworks like Oozie obtain a token via YarnClient#getDelegationToken, 
which the Router doesn't support. Our solution is to add homeCluster to 
ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is 
submitted with a specified cluster id so that the Router knows which cluster to 
submit the job to. The Router then obtains a token from that RM according to 
the specified cluster id when the client calls getDelegationToken, and applies 
some mechanism to keep this token in memory.

 


