[jira] [Created] (HADOOP-18816) Rebuild Exceptions on Client side to get genuine exceptions

2023-07-20 Thread Janus Chow (Jira)
Janus Chow created HADOOP-18816:
---

 Summary: Rebuild Exceptions on Client side to get genuine 
exceptions
 Key: HADOOP-18816
 URL: https://issues.apache.org/jira/browse/HADOOP-18816
 Project: Hadoop Common
  Issue Type: Task
Reporter: Janus Chow
Assignee: Janus Chow


In the current RPC design, if the Server sends an exception back, the Client 
can only rebuild the exception from the original exception's error message. If 
the exception has fields containing important information, they are discarded, 
since we cannot easily rebuild them from the message string.

This ticket introduces a new interface for Exceptions that supports 
reconstruction. If Clients want to rebuild an exception, they only need to 
implement the methods, and the reconstruction is done automatically.

The interface uses String[] as its parameter for simplicity. I thought of using 
Protobuf to store all the exceptions or fields, but the required generality 
cannot be fully met that way. So the Client needs to support reconstruction by 
accepting a "String[]" and transforming each String back to its original type.
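
A minimal sketch of how such an interface could look, assuming a String[] round trip and a reflective rebuild on the Client. The names ReconstructableException, QuotaExceededException and ExceptionRebuilder are hypothetical and are not taken from the actual patch.

```java
import java.lang.reflect.Constructor;

// Hypothetical interface; the name and method are assumptions, not the patch's API.
interface ReconstructableException {
    // Serialize the fields that must survive the RPC round trip.
    String[] toFields();
}

// Hypothetical exception whose typed fields would be lost in a plain message.
class QuotaExceededException extends Exception implements ReconstructableException {
    private final long quota;
    private final long used;

    QuotaExceededException(long quota, long used) {
        super("Quota " + quota + " exceeded, used " + used);
        this.quota = quota;
        this.used = used;
    }

    // Client-side constructor: turn each String back into its original type.
    QuotaExceededException(String[] fields) {
        this(Long.parseLong(fields[0]), Long.parseLong(fields[1]));
    }

    @Override
    public String[] toFields() {
        return new String[] {Long.toString(quota), Long.toString(used)};
    }

    long getQuota() { return quota; }

    long getUsed() { return used; }
}

class ExceptionRebuilder {
    // Rebuild an exception reflectively from its class name and String[] fields.
    static Exception rebuild(String className, String[] fields) {
        try {
            Class<?> cls = Class.forName(className);
            Constructor<?> ctor = cls.getDeclaredConstructor(String[].class);
            ctor.setAccessible(true);
            // Cast to Object so the array is passed as one argument, not as varargs.
            return (Exception) ctor.newInstance((Object) fields);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot rebuild " + className, e);
        }
    }
}

class RebuildDemo {
    public static void main(String[] args) {
        QuotaExceededException sent = new QuotaExceededException(100L, 150L);
        Exception rebuilt =
            ExceptionRebuilder.rebuild(sent.getClass().getName(), sent.toFields());
        System.out.println(rebuilt.getMessage());
    }
}
```

In this sketch the Server side would ship the class name plus the result of toFields(), and the Client would call ExceptionRebuilder.rebuild to get back an exception with its typed fields intact.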



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18723) Add detail logs if distcp checksum mismatch

2023-04-27 Thread Janus Chow (Jira)
Janus Chow created HADOOP-18723:
---

 Summary: Add detail logs if distcp checksum mismatch
 Key: HADOOP-18723
 URL: https://issues.apache.org/jira/browse/HADOOP-18723
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Janus Chow
Assignee: Janus Chow


We encountered some checksum mismatch errors during DistCp jobs. It took us 
some time to figure out that the checksum types were different.

Adding error logs should help us figure out such problems faster.
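
As a rough illustration of what a more detailed log message could contain, here is a sketch that reports both checksum types and values and flags a type difference. ChecksumInfo is a hypothetical stand-in for Hadoop's FileChecksum, and the message wording is an assumption, not the text used by the actual change.

```java
// Hypothetical stand-in for a file checksum; Hadoop's own type is FileChecksum.
final class ChecksumInfo {
    final String algorithm;
    final String hexValue;

    ChecksumInfo(String algorithm, String hexValue) {
        this.algorithm = algorithm;
        this.hexValue = hexValue;
    }
}

class ChecksumMismatchLog {
    // Build a message that names both checksum types, so a type difference is obvious.
    static String describe(String srcPath, ChecksumInfo src,
                           String dstPath, ChecksumInfo dst) {
        StringBuilder sb = new StringBuilder("Checksum mismatch between ")
            .append(srcPath).append(" and ").append(dstPath).append('.');
        sb.append(" Source checksum: ")
            .append(src.algorithm).append(':').append(src.hexValue).append(';');
        sb.append(" target checksum: ")
            .append(dst.algorithm).append(':').append(dst.hexValue).append('.');
        if (!src.algorithm.equals(dst.algorithm)) {
            sb.append(" The checksum types differ, so the values are not comparable.");
        }
        return sb.toString();
    }
}

class ChecksumLogDemo {
    public static void main(String[] args) {
        // Sample algorithm names and values, chosen only for illustration.
        ChecksumInfo src = new ChecksumInfo("MD5-of-0MD5-of-512CRC32C", "0a1b");
        ChecksumInfo dst = new ChecksumInfo("COMPOSITE-CRC32C", "2c3d");
        System.out.println(
            ChecksumMismatchLog.describe("/src/file", src, "/dst/file", dst));
    }
}
```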






[jira] [Created] (HADOOP-17468) NullPointerException from register in MetricsSystemImpl

2021-01-14 Thread Janus Chow (Jira)
Janus Chow created HADOOP-17468:
---

 Summary: NullPointerException from register in MetricsSystemImpl
 Key: HADOOP-17468
 URL: https://issues.apache.org/jira/browse/HADOOP-17468
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Janus Chow


This is an error from Ozone's unit test case 
[HDDS-4688|https://github.com/apache/ozone/pull/1795#issuecomment-760052788]. 
The error is as follows:

 
{code:java}
java.lang.NullPointerException: config
 at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:897)
 at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSink(MetricsSystemImpl.java:298)
 at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:277)
 at org.apache.hadoop.hdds.server.http.BaseHttpServer.start(BaseHttpServer.java:298)
...{code}
The cause is probably in this commit: 
https://github.com/apache/hadoop/commit/2f500e4635ea4347a55693b1a10a4a4465fe5fac. 
If the _name_ is contained in _allSinks_ but not in _sink_, the _registerSink_ 
method is invoked. But if the config variable is null, _registerSink_ throws a 
NullPointerException.

 

A suggestion would be to check whether the config variable is null before 
calling _registerSink_.
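
The suggested guard can be sketched on a simplified model. SinkRegistry below is a hypothetical, stripped-down stand-in and not the real MetricsSystemImpl; it only mirrors the allSinks / sinks / config relationship described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of the registration path; not MetricsSystemImpl.
class SinkRegistry {
    private final Map<String, Object> allSinks = new HashMap<>();
    private final Map<String, Object> sinks = new HashMap<>();
    private Object config; // stays null until the system has been configured

    void setConfig(Object config) {
        this.config = config;
    }

    // Returns true if the sink was activated by this call.
    boolean register(String name, Object sink) {
        if (allSinks.containsKey(name)) {
            // Known name without an active sink: only activate when config is set.
            if (sinks.get(name) == null && config != null) {
                registerSink(name, sink);
                return true;
            }
            return false;
        }
        allSinks.put(name, sink);
        if (config != null) {
            registerSink(name, sink);
            return true;
        }
        return false;
    }

    boolean isActive(String name) {
        return sinks.containsKey(name);
    }

    private void registerSink(String name, Object sink) {
        // Mirrors the precondition that raised the NullPointerException.
        Objects.requireNonNull(config, "config");
        sinks.put(name, sink);
    }
}

class SinkRegistryDemo {
    public static void main(String[] args) {
        SinkRegistry registry = new SinkRegistry();
        Object sink = new Object();
        registry.register("prometheus", sink); // config is null: remembered only
        registry.register("prometheus", sink); // same name again: no exception
        registry.setConfig(new Object());
        System.out.println(registry.register("prometheus", sink));
    }
}
```

Without the `config != null` check in the first branch, the second register call in this model would reach registerSink with a null config and fail in the same way as the stack trace.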






[jira] [Created] (HADOOP-17429) Add reload option to RefreshCallQueue

2020-12-11 Thread Janus Chow (Jira)
Janus Chow created HADOOP-17429:
---

 Summary: Add reload option to RefreshCallQueue
 Key: HADOOP-17429
 URL: https://issues.apache.org/jira/browse/HADOOP-17429
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Janus Chow


Currently, the dfsadmin command "refreshCallQueue" can be used to refresh all 
configs of FairCallQueue, but it causes a call queue spike on the NameNode 
during the refresh.

In 
[HADOOP-17421|https://issues.apache.org/jira/projects/HADOOP/issues/HADOOP-17421?filter=allopenissues],
 we added some configurations to specify queues for some static users, which 
give us fine-grained manual control over the behavior of different users. 
This feature requires us to refresh the call queue of NameNodes constantly, 
which will surely hurt the NameNode's performance.

This ticket proposes adding a reload option to the refreshCallQueue command, 
which only triggers the _callQueueManager_ to reload the changed configs, 
instead of constructing a new scheduler and new queues.

The basic design is:
 # Add a "reload" method to the _Scheduler_ interface.
 # Add different _RefreshCallQueueTypes_ to trigger different refresh 
operations.
 # Add an argument to DFSAdmin for the *reload* option.
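
The design steps above can be sketched roughly as follows. RefreshType, ReloadableScheduler, DemoScheduler and DemoCallQueueManager are hypothetical names for illustration only; they are not the Hadoop classes, and the DFSAdmin argument is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical refresh modes; the ticket refers to these as RefreshCallQueueTypes.
enum RefreshType { REBUILD, RELOAD }

// Hypothetical scheduler contract with the proposed reload hook.
interface ReloadableScheduler {
    void reload(Map<String, String> conf);
}

class DemoScheduler implements ReloadableScheduler {
    private volatile Map<String, String> settings;

    DemoScheduler(Map<String, String> conf) {
        this.settings = new HashMap<>(conf);
    }

    @Override
    public void reload(Map<String, String> conf) {
        // Swap settings in place; queues and accumulated state are kept.
        this.settings = new HashMap<>(conf);
    }

    String get(String key) {
        return settings.get(key);
    }
}

class DemoCallQueueManager {
    private DemoScheduler scheduler;

    DemoCallQueueManager(Map<String, String> conf) {
        this.scheduler = new DemoScheduler(conf);
    }

    void refresh(RefreshType type, Map<String, String> conf) {
        switch (type) {
            case RELOAD:
                scheduler.reload(conf); // cheap path: same scheduler instance
                break;
            case REBUILD:
                scheduler = new DemoScheduler(conf); // existing full refresh
                break;
        }
    }

    DemoScheduler getScheduler() {
        return scheduler;
    }
}

class ReloadDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.key", "1");
        DemoCallQueueManager manager = new DemoCallQueueManager(conf);
        DemoScheduler before = manager.getScheduler();
        conf.put("demo.key", "2");
        manager.refresh(RefreshType.RELOAD, conf);
        System.out.println(manager.getScheduler() == before);
    }
}
```

The point of the sketch is that RELOAD keeps the same scheduler instance, while REBUILD replaces it, which is where the refresh cost on the NameNode would come from.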






[jira] [Created] (HADOOP-17421) Specify user's queue via configuration in FairCallQueue

2020-12-08 Thread Janus Chow (Jira)
Janus Chow created HADOOP-17421:
---

 Summary: Specify user's queue via configuration in FairCallQueue 
 Key: HADOOP-17421
 URL: https://issues.apache.org/jira/browse/HADOOP-17421
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Janus Chow


FairCallQueue helps a lot in maintaining fair, good service in a multi-tenant 
cluster: each user is assigned to queues with different priorities to reach 
this goal. But in production we met some problems that the automatic 
assignment does not handle well:
 # We have a service account that sends more NN requests than other users. For 
some reasons, we would like to keep this user and allow it to keep this volume 
of operations. When we deployed FairCallQueue, this service user was treated as 
a bad user and assigned to a lower-priority queue, causing some slowness on the 
service account.
 # We have more and more Flink jobs writing checkpoints to our NN, and 
checkpoint operations periodically impose a high cost on the NN, at intervals 
of several minutes. FairCallQueue (with cost-based scheduling enabled) does not 
control this kind of operation well: when the operations start, the user's cost 
in the decay window is quite low, so the user is assigned to queue 0. A few 
windows later, by the time the user's high cost has been noticed and the user 
has been assigned to a lower-priority queue, the user's operations have already 
finished.

For problem 1, we noticed that there is already an option mentioned in 
HADOOP-17165, but in our case, the service account isn't that important that 
we'd allow it to always be assigned to queue 0. 

To solve these problems, we propose specifying the queue for some static users 
via config. The basic design is as follows:
 * Specify the static users in config for each queue.
 * Load the mapping from the config while initializing the callqueue.
 * Check the configured queue for each user when assigning the queue.
 * The cost of the static users would not be counted in our decay calculation, 
to mitigate the impact on other normal users' costs.
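
A rough sketch of the design above. The config key prefix "static.users.queue.", the class StaticUserScheduler, and the share-based rule for ordinary users are all assumptions made for illustration; they are not the keys or the algorithm used by Hadoop's scheduler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scheduler sketch; not Hadoop's DecayRpcScheduler.
class StaticUserScheduler {
    // Hypothetical config key prefix: "static.users.queue.<n>" = "user1,user2".
    static final String KEY_PREFIX = "static.users.queue.";

    private final int numQueues;
    private final Map<String, Integer> staticQueues = new HashMap<>();
    private final Map<String, Long> costs = new HashMap<>();
    private long totalCost;

    // Load the user-to-queue mapping from config while initializing.
    StaticUserScheduler(int numQueues, Map<String, String> conf) {
        this.numQueues = numQueues;
        for (int queue = 0; queue < numQueues; queue++) {
            String users = conf.get(KEY_PREFIX + queue);
            if (users == null) {
                continue;
            }
            for (String user : users.split(",")) {
                String trimmed = user.trim();
                if (!trimmed.isEmpty()) {
                    staticQueues.put(trimmed, queue);
                }
            }
        }
    }

    // Returns the queue index for this call; 0 is the highest priority.
    int assign(String user, long cost) {
        Integer fixed = staticQueues.get(user);
        if (fixed != null) {
            // Static users bypass the cost accounting entirely.
            return fixed;
        }
        long userCost = costs.merge(user, cost, Long::sum);
        totalCost += cost;
        // Placeholder rule: a larger share of total cost means a lower priority.
        double share = (double) userCost / totalCost;
        return Math.min(numQueues - 1, (int) (share * numQueues));
    }

    long getTotalCost() {
        return totalCost;
    }
}

class StaticUserDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(StaticUserScheduler.KEY_PREFIX + 1, "svc-account, flink");
        StaticUserScheduler scheduler = new StaticUserScheduler(4, conf);
        System.out.println(scheduler.assign("svc-account", 1000L));
        System.out.println(scheduler.getTotalCost());
    }
}
```

Because static users return before any accounting, a heavy service account neither gets demoted nor inflates the total that ordinary users are measured against.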

 






[jira] [Resolved] (HADOOP-17356) RPC FairCallQueue for special users

2020-11-06 Thread Janus Chow (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janus Chow resolved HADOOP-17356.
-
  Assignee: Janus Chow
Resolution: Won't Fix

> RPC FairCallQueue for special users
> ---
>
> Key: HADOOP-17356
> URL: https://issues.apache.org/jira/browse/HADOOP-17356
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
> Attachments: FairCallQueueTest.java, Implement 0.png, Implement 
> 1.png, Implement 2.png, Implement 3.png
>
>
> In HADOOP-15016, the idea was first raised to support special users by 
> assigning each special user an independent queue with a share. The design was 
> intended for the user to better control the RPC schedule, but there is also a 
> risk that users may add a lot of items in the config of special-users, 
> causing a lot of queues in the RPCScheduler.
> This ticket records some ideas to mitigate the risks while solving the 
> special-user problem based on HADOOP-15016.
> 0. The current implementation is as follows, all users will be treated 
> equally, _multiplexer_ will decide the call count in each queue.
> !Implement 0.png!
> 1. The first idea is to amplify the weight of super-users and reuse the 
> initial queues. This idea is easy to implement, but ordinary users and 
> special users would be affected by each other, and it would be difficult for 
> the _multiplexer_ to guarantee the traffic of super-users.
> !Implement 1.png!
> 2. The second idea is to set up one independent queue for all special users 
> with a config controlling the weight of all special-users. One concern for 
> this idea is that the scheduler between super-users' calls may not be fair.
> !Implement 2.png!
> 3. The third idea is to also use priority queues for special-users based on 
> idea 2, ensuring the fair handling of all super-users. Another benefit of 
> this idea is we can use the queues to implement cost-based calculation.
> !Implement 3.png!
> I think Idea 3 should be a good balance of complexity and usability.






[jira] [Created] (HADOOP-17356) RPC FairCallQueue for special users

2020-11-04 Thread Janus Chow (Jira)
Janus Chow created HADOOP-17356:
---

 Summary: RPC FairCallQueue for special users
 Key: HADOOP-17356
 URL: https://issues.apache.org/jira/browse/HADOOP-17356
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Janus Chow
 Attachments: Implement 0.png, Implement 1.png, Implement 2.png, 
Implement 3.png

In HADOOP-15016, the idea was first raised to support special users by 
assigning each special user an independent queue with a share. The design was 
intended for the user to better control the RPC schedule, but there is also a 
risk that users may add a lot of items in the config of special-users, causing 
a lot of queues in the RPCScheduler.

This ticket records some ideas to mitigate the risks while solving the 
special-user problem based on HADOOP-15016.

0. The current implementation is as follows, all users will be treated equally, 
_multiplexer_ will decide the call count in each queue.

!Implement 0.png!

1. The first idea is to amplify the weight of super-users and reuse the initial 
queues. This idea is easy to implement, but ordinary users and special users 
would be affected by each other, and it would be difficult for the 
_multiplexer_ to guarantee the traffic of super-users.
!Implement 1.png!

2. The second idea is to set up one independent queue for all special users 
with a config controlling the weight of all special-users. One concern for this 
idea is that the scheduler between super-users' calls may not be fair.
!Implement 2.png!
3. The third idea is to also use priority queues for special-users based on 
idea 2, ensuring the fair handling of all super-users. Another benefit of this 
idea is we can use the queues to implement cost-based calculation.

!Implement 3.png!

I think Idea 3 should be a good balance of complexity and usability.


