[jira] [Created] (HADOOP-18816) Rebuild Exceptions on Client side to get genuine exceptions
Janus Chow created HADOOP-18816:
---
Summary: Rebuild Exceptions on Client side to get genuine exceptions
Key: HADOOP-18816
URL: https://issues.apache.org/jira/browse/HADOOP-18816
Project: Hadoop Common
Issue Type: Task
Reporter: Janus Chow
Assignee: Janus Chow

In the current RPC design, if the Server sends an exception back, the Client can only rebuild the exception from the original exception's error message. If the exception has fields containing important information, they are discarded, since we cannot easily rebuild them from the message string.

This ticket is to introduce a new interface for Exceptions which supports reconstruction. If Clients want to rebuild an exception, they can just implement the methods and the reconstruction will be done automatically.

The interface uses String[] as its parameter for simplicity. I thought of using Protobuf to store all the exceptions or fields, but the generality cannot be perfectly met. So we need the Client to support it by accepting "String[]" and transforming each String to its original type.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
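To make the HADOOP-18816 proposal concrete, here is a minimal sketch of how a String[]-based reconstruction could look. Every name in it (ReconstructibleSketch, QuotaSketchException, ExceptionRebuilderSketch, toParams, fromParams) is hypothetical and is not taken from the ticket or from Hadoop; the registry lookup by class name is likewise only an assumption about how the "automatic" reconstruction might be wired.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical contract: an exception that can flatten its fields to strings. */
interface ReconstructibleSketch {
    String[] toParams();
}

/** Example exception with typed fields that a message string alone would lose. */
class QuotaSketchException extends Exception implements ReconstructibleSketch {
    final String path;
    final long quota;
    final long used;

    QuotaSketchException(String path, long quota, long used) {
        super("Quota exceeded on " + path);
        this.path = path;
        this.quota = quota;
        this.used = used;
    }

    @Override
    public String[] toParams() {
        return new String[] {path, Long.toString(quota), Long.toString(used)};
    }

    /** Client-side factory: converts each String back to its original type. */
    static QuotaSketchException fromParams(String[] p) {
        return new QuotaSketchException(p[0], Long.parseLong(p[1]), Long.parseLong(p[2]));
    }
}

/** Client-side registry mapping an exception class name to its factory. */
class ExceptionRebuilderSketch {
    private final Map<String, Function<String[], Exception>> factories = new HashMap<>();

    void register(String className, Function<String[], Exception> factory) {
        factories.put(className, factory);
    }

    Exception rebuild(String className, String message, String[] params) {
        Function<String[], Exception> factory = factories.get(className);
        if (factory == null || params == null) {
            // Fallback: message-only rebuild, which is all the client can do today.
            return new Exception(message);
        }
        return factory.apply(params);
    }
}

class RebuildDemo {
    public static void main(String[] args) {
        ExceptionRebuilderSketch rebuilder = new ExceptionRebuilderSketch();
        rebuilder.register(QuotaSketchException.class.getName(), QuotaSketchException::fromParams);
        QuotaSketchException sent = new QuotaSketchException("/data", 10L, 12L);
        // Pretend the class name, message and params travelled over the wire.
        Exception got = rebuilder.rebuild(sent.getClass().getName(), sent.getMessage(), sent.toParams());
        System.out.println(got.getClass().getSimpleName() + " used=" + ((QuotaSketchException) got).used);
    }
}
```

The trade-off in this sketch is the one the ticket describes: String[] keeps the wire format generic, at the price of each exception class owning its own parsing.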
[jira] [Created] (HADOOP-18723) Add detail logs if distcp checksum mismatch
Janus Chow created HADOOP-18723:
---
Summary: Add detail logs if distcp checksum mismatch
Key: HADOOP-18723
URL: https://issues.apache.org/jira/browse/HADOOP-18723
Project: Hadoop Common
Issue Type: Improvement
Reporter: Janus Chow
Assignee: Janus Chow

We encountered some checksum mismatch errors during DistCp jobs. It took us some time to figure out that the checksum types were different. Adding error logs should help us diagnose such problems faster.
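As an illustration of the kind of detail HADOOP-18723 asks for, the sketch below formats a mismatch message that names both checksum types. The helper class and its method are hypothetical, not DistCp code. In a real job the algorithm names would presumably come from something like FileChecksum.getAlgorithmName(); the strings in the demo are only illustrative values.

```java
/** Hypothetical helper that formats a detailed checksum mismatch message. */
final class ChecksumMismatchSketch {
    private ChecksumMismatchSketch() {}

    static String describe(String srcPath, String srcAlgorithm, String srcChecksum,
                           String dstPath, String dstAlgorithm, String dstChecksum) {
        StringBuilder sb = new StringBuilder();
        sb.append("Checksum mismatch between ").append(srcPath)
          .append(" and ").append(dstPath).append('.');
        sb.append(" Source: ").append(srcAlgorithm).append(':').append(srcChecksum).append(';');
        sb.append(" target: ").append(dstAlgorithm).append(':').append(dstChecksum).append('.');
        if (!srcAlgorithm.equals(dstAlgorithm)) {
            // Different checksum types make the values incomparable; say so explicitly.
            sb.append(" The checksum types differ, so the values cannot be compared directly.");
        }
        return sb.toString();
    }
}

class ChecksumMismatchDemo {
    public static void main(String[] args) {
        System.out.println(ChecksumMismatchSketch.describe(
            "hdfs://src/a", "MD5-of-0MD5-of-512CRC32C", "ab12",
            "hdfs://dst/a", "COMPOSITE-CRC32C", "cd34"));
    }
}
```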
[jira] [Created] (HADOOP-17468) NullPointerException from register in MetricsSystemImpl
Janus Chow created HADOOP-17468:
---
Summary: NullPointerException from register in MetricsSystemImpl
Key: HADOOP-17468
URL: https://issues.apache.org/jira/browse/HADOOP-17468
Project: Hadoop Common
Issue Type: Bug
Reporter: Janus Chow

This is an error from Ozone's unit test case [HDDS-4688|https://github.com/apache/ozone/pull/1795#issuecomment-760052788]. The error is as follows:
{code:java}
java.lang.NullPointerException: config
  at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:897)
  at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSink(MetricsSystemImpl.java:298)
  at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:277)
  at org.apache.hadoop.hdds.server.http.BaseHttpServer.start(BaseHttpServer.java:298)
  ...{code}
The cause appears to be in this commit: https://github.com/apache/hadoop/commit/2f500e4635ea4347a55693b1a10a4a4465fe5fac. If the _name_ is contained in _allSinks_ but not in _sink_, the method _registerSink_ is invoked. But if the config variable is null at that point, it throws a NullPointerException.

A suggestion would be to check whether the config variable is null before calling _registerSink_.
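The guard suggested in HADOOP-17468 can be sketched as follows. This is a simplified stand-in, not the real MetricsSystemImpl: the class, its maps, and the configure() and isStarted() methods are assumptions made for illustration, and only the shape of the null check is the point.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical stand-in for a metrics system's sink registration. */
class MetricsRegistrySketch {
    private final Map<String, String> allSinks = new HashMap<>(); // every registered sink
    private final Map<String, String> sinks = new HashMap<>();    // sinks actually started
    private Object config;                                        // null until configure()

    void configure(Object config) {
        this.config = config;
    }

    /** Returns true if the sink was started now, false if it was only recorded. */
    boolean register(String name, String description) {
        allSinks.put(name, description);
        if (sinks.containsKey(name)) {
            return false; // already started
        }
        if (config == null) {
            return false; // the suggested guard: do not call registerSink without a config
        }
        registerSink(name, description);
        return true;
    }

    private void registerSink(String name, String description) {
        if (config == null) {
            throw new NullPointerException("config"); // mirrors the failure in the report
        }
        sinks.put(name, description);
    }

    boolean isStarted(String name) {
        return sinks.containsKey(name);
    }
}

class MetricsRegistryDemo {
    public static void main(String[] args) {
        MetricsRegistrySketch registry = new MetricsRegistrySketch();
        System.out.println(registry.register("prom", "demo sink")); // no config yet, no exception
        registry.configure(new Object());
        System.out.println(registry.register("prom", "demo sink")); // started now
    }
}
```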
[jira] [Created] (HADOOP-17429) Add reload option to RefreshCallQueue
Janus Chow created HADOOP-17429:
---
Summary: Add reload option to RefreshCallQueue
Key: HADOOP-17429
URL: https://issues.apache.org/jira/browse/HADOOP-17429
Project: Hadoop Common
Issue Type: Improvement
Reporter: Janus Chow

Currently we can use the dfsadmin command "refreshCallQueue" to refresh all configs of FairCallQueue, but it causes a call queue spike on the NameNode during the refresh.

In [HADOOP-17421|https://issues.apache.org/jira/projects/HADOOP/issues/HADOOP-17421?filter=allopenissues], we added some configurations to specify queues for some static users, which gives us manual, fine-grained control over the behavior of different users. This feature requires us to refresh the call queue of NameNodes constantly, which will surely hurt the NameNode's performance.

This ticket proposes adding a reload option to the refreshCallQueue command, which only triggers the _callQueueManager_ to reload, instead of constructing a new scheduler and new queues.

The basic design is:
# Add a "reload" method to the _Scheduler_ interface.
# Add different _RefreshCallQueueTypes_ to trigger different refresh operations.
# Add an argument in DFSAdmin for the *reload* option.
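The three design steps in HADOOP-17429 can be sketched roughly as below. All type names are hypothetical stand-ins (SchedulerSketch, RefreshTypeSketch, CallQueueManagerSketch) rather than the Hadoop classes, the config is modelled as a plain Map, and the config key in the demo is made up. The sketch only shows the difference between reloading in place and rebuilding.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical scheduler contract with the proposed reload method. */
interface SchedulerSketch {
    void reload(Map<String, String> conf);
}

/** Hypothetical refresh types: full rebuild versus in-place reload. */
enum RefreshTypeSketch { REBUILD, RELOAD }

class ReloadableSchedulerSketch implements SchedulerSketch {
    private Map<String, String> conf = new HashMap<>();

    @Override
    public void reload(Map<String, String> newConf) {
        conf = new HashMap<>(newConf); // swap settings, keep this instance and its state
    }

    String get(String key) {
        return conf.get(key);
    }
}

class CallQueueManagerSketch {
    private final Supplier<SchedulerSketch> factory;
    private SchedulerSketch scheduler;

    CallQueueManagerSketch(Supplier<SchedulerSketch> factory) {
        this.factory = factory;
        this.scheduler = factory.get();
    }

    void refresh(RefreshTypeSketch type, Map<String, String> conf) {
        if (type == RefreshTypeSketch.REBUILD) {
            scheduler = factory.get(); // existing behaviour: construct a new scheduler
        }
        scheduler.reload(conf);        // RELOAD path touches only the settings
    }

    SchedulerSketch scheduler() {
        return scheduler;
    }
}

class ReloadDemo {
    public static void main(String[] args) {
        CallQueueManagerSketch manager = new CallQueueManagerSketch(ReloadableSchedulerSketch::new);
        SchedulerSketch before = manager.scheduler();
        Map<String, String> conf = new HashMap<>();
        conf.put("demo.static.users", "svc"); // made-up key for the demo
        manager.refresh(RefreshTypeSketch.RELOAD, conf);
        System.out.println("same scheduler after reload: " + (before == manager.scheduler()));
    }
}
```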
[jira] [Created] (HADOOP-17421) Specify user's queue via configuration in FairCallQueue
Janus Chow created HADOOP-17421:
---
Summary: Specify user's queue via configuration in FairCallQueue
Key: HADOOP-17421
URL: https://issues.apache.org/jira/browse/HADOOP-17421
Project: Hadoop Common
Issue Type: Improvement
Reporter: Janus Chow

FairCallQueue helps a lot in maintaining a fair and good service in a multi-tenant cluster: each user is assigned to queues with different priorities to reach this goal. But in production, we met some problems that the automatic assignment does not fit. The problems are as follows:
# We have a service account that sends more NN requests than other users. For various reasons, we would like to keep this user and allow it to keep this volume of operations. When we deployed FairCallQueue, this service user was treated as a bad user and assigned to a lower queue, causing some slowness on the service account.
# We have more and more Flink jobs writing checkpoints to our NN, and checkpoint operations periodically impose a high cost on the NN, at intervals of several minutes. FairCallQueue (with cost-based scheduling enabled) does not control this kind of operation well: when the operations start, the user's cost in the decay window is quite low, so the user is assigned to queue 0. After some windows, by the time the user's high cost has been noticed and the user has been assigned to a lower queue, the user's operations have already finished.

For problem 1, we noticed that there is already an option mentioned in HADOOP-17165, but in our case the service account is not important enough for us to always assign it to queue 0.

To solve these problems, we'd like to propose specifying the queue for some static users via config. The basic design is as follows:
* Specify the static users in config for each queue.
* Load the mapping from the config while initializing the callqueue.
* Check the configured queue for each user when assigning the queue.
* The cost of the static users is not counted in our decay calculation, to mitigate the impact on other normal users' costs.
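The four design points of HADOOP-17421 can be sketched as a small lookup in front of the computed priority. The class below is hypothetical, and the input format (queue index mapped to a comma-separated user list) is an assumption, since the ticket does not name the config keys. It shows the static mapping taking precedence and static users being left out of the cost accounting.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of static user-to-queue assignment for a fair call queue. */
class StaticUserQueueSketch {
    private final Map<String, Integer> staticQueue = new HashMap<>();
    private final Map<String, Long> cost = new HashMap<>();

    /** Loads the mapping once at initialization: queue index -> "user1,user2". */
    StaticUserQueueSketch(Map<Integer, String> usersPerQueue) {
        for (Map.Entry<Integer, String> entry : usersPerQueue.entrySet()) {
            for (String user : entry.getValue().split(",")) {
                String trimmed = user.trim();
                if (!trimmed.isEmpty()) {
                    staticQueue.put(trimmed, entry.getKey());
                }
            }
        }
    }

    /** Static users are excluded so they do not distort other users' shares. */
    void addCost(String user, long amount) {
        if (staticQueue.containsKey(user)) {
            return;
        }
        cost.merge(user, amount, Long::sum);
    }

    /** The configured queue wins; otherwise the automatically computed queue is used. */
    int queueFor(String user, int computedQueue) {
        return staticQueue.getOrDefault(user, computedQueue);
    }

    long costOf(String user) {
        return cost.getOrDefault(user, 0L);
    }
}

class StaticUserQueueDemo {
    public static void main(String[] args) {
        Map<Integer, String> conf = new HashMap<>();
        conf.put(1, "svc-account, flink-ckpt"); // illustrative user names
        StaticUserQueueSketch sketch = new StaticUserQueueSketch(conf);
        sketch.addCost("svc-account", 500L);
        sketch.addCost("alice", 20L);
        System.out.println(sketch.queueFor("svc-account", 3)); // configured queue
        System.out.println(sketch.queueFor("alice", 0));       // computed queue
    }
}
```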
[jira] [Resolved] (HADOOP-17356) RPC FairCallQueue for special users
[ https://issues.apache.org/jira/browse/HADOOP-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janus Chow resolved HADOOP-17356.
-
Assignee: Janus Chow
Resolution: Won't Fix

> RPC FairCallQueue for special users
> ---
>
> Key: HADOOP-17356
> URL: https://issues.apache.org/jira/browse/HADOOP-17356
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Janus Chow
> Assignee: Janus Chow
> Priority: Major
> Attachments: FairCallQueueTest.java, Implement 0.png, Implement 1.png, Implement 2.png, Implement 3.png
>
>
> In HADOOP-15016, the idea was first raised to support special users by assigning each special user an independent queue with a share. The design was intended to give the user better control over RPC scheduling, but there is also a risk that users may add a lot of items to the special-users config, causing a lot of queues in the RPCScheduler.
> This ticket records some ideas to mitigate the risks while solving the special-user problem based on HADOOP-15016.
> 0. The current implementation is as follows: all users are treated equally, and the _multiplexer_ decides the call count in each queue.
> !Implement 0.png!
> 1. The first idea is to amplify the weight of special users and reuse the initial queues. This idea is easy to implement, but ordinary users and special users would affect each other, and it would be difficult for the _multiplexer_ to guarantee the traffic of special users.
> !Implement 1.png!
> 2. The second idea is to set up one independent queue for all special users, with a config controlling the weight of all special users. One concern with this idea is that scheduling among special users' calls may not be fair.
> !Implement 2.png!
> 3. The third idea is to also use priority queues for special users, based on idea 2, ensuring fair handling of all special users. Another benefit of this idea is that we can use the queues to implement cost-based calculation.
> !Implement 3.png!
> I think Idea 3 should be a good balance of complexity and usability.
[jira] [Created] (HADOOP-17356) RPC FairCallQueue for special users
Janus Chow created HADOOP-17356:
---
Summary: RPC FairCallQueue for special users
Key: HADOOP-17356
URL: https://issues.apache.org/jira/browse/HADOOP-17356
Project: Hadoop Common
Issue Type: Improvement
Reporter: Janus Chow
Attachments: Implement 0.png, Implement 1.png, Implement 2.png, Implement 3.png

In HADOOP-15016, the idea was first raised to support special users by assigning each special user an independent queue with a share. The design was intended to give the user better control over RPC scheduling, but there is also a risk that users may add a lot of items to the special-users config, causing a lot of queues in the RPCScheduler.

This ticket records some ideas to mitigate the risks while solving the special-user problem based on HADOOP-15016.

0. The current implementation is as follows: all users are treated equally, and the _multiplexer_ decides the call count in each queue.
!Implement 0.png!
1. The first idea is to amplify the weight of special users and reuse the initial queues. This idea is easy to implement, but ordinary users and special users would affect each other, and it would be difficult for the _multiplexer_ to guarantee the traffic of special users.
!Implement 1.png!
2. The second idea is to set up one independent queue for all special users, with a config controlling the weight of all special users. One concern with this idea is that scheduling among special users' calls may not be fair.
!Implement 2.png!
3. The third idea is to also use priority queues for special users, based on idea 2, ensuring fair handling of all special users. Another benefit of this idea is that we can use the queues to implement cost-based calculation.
!Implement 3.png!

I think Idea 3 should be a good balance of complexity and usability.