[jira] [Updated] (HADOOP-18991) Remove commons-beanutils dependency from Hadoop 3
[ https://issues.apache.org/jira/browse/HADOOP-18991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18991: Summary: Remove commons-beanutils dependency from Hadoop 3 (was: Remove commons-benautils dependency from Hadoop 3) > Remove commons-beanutils dependency from Hadoop 3 > - > > Key: HADOOP-18991 > URL: https://issues.apache.org/jira/browse/HADOOP-18991 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Reporter: Istvan Toth >Priority: Major > > Hadoop doesn't actually use it, and it pollutes the classpath of dependent > projects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
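Until the removal lands, a dependent project can keep commons-beanutils off its classpath with an ordinary Maven exclusion. A minimal sketch, assuming the jar arrives transitively through hadoop-common (the version and the exact Hadoop artifact carrying it are placeholders, not taken from the jira):

```xml
<!-- Sketch: exclude the transitive commons-beanutils jar pulled in by a
     Hadoop artifact. Adjust groupId/artifactId/version to whichever Hadoop
     dependency your project actually declares. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.3.6</version>
  <exclusions>
    <exclusion>
      <groupId>commons-beanutils</groupId>
      <artifactId>commons-beanutils</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

This only hides the jar from downstream builds; the jira itself is about dropping the dependency from Hadoop's own poms so the exclusion becomes unnecessary.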
[jira] [Resolved] (HADOOP-18870) CURATOR-599 change broke functionality introduced in HADOOP-18139 and HADOOP-18709
[ https://issues.apache.org/jira/browse/HADOOP-18870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth resolved HADOOP-18870. - Fix Version/s: 3.4.0 Resolution: Fixed > CURATOR-599 change broke functionality introduced in HADOOP-18139 and > HADOOP-18709 > -- > > Key: HADOOP-18870 > URL: https://issues.apache.org/jira/browse/HADOOP-18870 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.4.0, 3.3.5 >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > [Curator PR#391 > |https://github.com/apache/curator/pull/391/files#diff-687a4ed1252bfb4f56b3aeeb28bee4413b7df9bec4b969b72215587158ac875dR59] > introduced a default method in the ZooKeeperFactory interface, hence the > override of the 4-parameter NewZookeeper method in the HadoopZookeeperFactory > class is not taking effect due to this. > Proposing routing the 4-parameter method to a 5-parameter method, which > instantiates the ZKConfiguration as the 5th parameter. This is a non-breaking > change, as the ZKConfiguration is currently instantiated within the method.
[jira] [Commented] (HADOOP-18870) CURATOR-599 change broke functionality introduced in HADOOP-18139 and HADOOP-18709
[ https://issues.apache.org/jira/browse/HADOOP-18870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762534#comment-17762534 ] Szilard Nemeth commented on HADOOP-18870: - [~bender] {quote} Proposing routing the 4-parameter method to a 5-parameter method, which instantiates the ZKConfiguration as the 5th parameter. This is a non-breaking change, as the ZKConfiguration is currently instantiated within the method. {quote} Am I missing something, or did you mean `ZKClientConfig` as the 5th parameter? Checking the linked Curator PR (https://github.com/apache/curator/pull/391/files#diff-687a4ed1252bfb4f56b3aeeb28bee4413b7df9bec4b969b72215587158ac875dR59) shows me ZKClientConfig as the 5th parameter there. Can you fix the description of the jira? > CURATOR-599 change broke functionality introduced in HADOOP-18139 and > HADOOP-18709 > -- > > Key: HADOOP-18870 > URL: https://issues.apache.org/jira/browse/HADOOP-18870 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.4.0, 3.3.5 >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > > [Curator PR#391 > |https://github.com/apache/curator/pull/391/files#diff-687a4ed1252bfb4f56b3aeeb28bee4413b7df9bec4b969b72215587158ac875dR59] > introduced a default method in the ZooKeeperFactory interface, hence the > override of the 4-parameter NewZookeeper method in the HadoopZookeeperFactory > class is not taking effect due to this. > Proposing routing the 4-parameter method to a 5-parameter method, which > instantiates the ZKConfiguration as the 5th parameter. This is a non-breaking > change, as the ZKConfiguration is currently instantiated within the method.
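The delegation pattern being proposed can be sketched as follows. These are simplified stand-in types written for illustration, not the real Curator or ZooKeeper API: after the CURATOR-599 change, the 4-parameter factory method is a default that forwards to a 5-parameter abstract method, so a subclass must override the 5-parameter variant for its behaviour to apply on both call paths.

```java
// Hedged sketch of the proposed fix with simplified stand-in types;
// class and method names mirror the jira discussion but are NOT the
// real Curator/ZooKeeper classes.
class ZkFactoryDelegationDemo {

    // Stand-in for org.apache.zookeeper.client.ZKClientConfig.
    static class ZKClientConfig { }

    // Stand-in for Curator's factory interface after PR#391: the
    // 4-parameter variant is a default method routing to the
    // 5-parameter one, which is the single abstract entry point.
    interface ZookeeperFactory {
        default String newZooKeeper(String connectString, int sessionTimeoutMs,
                                    Object watcher, boolean canBeReadOnly) {
            // Route 4 params -> 5 params, instantiating the config here,
            // as the jira proposes; non-breaking because the config was
            // previously instantiated inside the method anyway.
            return newZooKeeper(connectString, sessionTimeoutMs, watcher,
                                canBeReadOnly, new ZKClientConfig());
        }

        String newZooKeeper(String connectString, int sessionTimeoutMs,
                            Object watcher, boolean canBeReadOnly,
                            ZKClientConfig conf);
    }

    // Stand-in for HadoopZookeeperFactory: overriding only the 4-parameter
    // method would be bypassed by callers of the 5-parameter entry point;
    // overriding the 5-parameter method covers both paths.
    static class HadoopZookeeperFactory implements ZookeeperFactory {
        @Override
        public String newZooKeeper(String connectString, int sessionTimeoutMs,
                                   Object watcher, boolean canBeReadOnly,
                                   ZKClientConfig conf) {
            return "hadoop-factory:" + connectString;
        }
    }

    public static void main(String[] args) {
        ZookeeperFactory factory = new HadoopZookeeperFactory();
        // Both the legacy 4-parameter path and the new 5-parameter path
        // now reach the custom override.
        System.out.println(factory.newZooKeeper("zk1:2181", 3000, null, false));
        System.out.println(factory.newZooKeeper("zk1:2181", 3000, null, false,
                                                new ZKClientConfig()));
    }
}
```

Both calls print `hadoop-factory:zk1:2181`, showing the override is no longer bypassed.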
[jira] [Updated] (HADOOP-18709) Add curator based ZooKeeper communication support over SSL/TLS into the common library
[ https://issues.apache.org/jira/browse/HADOOP-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18709: Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Add curator based ZooKeeper communication support over SSL/TLS into the > common library > -- > > Key: HADOOP-18709 > URL: https://issues.apache.org/jira/browse/HADOOP-18709 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HADOOP-16579 the ZooKeeper client is capable of securing communication > with SSL. > To follow the convention introduced in HADOOP-14741, proposing to add to the > core-default.xml the following configurations, as the groundwork for the > components to enable encrypted communication between the individual > components and ZooKeeper: > * hadoop.zk.ssl.keystore.location > * hadoop.zk.ssl.keystore.password > * hadoop.zk.ssl.truststore.location > * hadoop.zk.ssl.truststore.password > These parameters along with the component-specific ssl.client.enable option > (e.g. yarn.zookeeper.ssl.client.enable) should be passed to the > ZKCuratorManager to build the CuratorFramework. The ZKCuratorManager needs a > new overloaded start() method to build the encrypted communication. > * The secured ZK Client uses Netty, hence the dependency is included in the > pom.xml. Added netty-handler and netty-transport-native-epoll dependency to > the pom.xml based on ZOOKEEPER-3494 - "No need to depend on netty-all (SSL)". > * The change was exclusively tested with the unit test, which is a kind of > integration test, as a ZK Server was brought up and the communication tested > between the client and the server. > * This code change is in the common code base and there is no component > calling it yet. 
Once YARN-11468 - "Zookeeper SSL/TLS support" is implemented, > we can test it in a real cluster environment.
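The four proposed keys would look like this in a Hadoop configuration file. Only the property names come from the proposal; the values below are placeholders, and the component-specific switch shown is the YARN example mentioned in the description:

```xml
<!-- Placeholder values; only the property names are from the jira. -->
<property>
  <name>hadoop.zk.ssl.keystore.location</name>
  <value>/etc/security/zk-client-keystore.jks</value>
</property>
<property>
  <name>hadoop.zk.ssl.keystore.password</name>
  <value>keystore-secret</value>
</property>
<property>
  <name>hadoop.zk.ssl.truststore.location</name>
  <value>/etc/security/zk-client-truststore.jks</value>
</property>
<property>
  <name>hadoop.zk.ssl.truststore.password</name>
  <value>truststore-secret</value>
</property>
<!-- Component-specific enable flag, e.g. for YARN: -->
<property>
  <name>yarn.zookeeper.ssl.client.enable</name>
  <value>true</value>
</property>
```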
[jira] [Updated] (HADOOP-18709) Add curator based ZooKeeper communication support over SSL/TLS into the common library
[ https://issues.apache.org/jira/browse/HADOOP-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18709: Fix Version/s: 3.4.0 > Add curator based ZooKeeper communication support over SSL/TLS into the > common library > -- > > Key: HADOOP-18709 > URL: https://issues.apache.org/jira/browse/HADOOP-18709 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > With HADOOP-16579 the ZooKeeper client is capable of securing communication > with SSL. > To follow the convention introduced in HADOOP-14741, proposing to add to the > core-default.xml the following configurations, as the groundwork for the > components to enable encrypted communication between the individual > components and ZooKeeper: > * hadoop.zk.ssl.keystore.location > * hadoop.zk.ssl.keystore.password > * hadoop.zk.ssl.truststore.location > * hadoop.zk.ssl.truststore.password > These parameters along with the component-specific ssl.client.enable option > (e.g. yarn.zookeeper.ssl.client.enable) should be passed to the > ZKCuratorManager to build the CuratorFramework. The ZKCuratorManager needs a > new overloaded start() method to build the encrypted communication. > * The secured ZK Client uses Netty, hence the dependency is included in the > pom.xml. Added netty-handler and netty-transport-native-epoll dependency to > the pom.xml based on ZOOKEEPER-3494 - "No need to depend on netty-all (SSL)". > * The change was exclusively tested with the unit test, which is a kind of > integration test, as a ZK Server was brought up and the communication tested > between the client and the server. > * This code change is in the common code base and there is no component > calling it yet. Once YARN-11468 - "Zookeeper SSL/TLS support" is implemented, > we can test it in a real cluster environment. 
[jira] [Updated] (HADOOP-18709) Add curator based ZooKeeper communication support over SSL/TLS into the common library
[ https://issues.apache.org/jira/browse/HADOOP-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18709: Status: Patch Available (was: Open) > Add curator based ZooKeeper communication support over SSL/TLS into the > common library > -- > > Key: HADOOP-18709 > URL: https://issues.apache.org/jira/browse/HADOOP-18709 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Ferenc Erdelyi >Assignee: Ferenc Erdelyi >Priority: Major > Labels: pull-request-available > > With HADOOP-16579 the ZooKeeper client is capable of securing communication > with SSL. > To follow the convention introduced in HADOOP-14741, proposing to add to the > core-default.xml the following configurations, as the groundwork for the > components to enable encrypted communication between the individual > components and ZooKeeper: > * hadoop.zk.ssl.keystore.location > * hadoop.zk.ssl.keystore.password > * hadoop.zk.ssl.truststore.location > * hadoop.zk.ssl.truststore.password > These parameters along with the component-specific ssl.client.enable option > (e.g. yarn.zookeeper.ssl.client.enable) should be passed to the > ZKCuratorManager to build the CuratorFramework. The ZKCuratorManager needs a > new overloaded start() method to build the encrypted communication. > * The secured ZK Client uses Netty, hence the dependency is included in the > pom.xml. Added netty-handler and netty-transport-native-epoll dependency to > the pom.xml based on ZOOKEEPER-3494 - "No need to depend on netty-all (SSL)". > * The change was exclusively tested with the unit test, which is a kind of > integration test, as a ZK Server was brought up and the communication tested > between the client and the server. > * This code change is in the common code base and there is no component > calling it yet. Once YARN-11468 - "Zookeeper SSL/TLS support" is implemented, > we can test it in a real cluster environment. 
[jira] [Updated] (HADOOP-18732) Exclude Jettison from jersey-json artifact in hadoop-yarn-common's pom.xml
[ https://issues.apache.org/jira/browse/HADOOP-18732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18732: Summary: Exclude Jettison from jersey-json artifact in hadoop-yarn-common's pom.xml (was: Exclude Jettison from jersery-json artifact in hadoop-yarn-common's pom.xml) > Exclude Jettison from jersey-json artifact in hadoop-yarn-common's pom.xml > -- > > Key: HADOOP-18732 > URL: https://issues.apache.org/jira/browse/HADOOP-18732 > Project: Hadoop Common > Issue Type: Task > Components: build >Reporter: Devaspati Krishnatri >Priority: Major > Labels: pull-request-available >
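The change amounts to a standard Maven exclusion. A sketch of what it could look like in hadoop-yarn-common's pom.xml, using the usual coordinates for these artifacts rather than anything copied from the actual patch:

```xml
<!-- Sketch: keep Jettison off the classpath by excluding it from the
     jersey-json dependency. Coordinates are the commonly published ones,
     not taken from the Hadoop patch itself. -->
<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-json</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.codehaus.jettison</groupId>
      <artifactId>jettison</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```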
[jira] [Updated] (HADOOP-18602) Remove netty3 dependency
[ https://issues.apache.org/jira/browse/HADOOP-18602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18602: Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Remove netty3 dependency > > > Key: HADOOP-18602 > URL: https://issues.apache.org/jira/browse/HADOOP-18602 > Project: Hadoop Common > Issue Type: Task > Components: build >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > AFAIK netty3 is no longer in use so it can be removed from the dependencies.
[jira] [Updated] (HADOOP-18602) Remove netty3 dependency
[ https://issues.apache.org/jira/browse/HADOOP-18602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18602: Status: Patch Available (was: Open) > Remove netty3 dependency > > > Key: HADOOP-18602 > URL: https://issues.apache.org/jira/browse/HADOOP-18602 > Project: Hadoop Common > Issue Type: Task > Components: build >Affects Versions: 3.4.0 >Reporter: Tamas Domok >Priority: Major > Labels: pull-request-available > > AFAIK netty3 is no longer in use so it can be removed from the dependencies.
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631029#comment-17631029 ] Szilard Nemeth edited comment on HADOOP-15327 at 11/9/22 12:07 PM: --- Hi, CC: [~gandras], [~shuzirra], [~weichiu] Let me summarize what kind of testing I performed to make sure this change won't cause any regression. The project that helped me very much with the testing is called [Hades|https://github.com/9uapaw/hades]. Kudos to [~gandras] for the initial work on the Hades project. h1. TL;DR *Hades was the framework I used to run my testcases.* *All testcases passed both with the trunk version of Hadoop (which is not surprising at all) and with the deployed Hadoop version carrying my Netty upgrade patch.* *See the attached test logs for details.* [^hades-results-20221108.zip] *Also see the details below about what Hades is, how I tested, why I chose certain configurations for the testcases, and more.* *I'm now pretty confident that this patch won't break anything, so I'm waiting for reviewers.* h1. HADES IN GENERAL h2. What is Hades? Hades is a CLI tool that shares a common interface between various Hadoop distributions. It is a collection of commands most frequently used by developers of Hadoop components. Hades supports [Hadock|https://github.com/9uapaw/docker-hadoop-dev], [Cloudera Data Platform|https://www.cloudera.com/products/cloudera-data-platform.html] and the standard upstream distribution. h2. Basic features of Hades - Discover cluster: Stores where individual YARN / HDFS daemons are running. - Distribute files on certain nodes - Get config: Prints configuration of selected roles - Read logs of Hadoop roles - Restart: Restarting of certain roles - Run an application on the defined cluster - Status: Prints the status of the cluster - Update config: Update properties in a config file for selected roles - YARN specific commands - Run script: Runs user-defined custom scripts against the cluster. h1. CLUSTER + HADES SETUP h2. Run Hades with the Netty testing script against a cluster First of all, I created a standard cluster and deployed Hadoop to it. Side note: later on, all the installation steps that deploy Hadoop on the cluster could be part of Hades as well. It's worth mentioning that I have a [PR with netty-related changes|https://github.com/9uapaw/hades/pull/6] against the Hades repo. The branch of this PR is [this|https://github.com/szilard-nemeth/hades/tree/netty4-finish]. [Here are the instructions|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/README.md#set-up-hades-on-a-cluster-and-run-the-netty-script] for how to set up and run Hades with the Netty testing script. h1. THE NETTY TESTING SCRIPT The Netty testing script [lives here|https://github.com/szilard-nemeth/hades/blob/netty4-finish/script/netty4.py]. As the code shows, quite a lot of work has been done to make sure the Netty 4 upgrade won't break anything or cause any regression, as the shuffle is a crucial part of MapReduce. h2. CONCEPTS h3. Test context Class: Netty4TestContext The test context encapsulates a base branch and a patch file (if any) applied on top of the base branch. The context can enable or disable Maven compilation. The context can also verify that the compilation and the deployment of new jars were successful on the cluster; currently it can check that certain logs appear in the daemon logs, making sure the deployment was okay. The main purpose of a context is to compare it with the results of other contexts. For the Netty testing, it was evident that I needed to make sure that the trunk version and my version with the patch applied on top of trunk work the same, i.e. there's no regression. For this, I created the context. h3. Testcase Class: Netty4Testcase In general, a testcase can have a name, a simple name, some config changes (a dictionary of string keys and string values) and one MR application. h3. Test config: Config options for running the tests Class: Netty4TestConfig These are the main config options for the Netty testing. I won't go into too much detail, as I defined a ton of options along the way. You can check all the config options [here|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/script/netty4.py#L655-L687] h3. Compiler As mentioned above, Hades can compile Hadoop with Maven and replace the changed jars / Maven modules on the cluster. This is particularly useful for the Netty testing: since I was interested in whether the patch causes any issues, I had to compile Hadoop with my Netty patch, deploy the jars on the cluster, run all the tests and see all of them passing. h2. TESTCASES The testcases are defined with the help of the Netty4TestcasesBuilder. You can find all the testcases
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: hades-results-20221108.zip > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > hades-results-20221108.zip, testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > Time Spent: 11.5h > Remaining Estimate: 0h > > This way, we can remove the dependency on netty3 (jboss.netty)
[jira] [Updated] (HADOOP-18229) Fix Hadoop Common Java Doc Error
[ https://issues.apache.org/jira/browse/HADOOP-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18229: Description: I found that when hadoop-multibranch compiled PR-4266, some errors would pop up, I tried to solve it The wrong compilation information is as follows, I try to fix the Error information {code:java} [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:432: error: exception not thrown: java.io.IOException [ERROR]* @throws IOException [ERROR] ^ [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885: error: unknown tag: username [ERROR]* E.g. link: ^/user/(?\\w+) => s3://$user.apache.com/_${user} [ERROR] ^ [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885: error: bad use of '>' [ERROR]* E.g. 
link: ^/user/(?\\w+) => s3://$user.apache.com/_${user} [ERROR]^ [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:910: error: unknown tag: username [ERROR]* .linkRegex.replaceresolveddstpath:_:-#.^/user/(?\w+) {code} was: I found that when hadoop-multibranch compiled PR-4266, some errors would pop up, I tried to solve it The wrong compilation information is as follows, I try to fix the Error information [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:432: error: exception not thrown: java.io.IOException [ERROR]* @throws IOException [ERROR] ^ [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885: error: unknown tag: username [ERROR]* E.g. link: ^/user/(?\\w+) => s3://$user.apache.com/_${user} [ERROR] ^ [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885: error: bad use of '>' [ERROR]* E.g. 
link: ^/user/(?\\w+) => s3://$user.apache.com/_${user} [ERROR]^ [ERROR] /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:910: error: unknown tag: username [ERROR]* .linkRegex.replaceresolveddstpath:_:-#.^/user/(?\w+) > Fix Hadoop Common Java Doc Error > > > Key: HADOOP-18229 > URL: https://issues.apache.org/jira/browse/HADOOP-18229 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > I found that when hadoop-multibranch compiled PR-4266, some errors would pop > up, I tried to solve it > The wrong compilation information is as follows, I try to fix the Error > information > {code:java} > [ERROR] > /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:432: > error: exception not thrown: java.io.IOException > [ERROR]* @throws IOException > [ERROR] ^ > [ERROR] > /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885: > error: unknown tag: username > [ERROR]* E.g. link: ^/user/(?\\w+) => > s3://$user.apache.com/_${user} > [ERROR] ^ > [ERROR] > /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885: > error: bad use of '>' > [ERROR]* E.g. link: ^/user/(?\\w+) => > s3://$user.apache.com/_${user} > [ERROR]^ > [ERROR] > /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:910: >
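For reference, javadoc errors like "unknown tag: username" and "bad use of '>'" typically come from raw regex and HTML-significant characters inside a doc comment, and "exception not thrown" from a stale @throws clause. Below is a hedged illustration of the usual fix, not the actual InodeTree patch: the class, method, and stand-in logic are my own, only the escaping technique ({@literal}) and the removal of the unused @throws reflect how such errors are normally resolved.

```java
/**
 * Illustration only (not the actual InodeTree fix): named regex groups such as
 * (?<username>...) and the characters '<', '>' must be escaped in Javadoc,
 * otherwise javadoc reports "unknown tag" and "bad use of '>'".
 */
public class JavadocFixExample {

  /**
   * Resolves a regex mount link, e.g.
   * {@literal ^/user/(?<username>\w+) => s3://$user.apache.com/_${user}}.
   *
   * @param path the path to resolve
   * @return the resolved target path
   */
  public static String resolve(String path) {
    // Hypothetical stand-in logic, just to make the example runnable.
    return path.replaceFirst("^/user/(?<username>\\w+)", "s3://$1");
  }

  public static void main(String[] args) {
    System.out.println(resolve("/user/alice"));
  }
}
```

Note that the original comment also declared {@literal @throws IOException} on a method that never throws it; deleting the stale tag (or making the method actually throw) resolves the "exception not thrown" error.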
[jira] [Updated] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times
[ https://issues.apache.org/jira/browse/HADOOP-18222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-18222: Status: Patch Available (was: Open) > Prevent DelegationTokenSecretManagerMetrics from registering multiple times > > > Key: HADOOP-18222 > URL: https://issues.apache.org/jira/browse/HADOOP-18222 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > After committing HADOOP-18167, we received reports of the following error > when ResourceManager is initialized: > {noformat} > Caused by: java.io.IOException: Problem starting http server > at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389) > at > org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475) > ... 4 more > Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source > DelegationTokenSecretManagerMetrics already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.(AbstractDelegationTokenSecretManager.java:152) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.(DelegationTokenManager.java:72) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.(DelegationTokenManager.java:122) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180) > at > org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat} > This can happen if MetricsSystemImpl#init is called and multiple metrics are > 
registered with the same name. A proposed solution is to declare the metrics > in AbstractDelegationTokenSecretManager as singleton, which would prevent > multiple instances of DelegationTokenSecretManagerMetrics from being registered. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
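The proposed singleton approach can be sketched as follows. This is a hedged illustration of the idea only: the class and field names are mine, not the actual Hadoop patch, and the registration counter merely stands in for the real DefaultMetricsSystem.register(...) call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of the proposal: a lazily created singleton guarantees the
// metrics source is registered exactly once, no matter how many secret
// managers are constructed, avoiding "Metrics source ... already exists!".
public class DelegationTokenMetricsHolder {
  private static volatile DelegationTokenMetricsHolder instance;
  // Stands in for the real call to DefaultMetricsSystem.instance().register(...).
  static final AtomicInteger REGISTRATIONS = new AtomicInteger();

  private DelegationTokenMetricsHolder() {
    REGISTRATIONS.incrementAndGet();
  }

  /** Returns the shared metrics instance, registering it on first use only. */
  public static DelegationTokenMetricsHolder getInstance() {
    if (instance == null) {
      synchronized (DelegationTokenMetricsHolder.class) {
        if (instance == null) {
          instance = new DelegationTokenMetricsHolder();
        }
      }
    }
    return instance;
  }

  public static void main(String[] args) {
    System.out.println(getInstance() == getInstance()); // same instance both times
  }
}
```

Under this sketch, every AbstractDelegationTokenSecretManager constructor would call getInstance() instead of creating and registering its own metrics source.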
[jira] [Assigned] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times
[ https://issues.apache.org/jira/browse/HADOOP-18222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned HADOOP-18222: --- Assignee: Hector Sandoval Chaverri > Prevent DelegationTokenSecretManagerMetrics from registering multiple times > > > Key: HADOOP-18222 > URL: https://issues.apache.org/jira/browse/HADOOP-18222 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Hector Sandoval Chaverri >Assignee: Hector Sandoval Chaverri >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > After committing HADOOP-18167, we received reports of the following error > when ResourceManager is initialized: > {noformat} > Caused by: java.io.IOException: Problem starting http server > at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389) > at > org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475) > ... 4 more > Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source > DelegationTokenSecretManagerMetrics already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229) > at > org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.(AbstractDelegationTokenSecretManager.java:152) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.(DelegationTokenManager.java:72) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.(DelegationTokenManager.java:122) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180) > at > org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat} > This can happen if MetricsSystemImpl#init is called and multiple metrics are > 
registered with the same name. A proposed solution is to declare the metrics > in AbstractDelegationTokenSecretManager as singleton, which would prevent > multiple instances of DelegationTokenSecretManagerMetrics from being registered.
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430075#comment-17430075 ] Szilard Nemeth edited comment on HADOOP-15327 at 10/18/21, 3:59 PM: Hi [~zhenshan.wen] , I'm planning to fix the maven shading issue in the coming weeks, as soon as possible. Also, I'd appreciate it if you could help me find out what to fix to get rid of the Maven shading issue. Same thing I asked for here: [https://github.com/apache/hadoop/pull/3259#issuecomment-945923248] was (Author: snemeth): Hi [~zhenshan.wen] , I'm planning to fix the maven shading issue in the coming weeks, as soon as possible. Also, I'd appreciate if you could help me to find out what to fix to get rid of the Maven shading issue. Same thing I asked here: https://github.com/apache/hadoop/pull/3259#issuecomment-945923248 > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > Time Spent: 6h > Remaining Estimate: 0h > > This way, we can remove the dependencies on the netty3 (jboss.netty)
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430075#comment-17430075 ] Szilard Nemeth edited comment on HADOOP-15327 at 10/18/21, 3:59 PM: Hi [~zhenshan.wen] , I'm planning to fix the maven shading issue in the coming weeks, as soon as possible. Also, I'd appreciate if you could help me to find out what to fix to get rid of the Maven shading issue. Same thing I asked here: https://github.com/apache/hadoop/pull/3259#issuecomment-945923248 was (Author: snemeth): Hi [~zhenshan.wen] , I'm planning to fix the maven shading issue in the coming weeks, as soon as possible. > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > Time Spent: 6h > Remaining Estimate: 0h > > This way, we can remove the dependencies on the netty3 (jboss.netty)
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430075#comment-17430075 ] Szilard Nemeth commented on HADOOP-15327: - Hi [~zhenshan.wen] , I'm planning to fix the maven shading issue in the coming weeks, as soon as possible. > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > Time Spent: 5h 40m > Remaining Estimate: 0h > > This way, we can remove the dependencies on the netty3 (jboss.netty)
[jira] [Updated] (HADOOP-17919) Fix command line example in Hadoop Cluster Setup documentation
[ https://issues.apache.org/jira/browse/HADOOP-17919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-17919: Description: About Hadoop cluster setup documentation ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html]) The option is specified in the following example, but HDFS command ignores it. {noformat} `[hdfs]$ $HADOOP_HOME/bin/hdfs namenode -format ` {noformat} was: About Hdoop cluster setup documentation ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html]) The option is specified in the following example, but HDFS command ignores it. {noformat} `[hdfs]$ $HADOOP_HOME/bin/hdfs namenode -format ` {noformat} > Fix command line example in Hadoop Cluster Setup documentation > -- > > Key: HADOOP-17919 > URL: https://issues.apache.org/jira/browse/HADOOP-17919 > Project: Hadoop Common > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Rintaro Ikeda >Assignee: Rintaro Ikeda >Priority: Minor > Labels: docuentation, pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 40m > Remaining Estimate: 0h > > About Hadoop cluster setup documentation > ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html]) > The option is specified in the following example, but HDFS > command ignores it. > {noformat} > `[hdfs]$ $HADOOP_HOME/bin/hdfs namenode -format ` > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412011#comment-17412011 ] Szilard Nemeth commented on HADOOP-17857: - Hi [~epayne], Just came into my mind that there's no documentation update for this change in the commit but I already committed it. Would you mind reporting a follow-up jira for some doc changes? > Check real user ACLs in addition to proxied user ACLs > - > > Key: HADOOP-17857 > URL: https://issues.apache.org/jira/browse/HADOOP-17857 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.2.2, 2.10.1, 3.3.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Fix For: 3.4.0 > > Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch > > > In a secure cluster, it is possible to configure the services to allow a > super-user to proxy to a regular user and perform actions on behalf of the > proxied user (see [Proxy user - Superusers Acting On Behalf Of Other > Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]). > This is useful for automating server access for multiple different users in a > multi-tenant cluster. For example, this can be used by a super user > submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie > workflows, etc, which will then execute the service as the proxied user. > Usually when these services check ACLs to determine if the user has access to > the requested resources, the service only needs to check the ACLs for the > proxied user. However, it is sometimes desirable to allow the proxied user to > have access to the resources when only the real user has open ACLs. > For instance, let's say the user {{adm}} is the only user with submit ACLs to > the {{dataload}} queue, and the {{adm}} user wants to submit apps to the > {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. 
In > addition, we want to be able to bill {{headless1}} and {{headless2}} > separately for the YARN resources used in the {{dataload}} queue. In order to > do this, the apps need to run in the {{dataload}} queue as the respective > headless users. We could open up the ACLs to the {{dataload}} queue to allow > {{headless1}} and {{headless2}} to submit apps. But this would allow those > users to submit any app to that queue, and not be limited to just the data > loading apps, and we don't trust the {{headless1}} and {{headless2}} owners > to honor that restriction. > This JIRA proposes that we define a way to set up ACLs to restrict a > resource's access to a super-user, but when the access happens, run it as > the proxied user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
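The core idea of this jira — fall back to the real (super) user's ACLs when the proxied user's own check fails — can be sketched like this. This is a hedged illustration with made-up class and parameter names, not the actual HADOOP-17857 patch:

```java
import java.util.Set;

// Hedged sketch of the jira's idea: if the proxied user (e.g. headless1) is
// not in the ACL, also consult the real user behind the proxy (e.g. adm).
public class RealUserAclCheck {

  /**
   * @param acl      users allowed on the resource (e.g. submitters of the
   *                 dataload queue)
   * @param user     the effective (possibly proxied) user
   * @param realUser the authenticated real user, or null if not proxying
   */
  public static boolean checkAccess(Set<String> acl, String user, String realUser) {
    if (acl.contains(user)) {
      return true;                                       // proxied user is allowed directly
    }
    return realUser != null && acl.contains(realUser);   // fall back to the real user
  }

  public static void main(String[] args) {
    // headless1 proxied by adm: admitted via the real user's ACL entry.
    System.out.println(checkAccess(Set.of("adm"), "headless1", "adm"));
  }
}
```

With acl = {"adm"}, a submission by "headless1" proxied by real user "adm" is admitted, while a direct submission by "headless1" (realUser null) is still rejected — matching the billing scenario in the description.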
[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-17857: Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Check real user ACLs in addition to proxied user ACLs > - > > Key: HADOOP-17857 > URL: https://issues.apache.org/jira/browse/HADOOP-17857 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.2.2, 2.10.1, 3.3.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Fix For: 3.4.0 > > Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch > > > In a secure cluster, it is possible to configure the services to allow a > super-user to proxy to a regular user and perform actions on behalf of the > proxied user (see [Proxy user - Superusers Acting On Behalf Of Other > Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]). > This is useful for automating server access for multiple different users in a > multi-tenant cluster. For example, this can be used by a super user > submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie > workflows, etc, which will then execute the service as the proxied user. > Usually when these services check ACLs to determine if the user has access to > the requested resources, the service only needs to check the ACLs for the > proxied user. However, it is sometimes desirable to allow the proxied user to > have access to the resources when only the real user has open ACLs. > For instance, let's say the user {{adm}} is the only user with submit ACLs to > the {{dataload}} queue, and the {{adm}} user wants to submit apps to the > {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In > addition, we want to be able to bill {{headless1}} and {{headless2}} > separately for the YARN resources used in the {{dataload}} queue. 
In order to > do this, the apps need to run in the {{dataload}} queue as the respective > headless users. We could open up the ACLs to the {{dataload}} queue to allow > {{headless1}} and {{headless2}} to submit apps. But this would allow those > users to submit any app to that queue, and not be limited to just the data > loading apps, and we don't trust the {{headless1}} and {{headless2}} owners > to honor that restriction. > This JIRA proposes that we define a way to set up ACLs to restrict a > resource's access to a super-user, but when the access happens, run it as > the proxied user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-17857: Fix Version/s: 3.4.0 > Check real user ACLs in addition to proxied user ACLs > - > > Key: HADOOP-17857 > URL: https://issues.apache.org/jira/browse/HADOOP-17857 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.2.2, 2.10.1, 3.3.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Fix For: 3.4.0 > > Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch > > > In a secure cluster, it is possible to configure the services to allow a > super-user to proxy to a regular user and perform actions on behalf of the > proxied user (see [Proxy user - Superusers Acting On Behalf Of Other > Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]). > This is useful for automating server access for multiple different users in a > multi-tenant cluster. For example, this can be used by a super user > submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie > workflows, etc, which will then execute the service as the proxied user. > Usually when these services check ACLs to determine if the user has access to > the requested resources, the service only needs to check the ACLs for the > proxied user. However, it is sometimes desirable to allow the proxied user to > have access to the resources when only the real user has open ACLs. > For instance, let's say the user {{adm}} is the only user with submit ACLs to > the {{dataload}} queue, and the {{adm}} user wants to submit apps to the > {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In > addition, we want to be able to bill {{headless1}} and {{headless2}} > separately for the YARN resources used in the {{dataload}} queue. In order to > do this, the apps need to run in the {{dataload}} queue as the respective > headless users. 
We could open up the ACLs to the {{dataload}} queue to allow > {{headless1}} and {{headless2}} to submit apps. But this would allow those > users to submit any app to that queue, and not be limited to just the data > loading apps, and we don't trust the {{headless1}} and {{headless2}} owners > to honor that restriction. > This JIRA proposes that we define a way to set up ACLs to restrict a > resource's access to a super-user, but when the access happens, run it as > the proxied user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs
[ https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412007#comment-17412007 ] Szilard Nemeth commented on HADOOP-17857: - Thanks [~epayne] for working on this. I just read through the description and comments; everything is clear to me and I like the simple way of solving this problem. It's also reassuring that you have been running with this change in production for over a year. So, the latest patch looks good to me and I committed patch002 to trunk. Resolving this jira; if you want to backport to older branches (3.3 or even 3.2), please reopen. Thanks. > Check real user ACLs in addition to proxied user ACLs > - > > Key: HADOOP-17857 > URL: https://issues.apache.org/jira/browse/HADOOP-17857 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.2.2, 2.10.1, 3.3.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Major > Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch > > > In a secure cluster, it is possible to configure the services to allow a > super-user to proxy to a regular user and perform actions on behalf of the > proxied user (see [Proxy user - Superusers Acting On Behalf Of Other > Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]). > This is useful for automating server access for multiple different users in a > multi-tenant cluster. For example, this can be used by a super user > submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie > workflows, etc, which will then execute the service as the proxied user. > Usually when these services check ACLs to determine if the user has access to > the requested resources, the service only needs to check the ACLs for the > proxied user. However, it is sometimes desirable to allow the proxied user to > have access to the resources when only the real user has open ACLs. 
> For instance, let's say the user {{adm}} is the only user with submit ACLs to > the {{dataload}} queue, and the {{adm}} user wants to submit apps to the > {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In > addition, we want to be able to bill {{headless1}} and {{headless2}} > separately for the YARN resources used in the {{dataload}} queue. In order to > do this, the apps need to run in the {{dataload}} queue as the respective > headless users. We could open up the ACLs to the {{dataload}} queue to allow > {{headless1}} and {{headless2}} to submit apps. But this would allow those > users to submit any app to that queue, and not be limited to just the data > loading apps, and we don't trust the {{headless1}} and {{headless2}} owners > to honor that restriction. > This JIRA proposes that we define a way to set up ACLs to restrict a > resource's access to a super-user, but when the access happens, run it as > the proxied user. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393023#comment-17393023 ] Szilard Nemeth commented on HADOOP-15327: - Converted to a PR, no more patches will be uploaded to this jira. > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Labels: pull-request-available > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > Time Spent: 20m > Remaining Estimate: 0h > > This way, we can remove the dependencies on the netty3 (jboss.netty)
[jira] [Created] (HADOOP-17791) TestActivitiesManager is flaky
Szilard Nemeth created HADOOP-17791: --- Summary: TestActivitiesManager is flaky Key: HADOOP-17791 URL: https://issues.apache.org/jira/browse/HADOOP-17791 Project: Hadoop Common Issue Type: Bug Reporter: Szilard Nemeth I noticed in our internal testing environment that org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.TestActivitiesManager.testAppActivitiesTTL failed a couple of times, quite randomly. By checking the Jira and searching for the name of the class, there are some results from this year as well: [https://issues.apache.org/jira/issues/?jql=text%20~%20TestActivitiesManager%20ORDER%20BY%20updated%20DESC] I don't know exactly how to reproduce this though. I tried running the whole test class 60 times and it hasn't failed.
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327.005.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > > This way, we can remove the dependencies on the netty3 (jboss.netty)
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:23 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since my last comment (https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367) and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what's changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - Build #1: https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456 - Build #2: https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928 *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to be modified to use a certain fixed port. 
Here, I added a constant to store the port number, so when I had to debug I only needed to change it in one place.
*3. Create class TestExecution: Configure proxy, keep-alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following:
- Configuring the HTTP connections, using a proxy when required
- Increasing the keepalive timeout when using DEBUG mode
TEST_EXECUTION is a static instance of TestExecution, initialized in a JUnit test setup method. There are 2 flags that control the behaviour of this object:
{code:java}
// Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
// If this is set to true and the proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false;
{code}
The only other difference is in the code of the testcases: they create all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
One addition on top of item 3 is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value; otherwise it is set to 0, meaning that the port will be chosen dynamically.
*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
While debugging TestShuffleHandler#testMapFileAccess, I realized that I had forgotten to add the LoggingHttpResponseEncoder to the pipeline. The most straightforward fix was to modify the pipeline when the channel is activated.
*6. 
TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce the Jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
Here's where the fun begins. The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only if NativeIO is enabled, as SecureIOUtils needs it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop native libraries on my Mac according to these resources:
- Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
- This guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]
Unfortunately, I still had compilation errors, so I eventually gave up and tweaked the test so that it could run locally. This wasn't complex and I don't think it's worth going into the details: I had to comment out some test code that used the native library, and that was all. From the Jenkins results I had this:
{code:java}
[INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ hadoop-mapreduce-client-shuffle ---
[INFO]
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 s - in org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Running org.apache.hadoop.mapred.TestShuffleHandler
{code}
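The InputStreamReadResult helper described in item 1 can be sketched roughly as follows. This is an illustrative reconstruction, not the actual patch code: the method name readFully, the buffer size, and the accessor names are my assumptions; only the stored fields (the bytes read and the byte count) come from the description above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of a result holder that captures both the raw bytes and the
// byte count of a response InputStream, so testcases can assert on either.
public class InputStreamReadResult {
    private final byte[] data;
    private final int totalBytesRead;

    public InputStreamReadResult(byte[] data, int totalBytesRead) {
        this.data = data;
        this.totalBytesRead = totalBytesRead;
    }

    public byte[] getData() { return data; }
    public int getTotalBytesRead() { return totalBytesRead; }
    public String asString() { return new String(data); }

    // Drain the stream fully, counting bytes as we go (name is hypothetical).
    public static InputStreamReadResult readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        int total = 0;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
            total += n;
        }
        return new InputStreamReadResult(buffer.toByteArray(), total);
    }

    public static void main(String[] args) throws IOException {
        InputStreamReadResult result =
            readFully(new ByteArrayInputStream("shuffle-response".getBytes()));
        System.out.println(result.getTotalBytesRead()); // 16
        System.out.println(result.asString());          // shuffle-response
    }
}
```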
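The TestExecution helper from items 3 and 4 can be sketched along these lines. Again an illustrative reconstruction under stated assumptions: the proxy host/port, the fixed debug port value, and the method names shuffleHandlerPort/openConnection are placeholders; only the two flags and the "fixed port in DEBUG mode, 0 otherwise" behaviour come from the description above. DEBUG_MODE is set to false here to demonstrate the dynamic-port path.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Sketch of a helper that centralizes HTTP connection setup for the tests:
// optional proxying and the ShuffleHandler port selection live in one place.
public class TestExecution {
    private static final boolean DEBUG_MODE = false; // true pins the port for debugging
    private static final boolean USE_PROXY = false;
    private static final int DEBUG_PORT = 8088;      // placeholder fixed port

    // DEBUG mode pins the ShuffleHandler port; otherwise 0 lets the OS pick.
    public int shuffleHandlerPort() {
        return DEBUG_MODE ? DEBUG_PORT : 0;
    }

    // All testcases open connections through this method, so enabling a
    // proxy requires changing only the USE_PROXY flag.
    public HttpURLConnection openConnection(URL url) throws IOException {
        if (USE_PROXY) {
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("localhost", 8888)); // placeholder proxy address
            return (HttpURLConnection) url.openConnection(proxy);
        }
        return (HttpURLConnection) url.openConnection();
    }

    public static void main(String[] args) throws IOException {
        TestExecution exec = new TestExecution();
        System.out.println(exec.shuffleHandlerPort()); // 0 (dynamic port)
        // openConnection does no network I/O until connect() is called
        HttpURLConnection conn = exec.openConnection(new URL("http://localhost:8088/mapOutput"));
        System.out.println(conn != null);              // true
    }
}
```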
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:18 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since my last comment (https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367) and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what've changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - Build #1: https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456 - Build #2: https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928 *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to modified to use a certain fixed port. 
Here, I added a constant to store the port number so when I had to debug I only needed to change it in one single place. *3. Create class: TestExecution: Configure proxy, keep alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following: - Configuring the HTTP connections, use a proxy when required - Increase the keepalive timeout when using DEBUG mode TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit test setup method. There are 2 flags that control the behaviour of this object: {code:java} //Control test execution properties with these flags private static final boolean DEBUG_MODE = true; //If this is set to true and proxy server is not running, tests will fail! private static final boolean USE_PROXY = false; {code} The only difference on top of these is in the code of testcases: They create all HTTP connections with: {code:java} TEST_EXECUTION.openConnection(url) {code} *4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition to 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value, otherwise it is set to 0, meaning that the port will be dynamically chosen. *5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, just realized that I forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial way was to modify the pipeline when the channel is activated. *6. 
TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module: {code:java} // This will run only in NativeIO is enabled as SecureIOUtils need it assumeTrue(NativeIO.isAvailable()); {code} I tried to compile the Hadoop Native libraries on my Mac according to these resources: - Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html] - Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee] Unfortunately, I still had compilation errors so I eventually gave up and tweaked the test to be able to run it locally. This wasn't such a complex thing, I don't think it's worth to go into the details, had to comment out some test code that used the Native library and that was all. From the Jenkins results I had this: {code:java} [INFO] ---
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:17 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since my last comment (https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367) and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what've changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - Build #1: https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456 - Build #2: https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928 *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to modified to use a certain fixed port. 
Here, I added a constant to store the port number so when I had to debug I only needed to change it in one single place. *3. Create class: TestExecution: Configure proxy, keep alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following: - Configuring the HTTP connections, use a proxy when required - Increase the keepalive timeout when using DEBUG mode TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit test setup method. There are 2 flags that control the behaviour of this object: {code:java} //Control test execution properties with these flags private static final boolean DEBUG_MODE = true; //If this is set to true and proxy server is not running, tests will fail! private static final boolean USE_PROXY = false; {code} The only difference on top of these is in the code of testcases: They create all HTTP connections with: {code:java} TEST_EXECUTION.openConnection(url) {code} *4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition to 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value, otherwise it is set to 0, meaning that the port will be dynamically chosen. *5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, just realized that I forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial way was to modify the pipeline when the channel is activated. *6. 
TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module: {code:java} // This will run only in NativeIO is enabled as SecureIOUtils need it assumeTrue(NativeIO.isAvailable()); {code} I tried to compile the Hadoop Native libraries on my Mac according to these resources: - Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html] - Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee] Unfortunately, I still had compilation errors so I eventually gave up and tweaked the test to be able to run it locally. This wasn't such a complex thing, I don't think it's worth to go into the details, had to comment out some test code that used the Native library and that was all. From the Jenkins results I had this: {code:java} [INFO] ---
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:16 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since my last comment and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what've changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - Build #1 - Build #2 *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to modified to use a certain fixed port. Here, I added a constant to store the port number so when I had to debug I only needed to change it in one single place. *3. 
Create class: TestExecution: Configure proxy, keep alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following: - Configuring the HTTP connections, use a proxy when required - Increase the keepalive timeout when using DEBUG mode TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit test setup method. There are 2 flags that control the behaviour of this object: {code:java} //Control test execution properties with these flags private static final boolean DEBUG_MODE = true; //If this is set to true and proxy server is not running, tests will fail! private static final boolean USE_PROXY = false; {code} The only difference on top of these is in the code of testcases: They create all HTTP connections with: {code:java} TEST_EXECUTION.openConnection(url) {code} *4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition to 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value, otherwise it is set to 0, meaning that the port will be dynamically chosen. *5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, just realized that I forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial way was to modify the pipeline when the channel is activated. *6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. 
The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module: {code:java} // This will run only in NativeIO is enabled as SecureIOUtils need it assumeTrue(NativeIO.isAvailable()); {code} I tried to compile the Hadoop Native libraries on my Mac according to these resources: - Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html] - Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee] Unfortunately, I still had compilation errors so I eventually gave up and tweaked the test to be able to run it locally. This wasn't such a complex thing, I don't think it's worth to go into the details, had to comment out some test code that used the Native library and that was all. From the Jenkins results I had this: {code:java} [INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ hadoop-mapreduce-client-shuffle --- [INFO] [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.mapred.TestFadvisedFileRegion [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 s - in org.apache.hadoop.mapred.TestFadvisedFileRegion [INFO] Running org.apache.hadoop.mapred.TestShuffleHandler
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:15 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since [my last comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367] and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what've changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - [Build #1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456] - [Build #2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928] *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to modified to use a certain fixed port. 
Here, I added a constant to store the port number so when I had to debug I only needed to change it in one single place. *3. Create class: TestExecution: Configure proxy, keep alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following: - Configuring the HTTP connections, use a proxy when required - Increase the keepalive timeout when using DEBUG mode TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit test setup method. There are 2 flags that control the behaviour of this object: {code:java} //Control test execution properties with these flags private static final boolean DEBUG_MODE = true; //If this is set to true and proxy server is not running, tests will fail! private static final boolean USE_PROXY = false; {code} The only difference on top of these is in the code of testcases: They create all HTTP connections with: {code:java} TEST_EXECUTION.openConnection(url) {code} *4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition to 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value, otherwise it is set to 0, meaning that the port will be dynamically chosen. *5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, just realized that I forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial way was to modify the pipeline when the channel is activated. *6. 
TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module: {code:java} // This will run only in NativeIO is enabled as SecureIOUtils need it assumeTrue(NativeIO.isAvailable()); {code} I tried to compile the Hadoop Native libraries on my Mac according to these resources: - Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html] - Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee] Unfortunately, I still had compilation errors so I eventually gave up and tweaked the test to be able to run it locally. This wasn't such a complex thing, I don't think it's worth to go into the details, had to comment out some test code that used the Native library and that was all. From the Jenkins results I had this: {code:java} [INFO] ---
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:14 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since [my last comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367] and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what've changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - [Build #1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456] - [Build #2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928] *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to modified to use a certain fixed port. 
Here, I added a constant to store the port number so when I had to debug I only needed to change it in one single place. *3. Create class: TestExecution: Configure proxy, keep alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following: - Configuring the HTTP connections, use a proxy when required - Increase the keepalive timeout when using DEBUG mode TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit test setup method. There are 2 flags that control the behaviour of this object: {code:java} //Control test execution properties with these flags private static final boolean DEBUG_MODE = true; //If this is set to true and proxy server is not running, tests will fail! private static final boolean USE_PROXY = false; {code} The only difference on top of these is in the code of testcases: They create all HTTP connections with: {code:java} TEST_EXECUTION.openConnection(url) {code} *4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition to 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value, otherwise it is set to 0, meaning that the port will be dynamically chosen. *5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, just realized that I forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial way was to modify the pipeline when the channel is activated. *6. 
TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module: {code:java} // This will run only in NativeIO is enabled as SecureIOUtils need it assumeTrue(NativeIO.isAvailable()); {code} I tried to compile the Hadoop Native libraries on my Mac according to these resources: - Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html] - Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee] Unfortunately, I still had compilation errors so I eventually gave up and tweaked the test to be able to run it locally. This wasn't such a complex thing, I don't think it's worth to go into the details, had to comment out some test code that used the Native library and that was all. From the Jenkins results I had this: {code:java} [INFO] ---
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:13 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since [my last comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367] and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what've changed commit by commit. I will skip a bunch of trivial ones like code cleanup, added comments and the like. *I will cover the test failures surfaced by Jenkins build / unit test results:* - [Build #1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456] - [Build #2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928] *1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way testcases can assert on these data. *2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to modified to use a certain fixed port. 
Here, I added a constant to store the port number so when I had to debug I only needed to change it in one single place. *3. Create class: TestExecution: Configure proxy, keep alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following: - Configuring the HTTP connections, use a proxy when required - Increase the keepalive timeout when using DEBUG mode TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit test setup method. There are 2 flags that control the behaviour of this object: {code:java} //Control test execution properties with these flags private static final boolean DEBUG_MODE = true; //If this is set to true and proxy server is not running, tests will fail! private static final boolean USE_PROXY = false; {code} The only difference on top of these is in the code of testcases: They create all HTTP connections with: {code:java} TEST_EXECUTION.openConnection(url) {code} *4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition to 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a value, otherwise it is set to 0, meaning that the port will be dynamically chosen. *5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, just realized that I forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial way was to modify the pipeline when the channel is activated. *6. 
TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module: {code:java} // This will run only in NativeIO is enabled as SecureIOUtils need it assumeTrue(NativeIO.isAvailable()); {code} I tried to compile the Hadoop Native libraries on my Mac according to these resources: - Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html] - Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee] Unfortunately, I still had compilation errors so I eventually gave up and tweaked the test to be able to run it locally. This wasn't such a complex thing, I don't think it's worth to go into the details, had to comment out some test code that used the Native library and that was all. From the Jenkins results I had this: {code:java} [INFO] ---
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:09 PM: --- Just uploaded a new patch: [^HADOOP-15327.005.patch] I have been (almost) exclusively working on this since my last comment, and there are a couple of things to add again. The last commit that was discussed is this: [https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62] Let me explain what has changed, commit by commit. I will skip a bunch of trivial ones such as code cleanup and added comments.

*I will cover the test failures surfaced by the Jenkins build / unit test results:*
- [Build #1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456]
- [Build #2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928]

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response as string + total bytes read: [https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]* Here, I added a new class called 'InputStreamReadResult' that stores the bytes read (byte[]) and the number of bytes read from a response InputStream. This improves the way test cases can assert on this data.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: [https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]* It was a common pitfall while debugging that the tests had to be modified to use a certain fixed port. Here, I added a constant to store the port number, so when I had to debug I only needed to change it in a single place.

*3. 
Create class: TestExecution: Configure proxy, keep-alive connection timeout: [https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]* In order to debug the HTTP responses, I found it convenient to add a helper class that is responsible for the following:
- Configuring the HTTP connections to use a proxy when required
- Increasing the keep-alive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized in a JUnit test setup method. There are two flags that control the behaviour of this object:
{code:java}
// Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
// If this is set to true and the proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false;
{code}
The only other difference is in the code of the test cases: they create all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}

*4. TestExecution: Configure port: [https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]* One addition on top of 3. is to include the port used by ShuffleHandler in the TestExecution object. When using DEBUG mode, the port is fixed to a given value; otherwise it is set to 0, meaning that the port will be chosen dynamically.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: [https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]* While debugging TestShuffleHandler#testMapFileAccess, I realized that I had forgotten to add the LoggingHttpResponseEncoder to the pipeline. The simplest way to fix this was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + reproduce the Jenkins UT failure: [https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]* Here's where the fun begins. 
The problem with TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only if NativeIO is enabled as SecureIOUtils needs it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop native libraries on my Mac according to these resources:
- Native libraries: [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
- Followed this guide: [https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors, so I eventually gave up and tweaked the test so that it can run locally. This wasn't complex and isn't worth going into in detail: I had to comment out some test code that used the native library, and that was all. From the Jenkins results, I had this:
{code:java}
[INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ hadoop-mapreduce-client-shuffle ---
[INFO]
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 s - in org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Running org.apache.hadoop.mapred.TestShuffleHandler
[ERROR] Tests run: 15, Failures: 1, Errors: 0,
{code}
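As a side note on the dynamic port selection described in point 4 above: it relies on the standard socket behaviour that binding to port 0 makes the OS pick a free ephemeral port. A minimal, self-contained sketch of that behaviour (plain java.net; EphemeralPortDemo is a hypothetical illustration, not code from the patch):
{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    public static void main(String[] args) throws IOException {
        // Binding to port 0 asks the OS to choose any free ephemeral port,
        // which is why non-DEBUG test runs don't collide on a fixed port.
        try (ServerSocket socket = new ServerSocket(0)) {
            System.out.println("OS-assigned port: " + socket.getLocalPort());
        }
    }
}
{code}
In DEBUG mode the test instead pins a fixed port so a proxy or debugger can be pointed at a known address.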
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: testfailure-testReduceFromPartialMem.zip testfailure-testMapFileAccess-emptyresponse.zip > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, > testfailure-testMapFileAccess-emptyresponse.zip, > testfailure-testReduceFromPartialMem.zip > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327.005.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367912#comment-17367912 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/24/21, 7:52 PM: --- Hey [~weichiu], Thanks for putting the excerpt here. This could be fixed in parallel; I would be glad if you could point me to the config that needs to be changed. Currently, I'm working on the test issues produced by the build that ran against patch003: hadoop.mapred.TestReduceFetchFromPartialMem hadoop.mapred.TestReduceFetch There are jiras related to these tests, but I checked the logs, saw very suspicious things, and it pointed me to a code defect. I will upload the next patch soon, along with an explanation of what has changed since patch004. Hopefully, this can be the last one and I can finally start testing on a cluster. I will also make sure to create proper manual testing documentation and to collect the test evidence. I wouldn't expect any production issues (fingers crossed), as test coverage is quite good, and while fixing the tests I gained a lot of code knowledge, becoming familiar with the ShuffleHandler almost inside and out. > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327.004.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, HADOOP-15327.004.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362368#comment-17362368 ] Szilard Nemeth commented on HADOOP-15327: - *Remaining TODO items that I can make progress with:* - Fix failing unit tests - Testing on cluster > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327.003.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > HADOOP-15327.003.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362367#comment-17362367 ] Szilard Nemeth commented on HADOOP-15327: - The latest patch contains commits from this branch: [https://github.com/szilard-nemeth/hadoop/commits/HADOOP-15327-snemeth] There are a couple of commits, so I will approach this by explaining the reasons behind each change in the commits. Not all commits are listed; I left out a few trivial ones. Unfortunately, this task was a bit tricky: every time I touched something in the test, I found another bug or weird behaviour, so it took a great deal of time to discover and solve everything.

*1. ShuffleHandler: ch.isOpen() --> ch.isActive(): [https://github.com/szilard-nemeth/hadoop/commit/e703adb57f66da8579baa26257ca9aaed2bf1db5]* This was already mentioned in my previous, lengthier comment.

*2. TestShuffleHandler: Fix mocking in testSendMapCount + replace ch.write() with ch.writeAndFlush(): [https://github.com/szilard-nemeth/hadoop/commit/07fbfee5cae85e8e374b53c303e794c19c620efc]* This is about two things:
- Replacing channel.write calls with channel.writeAndFlush
- Fixing bad mocking in org.apache.hadoop.mapred.TestShuffleHandler#testSendMapCount

*3. TestShuffleHandler.testMaxConnections: Rewrite test + production code: accepted connection handling: [https://github.com/szilard-nemeth/hadoop/commit/def0059982ef8f0e2f19d385b1a1fcdca8639f9d]*
*Changes in production code:*
- ShuffleHandler#channelActive added the channel to the channel group (field called 'accepted') before the if statement that enforces the maximum number of open connections. 
This was the old, wrong piece of code:
{code:java}
super.channelActive(ctx);
LOG.debug("accepted connections={}", accepted.size());
if ((maxShuffleConnections > 0) && (accepted.size() >= maxShuffleConnections)) {
{code}
- Also, counting the number of open channels with the channel group was unreliable, so I introduced a new AtomicInteger field called 'acceptedConnections' to track the open channels / connections.
- There was another issue: when channels were accepted, the counter of open channels was increased, but when channels were inactivated I could not see any code that would have maintained (decremented) the value. This was mitigated by adding org.apache.hadoop.mapred.ShuffleHandler.Shuffle#channelInactive, which logs the channel-inactivated event and decreases the open connections counter:
{code:java}
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  super.channelInactive(ctx);
  acceptedConnections.decrementAndGet();
  LOG.debug("New value of Accepted number of connections={}", acceptedConnections.get());
}
{code}
*Changes in test code:*
- org.apache.hadoop.mapred.TestShuffleHandler#testMaxConnections: Fixed the testcase; the issue was pointed out correctly by [~weichiu]: the connections are accepted in parallel, so we should not rely on their order in the test. The way I rewrote this: I introduced a map to group the HttpURLConnection objects by their HTTP response code. Then I check that we only have 200 OK and 429 TOO MANY REQUESTS responses, that the number of 200 OK connections is 2, and that there is only one unaccepted connection.

*4. increase netty version to 4.1.65.Final: [https://github.com/szilard-nemeth/hadoop/commit/4f4589063b579a93389b1e188c29bd895ae507fc]* This is a simple commit to increase the Netty version to the latest stable 4.x version. See this page: [https://netty.io/downloads.html] It states: "netty-4.1.65.Final.tar.gz ‐ 19-May-2021 (Stable, Recommended)"

*5. 
ShuffleHandler: Fix keepalive test + writing HTTP response properly to channel: [https://github.com/szilard-nemeth/hadoop/commit/1aad4eaace28cfff4a9a9152f7535d70cc6e3734]* This is where things get more interesting. There was a testcase called org.apache.hadoop.mapred.TestShuffleHandler#testKeepAlive that caught an issue which came up because Netty 4.x handles HTTP responses written to the same channel differently than Netty 3.x. See details below.

Production code changes:
- Added some logs to be able to track what happened when utilizing HTTP connection keep-alive.
- Added a ChannelOutboundHandlerAdapter that handles exceptions that happen during outbound message construction. By default, these are not logged by Netty, and this was the only trick I found to catch such events:
{code:java}
pipeline.addLast("outboundExcHandler", new ChannelOutboundHandlerAdapter() {
  @Override
  public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception {
    promise.addListener(ChannelFutureListener.FIRE_EXCEPTION_ON_FAILURE);
    super.write(ctx, msg, promise);
  }
});
{code}
This solution is described here:
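As a footnote to the testMaxConnections rewrite in point 3 above: the group-by-response-code idea can be sketched independently of the test. The snippet below is a simplified illustration with plain integers standing in for the real HttpURLConnection objects (GroupByResponseCodeDemo is hypothetical, not code from the patch):
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupByResponseCodeDemo {
    public static void main(String[] args) {
        // Response codes of three parallel connections, in arrival order.
        // The order is non-deterministic, so the assertion works on group
        // sizes, not on which particular connection was rejected.
        List<Integer> responseCodes = Arrays.asList(200, 429, 200);

        Map<Integer, List<Integer>> byCode = responseCodes.stream()
                .collect(Collectors.groupingBy(Function.identity()));

        System.out.println("200 OK count: " + byCode.get(200).size());                // 2 accepted
        System.out.println("429 TOO MANY REQUESTS count: " + byCode.get(429).size()); // 1 rejected
    }
}
{code}
The real test additionally verifies that no response code other than 200 and 429 occurs.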
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, > getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327.002.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327-snemeth.002.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: (was: HADOOP-15327-snemeth.002.patch) > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/11/21, 8:35 PM: --- Thanks [~weichiu] for your help. Added a preliminary patch to kick off Jenkins. I haven't touched the Maven shading config, so I'm expecting a Maven error from Jenkins, as I saw one locally. Referring back to [your comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17356433=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17356433]: I'm quite a beginner with shading, i.e. I have no idea what to touch to fix the current shading issues. Can you or anyone else help me out with this? *Remaining TODO items that I can make progress with:* - Testing on cluster - Adding explanation comment for the new code changes: A more lengthy comment will follow :) was (Author: snemeth): [...] > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/11/21, 8:33 PM: --- [...]
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/11/21, 8:33 PM: --- [...]
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029 ] Szilard Nemeth commented on HADOOP-15327: - [...]
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Status: Patch Available (was: In Progress) > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15327: Attachment: HADOOP-15327.001.patch > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15327.001.patch > > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358861#comment-17358861 ] Szilard Nemeth commented on HADOOP-15327: - Let me list the differences introduced because of the migration from Netty 3.x to 4.x. There is a migration guide that mentions most (but not all) of the changes: [https://netty.io/wiki/new-and-noteworthy-in-4.0.html] Please note that the below code changes are based on Wei-Chiu's branch: [https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4] h2. CHANGES IN ShuffleHandler h3. *I will list the changes mostly from ShuffleHandler, as it covers almost all types of changes in the other classes as well.* *In TestShuffleHandler, the test code was changed per the justifications listed below.* h3. Change category #1: General API changes / non-configuration getters: Details: [https://netty.io/wiki/new-and-noteworthy-in-4.0.html#general-api-changes] {quote}Non-configuration getters have no get- prefix anymore. (e.g. Channel.getRemoteAddress() → Channel.remoteAddress()) Boolean properties are still prefixed with is- to avoid confusion (e.g. 'empty' is both an adjective and a verb, so empty() can have two meanings.) {quote} I'm just listing all the changes without additional context (i.e. which method they occur in), separated by three dots, as they are simply method renamings: {code:java} -future.getChannel().close(); +future.channel().closeFuture().awaitUninterruptibly(); ... ... - ChannelPipeline pipeline = future.getChannel().getPipeline(); + ChannelPipeline pipeline = future.channel().pipeline(); ... ... -port = ((InetSocketAddress)ch.getLocalAddress()).getPort(); +port = ((InetSocketAddress)ch.localAddress()).getPort(); ... ... - if (e.getState() == IdleState.WRITER_IDLE && enabledTimeout) { -e.getChannel().close(); + if (e.state() == IdleState.WRITER_IDLE && enabledTimeout) { +ctx.channel().close(); ... ... 
- accepted.add(evt.getChannel()); + accepted.add(ctx.channel()); ... ... -new QueryStringDecoder(request.getUri()).getParameters(); +new QueryStringDecoder(request.getUri()).parameters(); //getUri was not changed, see this later ... ... - Channel ch = evt.getChannel(); - ChannelPipeline pipeline = ch.getPipeline(); + Channel ch = ctx.channel(); + ChannelPipeline pipeline = ch.pipeline(); ... ... - reduceContext.getCtx().getChannel(), + reduceContext.getCtx().channel(), ... ... - if (ch.getPipeline().get(SslHandler.class) == null) { + if (ch.pipeline().get(SslHandler.class) == null) { ... ... - Channel ch = evt.getChannel(); - ChannelPipeline pipeline = ch.getPipeline(); + Channel ch = ctx.channel(); + ChannelPipeline pipeline = ch.pipeline(); ... ... - ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE); + ctx.channel().write(response).addListener(ChannelFutureListener.CLOSE); ... ... - Channel ch = e.getChannel(); - Throwable cause = e.getCause(); + Channel ch = ctx.channel(); {code} h3. Change category #2: General API changes / Method signature changes. *2.1: SimpleChannelUpstreamHandler was renamed to ChannelInboundHandlerAdapter.* [https://netty.io/wiki/new-and-noteworthy-in-4.0.html#upstream--inbound-downstream--outbound] {quote}The terms 'upstream' and 'downstream' were pretty confusing to beginners. 4.0 uses 'inbound' and 'outbound' wherever possible. {quote} {code:java} - class Shuffle extends SimpleChannelUpstreamHandler { + @ChannelHandler.Sharable + class Shuffle extends ChannelInboundHandlerAdapter { {code} *2.2: Simplified channel state model: [https://netty.io/wiki/new-and-noteworthy-in-4.0.html#simplified-channel-state-model]* {quote}channelOpen, channelBound, and channelConnected have been merged to channelActive. channelDisconnected, channelUnbound, and channelClosed have been merged to channelInactive. Likewise, Channel.isBound() and isConnected() have been merged to isActive(). 
{quote} *2.2.1 Changes in class: Shuffle* {code:java} @Override -public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt) +public void channelActive(ChannelHandlerContext ctx) throws Exception { - super.channelOpen(ctx, evt); + super.channelActive(ctx); {code} *2.2.2 Changes in org.apache.hadoop.mapred.ShuffleHandler.Shuffle#exceptionCaught:* Quoting the change again: {quote}channelOpen, channelBound, and channelConnected have been merged to channelActive. channelDisconnected, channelUnbound, and channelClosed have been merged to channelInactive. Likewise, Channel.isBound() and isConnected() have been merged to isActive(). {quote} {code:java} LOG.error("Shuffle error: ", cause); - if (ch.isConnected()) { -LOG.error("Shuffle error " + e); +
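As a side note to the test-code changes mentioned above: earlier in this thread, the testMaxConnections rewrite is described as grouping connections by their HTTP response code instead of relying on the order in which they were accepted. A minimal JDK-only sketch of that grouping check (the names and map layout are illustrative, not the actual test code, and plain integers stand in for HttpURLConnection objects):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the order-independent assertion style: instead of expecting
// response codes in a fixed order, group them by code and check the counts.
class ResponseCodeCheck {

  // Group response codes by code value.
  static Map<Integer, List<Integer>> groupByCode(List<Integer> codes) {
    Map<Integer, List<Integer>> byCode = new HashMap<>();
    for (int code : codes) {
      byCode.computeIfAbsent(code, k -> new ArrayList<>()).add(code);
    }
    return byCode;
  }

  // Expect exactly two 200 OK responses and one 429 TOO MANY REQUESTS,
  // regardless of the order in which the connections completed.
  static boolean expectedDistribution(List<Integer> codes) {
    Map<Integer, List<Integer>> byCode = groupByCode(codes);
    return byCode.keySet().equals(Set.of(200, 429))
        && byCode.get(200).size() == 2
        && byCode.get(429).size() == 1;
  }
}
```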
[jira] [Issue Comment Deleted] (HADOOP-11219) [Umbrella] Upgrade to netty 4
[ https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-11219: Comment: was deleted (was: [...])
[jira] [Comment Edited] (HADOOP-11219) [Umbrella] Upgrade to netty 4
[ https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358850#comment-17358850 ] Szilard Nemeth edited comment on HADOOP-11219 at 6/7/21, 8:40 PM: -- [...]
[jira] [Comment Edited] (HADOOP-11219) [Umbrella] Upgrade to netty 4
[ https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358850#comment-17358850 ] Szilard Nemeth edited comment on HADOOP-11219 at 6/7/21, 8:38 PM: -- [...]
[jira] [Comment Edited] (HADOOP-11219) [Umbrella] Upgrade to netty 4
[ https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358850#comment-17358850 ] Szilard Nemeth edited comment on HADOOP-11219 at 6/7/21, 8:36 PM: -- Let me list the differences introduced because of the migration from Netty 3.x to 4.x. There is a migration guide that mentions most (but not all) of the changes: [https://netty.io/wiki/new-and-noteworthy-in-4.0.html] Please note that the below code changes are based on Wei-Chiu's branch: [https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4] h2. CHANGES IN ShuffleHandler h3. *I will list the changes mostly from ShuffleHandler as it covers almost all type of changes in other classes as well.* *In TestShuffleHandler, the test code was changed by any of the justifications listed down below.* h3. Change category #1: General API changes / non-configuration getters: Details: [https://netty.io/wiki/new-and-noteworthy-in-4.0.html#general-api-changes] {quote}Non-configuration getters have no get- prefix anymore. (e.g. Channel.getRemoteAddress() → Channel.remoteAddress()) Boolean properties are still prefixed with is- to avoid confusion (e.g. 'empty' is both an adjective and a verb, so empty() can have two meanings.) {quote} I'm just listing all the changes without additional context (in which method they were changed) separated by three dots, as they are simply method renamings: {code:java} -future.getChannel().close(); +future.channel().closeFuture().awaitUninterruptibly(); ... ... - ChannelPipeline pipeline = future.getChannel().getPipeline(); + ChannelPipeline pipeline = future.channel().pipeline(); ... ... -port = ((InetSocketAddress)ch.getLocalAddress()).getPort(); +port = ((InetSocketAddress)ch.localAddress()).getPort(); ... ... - if (e.getState() == IdleState.WRITER_IDLE && enabledTimeout) { -e.getChannel().close(); + if (e.state() == IdleState.WRITER_IDLE && enabledTimeout) { +ctx.channel().close(); ... ... 
- accepted.add(evt.getChannel());
+ accepted.add(ctx.channel());
...
...
-new QueryStringDecoder(request.getUri()).getParameters();
+new QueryStringDecoder(request.getUri()).parameters(); //getUri was not changed, see this later
...
...
- Channel ch = evt.getChannel();
- ChannelPipeline pipeline = ch.getPipeline();
+ Channel ch = ctx.channel();
+ ChannelPipeline pipeline = ch.pipeline();
...
...
- reduceContext.getCtx().getChannel(),
+ reduceContext.getCtx().channel(),
...
...
- if (ch.getPipeline().get(SslHandler.class) == null) {
+ if (ch.pipeline().get(SslHandler.class) == null) {
...
...
- Channel ch = evt.getChannel();
- ChannelPipeline pipeline = ch.getPipeline();
+ Channel ch = ctx.channel();
+ ChannelPipeline pipeline = ch.pipeline();
...
...
- ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE);
+ ctx.channel().write(response).addListener(ChannelFutureListener.CLOSE);
...
...
- Channel ch = e.getChannel();
- Throwable cause = e.getCause();
+ Channel ch = ctx.channel();
{code}
h3. Change category #2: General API changes / Method signature changes
*2.1: SimpleChannelUpstreamHandler was renamed to ChannelInboundHandlerAdapter.*
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#upstream--inbound-downstream--outbound]
{quote}The terms 'upstream' and 'downstream' were pretty confusing to beginners. 4.0 uses 'inbound' and 'outbound' wherever possible.
{quote}
{code:java}
- class Shuffle extends SimpleChannelUpstreamHandler {
+ @ChannelHandler.Sharable
+ class Shuffle extends ChannelInboundHandlerAdapter {
{code}
*2.2: Simplified channel state model: [https://netty.io/wiki/new-and-noteworthy-in-4.0.html#simplified-channel-state-model]*
{quote}channelOpen, channelBound, and channelConnected have been merged to channelActive. channelDisconnected, channelUnbound, and channelClosed have been merged to channelInactive. Likewise, Channel.isBound() and isConnected() have been merged to isActive().
{quote}
*2.2.1 Changes in class: Shuffle*
{code:java}
 @Override
-public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt)
+public void channelActive(ChannelHandlerContext ctx)
   throws Exception {
-  super.channelOpen(ctx, evt);
+  super.channelActive(ctx);
{code}
*2.2.2 Changes in org.apache.hadoop.mapred.ShuffleHandler.Shuffle#exceptionCaught:*
Quoting the change again:
{quote}channelOpen, channelBound, and channelConnected have been merged to channelActive. channelDisconnected, channelUnbound, and channelClosed have been merged to channelInactive. Likewise, Channel.isBound() and isConnected() have been merged to isActive().
{quote}
{code:java}
LOG.error("Shuffle error: ", cause);
- if (ch.isConnected()) {
-   LOG.error("Shuffle error " + e);
{code}
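Taken together, the per-line renamings above amount to a single Netty 4 style inbound handler shape. The following is only an illustrative sketch, not the actual ShuffleHandler code (the class name and the channel-group field are hypothetical), assuming Netty 4.x on the classpath:

{code:java}
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.group.ChannelGroup;

// channelOpen(ctx, evt) became channelActive(ctx); the Channel now comes
// from the ChannelHandlerContext instead of the event object, and
// exceptionCaught receives the Throwable directly instead of e.getCause().
@ChannelHandler.Sharable
class ShuffleLikeHandler extends ChannelInboundHandlerAdapter {
  private final ChannelGroup accepted;

  ShuffleLikeHandler(ChannelGroup accepted) {
    this.accepted = accepted;
  }

  @Override
  public void channelActive(ChannelHandlerContext ctx) throws Exception {
    accepted.add(ctx.channel()); // was accepted.add(evt.getChannel())
    super.channelActive(ctx);
  }

  @Override
  public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
    Channel ch = ctx.channel();  // was e.getChannel()
    if (ch.isActive()) {         // was ch.isConnected()
      ch.close();
    }
  }
}
{code}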
[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357627#comment-17357627 ] Szilard Nemeth edited comment on HADOOP-15327 at 6/4/21, 9:04 PM: -- Thanks [~weichiu], Will use skipShade temporarily then will check how to resolve the shading issue once all code issues are fixed and in place. Also thanks for your testing recommendations. was (Author: snemeth): Thanks [~weichiu], Will use skipShade temporarily then will check how to resolve the shading issue once all code issues are fixed and in place. > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
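For reference, a temporary skip-shade build of the kind mentioned above might look like the following; the -DskipShade property is documented in Hadoop's BUILDING.txt, but verify it against the hadoop-client-modules poms of the checkout being built:

{code}
# Skip tests and the shaded hadoop-client jars while iterating on code issues;
# run a full build (without -DskipShade) before exercising the shaded clients.
mvn clean install -Pdist -DskipTests -DskipShade -Dmaven.javadoc.skip=true
{code}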
[jira] [Work started] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-15327 started by Szilard Nemeth. --- > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356401#comment-17356401 ] Szilard Nemeth commented on HADOOP-15327: - Hmm, I can't attach the file; it's probably a permissions issue. I don't want to paste 2000+ lines here. Uploaded the file to my personal Google Drive: https://drive.google.com/file/d/1-ovH8snqTS73oLNsxtwgrvaDVynOBIP7/view?usp=sharing > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth reassigned HADOOP-15327: --- Assignee: Szilard Nemeth > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Szilard Nemeth >Priority: Major > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4
[ https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356377#comment-17356377 ] Szilard Nemeth commented on HADOOP-15327: - Hi [~weichiu], I took over this Jira as it was unassigned; I hope that's not a problem. The plan is to continue the work based on your code changes until all the UT issues are fixed. Did you have any plan for testing? Is it enough to test basic MR-based jobs with shuffling? One more thing: the latest version on your branch ([https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4]) produced Maven enforcer issues for me. The command I'm using to build Hadoop (from its root): {code} mvn clean install -Pdist -DskipTests -Dmaven.javadoc.skip=true -e | tee /tmp/maven_out {code} Please see the attached output file. Did you encounter similar build issues? I can see the latest commit is from March 2021, which is not that old, so I don't assume the build system or enforcer rules have changed since. > Upgrade MR ShuffleHandler to use Netty4 > --- > > Key: HADOOP-15327 > URL: https://issues.apache.org/jira/browse/HADOOP-15327 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Priority: Major > > This way, we can remove the dependencies on the netty3 (jboss.netty) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-11219) [Umbrella] Upgrade to netty 4
[ https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-11219: Summary: [Umbrella] Upgrade to netty 4 (was: Upgrade to netty 4) > [Umbrella] Upgrade to netty 4 > - > > Key: HADOOP-11219 > URL: https://issues.apache.org/jira/browse/HADOOP-11219 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Major > > This is an umbrella jira to track the effort of upgrading to Netty 4. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-16683: Fix Version/s: 3.2.2 3.1.4 Resolution: Fixed Status: Resolved (was: Patch Available) > Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped > AccessControlException > -- > > Key: HADOOP-16683 > URL: https://issues.apache.org/jira/browse/HADOOP-16683 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, > HADOOP-16683.003.patch, HADOOP-16683.branch-3.1.001.patch, > HADOOP-16683.branch-3.2.001.patch, HADOOP-16683.branch-3.2.001.patch > > > Follow up patch on HADOOP-16580. > We successfully disabled the retry in case of an AccessControlException which > has resolved some of the cases, but in other cases AccessControlException is > wrapped inside another IOException and you can only get the original > exception by calling getCause(). > Let's add this extra case as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
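The wrapped-exception case in the description above can be illustrated with a small stand-alone sketch. The helper below is hypothetical (it is not the actual FailoverOnNetworkExceptionRetry code) and uses java.security.AccessControlException as a stand-in for Hadoop's org.apache.hadoop.security.AccessControlException:

```java
import java.io.IOException;
import java.security.AccessControlException;

public class WrappedAclCheck {
  // True if e is an AccessControlException itself (the HADOOP-16580 case)
  // or directly wraps one (the HADOOP-16683 case); retrying is then pointless
  // because the failure is a credentials problem, not a transient network one.
  public static boolean isAccessControl(Exception e) {
    return e instanceof AccessControlException
        || e.getCause() instanceof AccessControlException;
  }

  public static void main(String[] args) {
    Exception bare = new AccessControlException("no kerberos credentials");
    Exception wrapped = new IOException(bare);
    Exception unrelated = new IOException("connection reset");

    System.out.println(isAccessControl(bare));      // prints true
    System.out.println(isAccessControl(wrapped));   // prints true
    System.out.println(isAccessControl(unrelated)); // prints false
  }
}
```

In Hadoop itself this check would live inside the retry policy's shouldRetry logic; the sketch only demonstrates the getCause() unwrapping idea.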
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014992#comment-17014992 ] Szilard Nemeth commented on HADOOP-16683: - Thanks [~adam.antal], Committed your patches to branch-3.2 and branch-3.1. Resolving the jira. > Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped > AccessControlException > -- > > Key: HADOOP-16683 > URL: https://issues.apache.org/jira/browse/HADOOP-16683 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, > HADOOP-16683.003.patch, HADOOP-16683.branch-3.1.001.patch, > HADOOP-16683.branch-3.2.001.patch, HADOOP-16683.branch-3.2.001.patch > > > Follow up patch on HADOOP-16580. > We successfully disabled the retry in case of an AccessControlException which > has resolved some of the cases, but in other cases AccessControlException is > wrapped inside another IOException and you can only get the original > exception by calling getCause(). > Let's add this extra case as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-16683: Fix Version/s: 3.3.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped > AccessControlException > -- > > Key: HADOOP-16683 > URL: https://issues.apache.org/jira/browse/HADOOP-16683 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, > HADOOP-16683.003.patch > > > Follow up patch on HADOOP-16580. > We successfully disabled the retry in case of an AccessControlException which > has resolved some of the cases, but in other cases AccessControlException is > wrapped inside another IOException and you can only get the original > exception by calling getCause(). > Let's add this extra case as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970825#comment-16970825 ] Szilard Nemeth commented on HADOOP-16683: - Thanks [~adam.antal] for this patch and [~pbacsko] for the review! Just committed to trunk! Closing this jira as I don't think we need backports to other branches. [~adam.antal]: If you think differently, please reopen this jira and set appropriate target versions. Thanks! > Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped > AccessControlException > -- > > Key: HADOOP-16683 > URL: https://issues.apache.org/jira/browse/HADOOP-16683 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, > HADOOP-16683.003.patch > > > Follow up patch on HADOOP-16580. > We successfully disabled the retry in case of an AccessControlException which > has resolved some of the cases, but in other cases AccessControlException is > wrapped inside another IOException and you can only get the original > exception by calling getCause(). > Let's add this extra case as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970824#comment-16970824 ] Szilard Nemeth commented on HADOOP-16580: - Hi [~adam.antal]! Thanks for the update and for the link to the newly filed jira. > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, > HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
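The idea behind HADOOP-16580 above, that an AccessControlException is not transient and therefore should stop the retry loop rather than trigger a failover, can be sketched as a simplified decision function. This does not mirror Hadoop's actual FailoverOnNetworkExceptionRetry implementation; the enum, method name, and the use of the JDK's java.security.AccessControlException are all illustrative stand-ins:

```java
import java.net.ConnectException;
import java.security.AccessControlException;

public class RetryDecisionDemo {

    enum RetryAction { FAIL, FAILOVER_AND_RETRY }

    // Simplified decision logic: an AccessControlException means the caller
    // has no valid credentials, which no amount of retrying or failing over
    // will fix, so fail fast. Network-style exceptions still fail over.
    static RetryAction shouldRetry(Exception e) {
        if (e instanceof AccessControlException) {
            return RetryAction.FAIL;
        }
        return RetryAction.FAILOVER_AND_RETRY;
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(
            new AccessControlException("no Kerberos credentials"))); // FAIL
        System.out.println(shouldRetry(
            new ConnectException("connection refused"))); // FAILOVER_AND_RETRY
    }
}
```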
[jira] [Commented] (HADOOP-16510) [hadoop-common] Fix order of actual and expected expression in assert statements
[ https://issues.apache.org/jira/browse/HADOOP-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967401#comment-16967401 ] Szilard Nemeth commented on HADOOP-16510: - Sure [~adam.antal]! > [hadoop-common] Fix order of actual and expected expression in assert > statements > > > Key: HADOOP-16510 > URL: https://issues.apache.org/jira/browse/HADOOP-16510 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16510.001.patch, HADOOP-16510.002.patch, > HADOOP-16510.003.patch > > > Fix order of actual and expected expression in assert statements which gives > misleading message when test case fails. Attached file has some of the places > where it is placed wrongly. > {code:java} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > {code} > For long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be > used for new test cases which avoids such mistakes. > This is a follow-up jira for the hadoop-common project. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
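The mistake HADOOP-16510 fixes is easy to reproduce: JUnit's assertEquals takes the expected value first and the actual value second, so swapping them yields a misleading failure message like the one quoted in the issue. A minimal standalone sketch, using a simplified stand-in for JUnit's assertEquals rather than the real JUnit API:

```java
public class AssertOrderDemo {

    // Simplified stand-in for JUnit's assertEquals(message, expected, actual):
    // the SECOND argument is reported as "expected", the THIRD as "was".
    static void assertEquals(String message, Object expected, Object actual) {
        if (expected == null ? actual != null : !expected.equals(actual)) {
            throw new AssertionError(
                message + " expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        int shutdownNodes = 1; // actual value produced by the code under test

        try {
            // Arguments swapped: the actual value sits in the "expected" slot.
            assertEquals("Shutdown nodes should be 0 now", shutdownNodes, 0);
        } catch (AssertionError e) {
            // Misleading: it claims <1> was expected, although 0 was.
            System.out.println(e.getMessage());
            // Shutdown nodes should be 0 now expected:<1> but was:<0>
        }

        // Correct order: expected value first, actual value second.
        // assertEquals("Shutdown nodes should be 0 now", 0, shutdownNodes);
    }
}
```

AssertJ's fluent style, assertThat(actual).isEqualTo(expected), avoids the ambiguity entirely, which is why the issue suggests it for new tests.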
[jira] [Updated] (HADOOP-16510) [hadoop-common] Fix order of actual and expected expression in assert statements
[ https://issues.apache.org/jira/browse/HADOOP-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-16510: Fix Version/s: 3.3.0 > [hadoop-common] Fix order of actual and expected expression in assert > statements > > > Key: HADOOP-16510 > URL: https://issues.apache.org/jira/browse/HADOOP-16510 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-16510.001.patch, HADOOP-16510.002.patch, > HADOOP-16510.003.patch > > > Fix order of actual and expected expression in assert statements which gives > misleading message when test case fails. Attached file has some of the places > where it is placed wrongly. > {code:java} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > {code} > For long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be > used for new test cases which avoids such mistakes. > This is a follow-up jira for the hadoop-common project. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16510) [hadoop-common] Fix order of actual and expected expression in assert statements
[ https://issues.apache.org/jira/browse/HADOOP-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964016#comment-16964016 ] Szilard Nemeth commented on HADOOP-16510: - Hi [~adam.antal]! Thanks for this patch, very good job! I found only 1 nit: I removed a commented out line in file: TestProtoBufRpc, from the imports: {code:java} // import org.junit.Assert; {code} +1, committing this to trunk! [~adam.antal]: What about backporting this to branch-3.2 / branch-3.1? Thanks! > [hadoop-common] Fix order of actual and expected expression in assert > statements > > > Key: HADOOP-16510 > URL: https://issues.apache.org/jira/browse/HADOOP-16510 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16510.001.patch, HADOOP-16510.002.patch, > HADOOP-16510.003.patch > > > Fix order of actual and expected expression in assert statements which gives > misleading message when test case fails. Attached file has some of the places > where it is placed wrongly. > {code:java} > [ERROR] > testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService) > Time elapsed: 3.385 s <<< FAILURE! > java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but > was:<0> > {code} > For long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be > used for new test cases which avoids such mistakes. > This is a follow-up jira for the hadoop-common project. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952779#comment-16952779 ] Szilard Nemeth commented on HADOOP-16580: - Hi [~adam.antal]! Latest patch looks good, +1. Committed to trunk, branch-3.2 / branch-3.1. Thanks [~pbacsko] and [~shuzirra] for the reviews! > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, > HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-16580: Hadoop Flags: Reviewed > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, > HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-16580: Fix Version/s: 3.2.2 3.1.4 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, > HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-16580: Attachment: HADOOP-16580.branch-3.2.001.patch > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, > HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException
[ https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949408#comment-16949408 ] Szilard Nemeth commented on HADOOP-16580: - Hi [~adam.antal]! Thanks for the patch! Actually, I'm with [~shuzirra] on this one: Without your excellent explanation, I wouldn't understand why the method is called failsWithAccessControlExceptionEightTimes. As you mentioned: Could you please incorporate your explanation into javadoc, as much as possible? I don't only mean for the above method, but any other part of code you feel needs some explanation. Apart from this, I could give a +1 for this, when you have the javadocs in place. Thanks! > Disable retry of FailoverOnNetworkExceptionRetry in case of > AccessControlException > -- > > Key: HADOOP-16580 > URL: https://issues.apache.org/jira/browse/HADOOP-16580 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch > > > HADOOP-14982 handled the case where a SaslException is thrown. The issue > still persists, since the exception that is thrown is an > *AccessControlException* because user has no kerberos credentials. > My suggestion is that we should add this case as well to > {{FailoverOnNetworkExceptionRetry}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647135#comment-16647135 ] Szilard Nemeth commented on HADOOP-15717: - Thanks [~rkanter]! > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Fix For: 3.3.0 > > Attachments: HADOOP-15717.001.patch, HADOOP-15717.002.patch > > > I came across a case where tgt.getEndTime() was returned null and it resulted > in an NPE, this observation was popped out of a test suite execution on a > cluster. The reason for logging the {{IOException}} is that it helps to > troubleshoot what caused the exception, as it can come from two different > calls from the try-catch. > I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from > logging the fact that the ticket's {{endDate}} was null, we have not logged > the exception at all. > With the current code, the exception is swallowed and the thread terminates > in case the ticket's {{endDate}} is null. > As this can happen with OpenJDK for example, it is required to print the > exception (stack trace, message) to the log. > The code should be updated here: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15676) Cleanup TestSSLHttpServer
[ https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647102#comment-16647102 ] Szilard Nemeth commented on HADOOP-15676: - Thanks [~xiaochen]! > Cleanup TestSSLHttpServer > - > > Key: HADOOP-15676 > URL: https://issues.apache.org/jira/browse/HADOOP-15676 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 2.6.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Fix For: 3.2.0 > > Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, > HADOOP-15676.003.patch, HADOOP-15676.004.patch, HADOOP-15676.005.patch > > > This issue will fix: > * Several typos in this class > * Code is not very well readable in some of the places. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646871#comment-16646871 ] Szilard Nemeth commented on HADOOP-15717: - Hi [~xiaochen]! I checked the API in my IDE and on the website as well, and I still don't see a signature that supports parameterized logging and logs the exception properly at the same time. The API provides either: 1. https://www.slf4j.org/apidocs/org/slf4j/Logger.html#error(java.lang.String,%20java.lang.Object...) or 2. https://www.slf4j.org/apidocs/org/slf4j/Logger.html#error(java.lang.String,%20java.lang.Throwable) The first is for parameterized logging, the second logs a string and the exception, but there is no method that combines the two. > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch, HADOOP-15717.002.patch > > > I came across a case where tgt.getEndTime() returned null and it resulted > in an NPE; this was observed during a test suite execution on a > cluster. The reason for logging the {{IOException}} is that it helps to > troubleshoot what caused the exception, as it can come from two different > calls in the try-catch. > I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from > logging the fact that the ticket's {{endDate}} was null, we have not logged > the exception at all. > With the current code, the exception is swallowed and the thread terminates > in case the ticket's {{endDate}} is null. > As this can happen with OpenJDK for example, it is required to print the > exception (stack trace, message) to the log. 

> The code should be updated here: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15676) Cleanup TestSSLHttpServer
[ https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15676: Attachment: HADOOP-15676.005.patch > Cleanup TestSSLHttpServer > - > > Key: HADOOP-15676 > URL: https://issues.apache.org/jira/browse/HADOOP-15676 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 2.6.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, > HADOOP-15676.003.patch, HADOOP-15676.004.patch, HADOOP-15676.005.patch > > > This issue will fix: > * Several typos in this class > * Code is not very well readable in some of the places. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15676) Cleanup TestSSLHttpServer
[ https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646865#comment-16646865 ] Szilard Nemeth commented on HADOOP-15676: - Hi [~xiaochen]! Thanks for the explanation, it is indeed better to just let the exception thrown out from the method so we have more information in the logs. I'm with you when you mentioned improving the code if we touch it, since this is a cleanup jira, I think changing the things you mentioned makes sense. See the new patch with the fixes! Thanks! > Cleanup TestSSLHttpServer > - > > Key: HADOOP-15676 > URL: https://issues.apache.org/jira/browse/HADOOP-15676 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 2.6.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, > HADOOP-15676.003.patch, HADOOP-15676.004.patch, HADOOP-15676.005.patch > > > This issue will fix: > * Several typos in this class > * Code is not very well readable in some of the places. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15676) Cleanup TestSSLHttpServer
[ https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15676: Attachment: HADOOP-15676.004.patch > Cleanup TestSSLHttpServer > - > > Key: HADOOP-15676 > URL: https://issues.apache.org/jira/browse/HADOOP-15676 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 2.6.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, > HADOOP-15676.003.patch, HADOOP-15676.004.patch > > > This issue will fix: > * Several typos in this class > * Code is not very well readable in some of the places. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15717: Attachment: HADOOP-15717.002.patch > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch, HADOOP-15717.002.patch > > > I came across a case where tgt.getEndTime() was returned null and it resulted > in an NPE, this observation was popped out of a test suite execution on a > cluster. The reason for logging the {{IOException}} is that it helps to > troubleshoot what caused the exception, as it can come from two different > calls from the try-catch. > I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from > logging the fact that the ticket's {{endDate}} was null, we have not logged > the exception at all. > With the current code, the exception is swallowed and the thread terminates > in case the ticket's {{endDate}} is null. > As this can happen with OpenJDK for example, it is required to print the > exception (stack trace, message) to the log. > The code should be updated here: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644197#comment-16644197 ] Szilard Nemeth commented on HADOOP-15717: - Hi [~xiaochen], [~rkanter]! Oh I see what I had overlooked. Removed the newly added error log and modified the 2 existing error logs to contain the exception. Unfortunately, I had to use String.format, as there's no API from this version of log4j that would support object parameters and exception logging at the same time. Actually, on line 945, the code's intention was to log the exception, but as the signature of the log4j API call is different, it was never logged. The call had less format specifiers in the string, too (4 instead of 5). > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch > > > I came across a case where tgt.getEndTime() was returned null and it resulted > in an NPE, this observation was popped out of a test suite execution on a > cluster. The reason for logging the {{IOException}} is that it helps to > troubleshoot what caused the exception, as it can come from two different > calls from the try-catch. > I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from > logging the fact that the ticket's {{endDate}} was null, we have not logged > the exception at all. > With the current code, the exception is swallowed and the thread terminates > in case the ticket's {{endDate}} is null. > As this can happen with OpenJDK for example, it is required to print the > exception (stack trace, message) to the log. 
> The code should be updated here: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
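The workaround described in the comment above, pre-formatting the parameterized message with String.format and handing the Throwable to the logger as a separate argument so its stack trace is preserved, can be sketched as follows. The class, method names, and message wording are illustrative, not the actual UserGroupInformation code, and java.util.logging stands in here for the project's logging API:

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TgtRenewalLogging {

    private static final Logger LOG =
        Logger.getLogger(TgtRenewalLogging.class.getName());

    // Pre-formats the parameterized part of the message up front, since the
    // logging call itself only accepts a plain string plus a Throwable.
    static String renewalFailureMessage(String user, long nextRefresh) {
        return String.format("Exception encountered while running the renewal "
            + "command for %s (next refresh was scheduled at %d)",
            user, nextRefresh);
    }

    public static void main(String[] args) {
        IOException ie = new IOException("tgt.getEndTime() returned null");
        // The Throwable is passed as its own argument, so the logger prints
        // both the formatted message and the full stack trace.
        LOG.log(Level.SEVERE,
            renewalFailureMessage("hdfs@EXAMPLE.COM", 1_700_000_000_000L), ie);
    }
}
```

Mismatched format specifiers are also how the bug on line 945 mentioned above stays silent: with String.format built separately, the argument count is checked in one place instead of being spread across the logging call.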
[jira] [Commented] (HADOOP-15676) Cleanup TestSSLHttpServer
[ https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616893#comment-16616893 ] Szilard Nemeth commented on HADOOP-15676: - Thanks [~xiaochen] for your comments! Uploaded a new patch that fixes the code duplication. With the try-catch block with the {{fail\(\)}} call, I haven't modified the original code. I guess the intention was to not only fail when the {{SSLHandshakeException}} is thrown, the test should fail and provide a more detailed error message (1st parameter to {{fail\(\)}}. What idea do you have in mind to fix that? Thanks! > Cleanup TestSSLHttpServer > - > > Key: HADOOP-15676 > URL: https://issues.apache.org/jira/browse/HADOOP-15676 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 2.6.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, > HADOOP-15676.003.patch > > > This issue will fix: > * Several typos in this class > * Code is not very well readable in some of the places. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15676) Cleanup TestSSLHttpServer
[ https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15676: Attachment: HADOOP-15676.003.patch > Cleanup TestSSLHttpServer > - > > Key: HADOOP-15676 > URL: https://issues.apache.org/jira/browse/HADOOP-15676 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 2.6.0 >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Minor > Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, > HADOOP-15676.003.patch > > > This issue will fix: > * Several typos in this class > * Code is not very well readable in some of the places. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616891#comment-16616891 ] Szilard Nemeth commented on HADOOP-15717: - Hi [~xiaochen]! Thanks for your comment! I'm not sure which 2 existing {{LOG.error}} statements you referred to. If you meant the block {{catch \(IOException ie\) {}} in the {{run}} method, I would say those are not the best candidates for adding the exception as a logging parameter: the first {{LOG.error}} statement deals with the case when the {{tgt}} is destroyed, and the second {{LOG.error}} statement handles possible NPEs coming from {{tgt.getEndTime\(\).getTime\(\)}}. You may have meant something different, so please clarify! I'm still voting for keeping my patch as the solution, i.e. logging the exception on the first line of the catch-block in a generic fashion. > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch > > > I came across a case where tgt.getEndTime() returned null and it resulted > in an NPE; this was observed during a test suite execution on a > cluster. The reason for logging the {{IOException}} is that it helps to > troubleshoot what caused the exception, as it can come from two different > calls in the try-catch. > I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from > logging the fact that the ticket's {{endDate}} was null, we have not logged > the exception at all. > With the current code, the exception is swallowed and the thread terminates > in case the ticket's {{endDate}} is null. > As this can happen with OpenJDK for example, it is required to print the > exception (stack trace, message) to the log. 
> The code should be updated here: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605874#comment-16605874 ] Szilard Nemeth commented on HADOOP-15717: - Hi [~ste...@apache.org]! Good point. Unfortunately, I'm not confident how many times this could be logged. In this sense, I would use the debug level instead. Do you agree? > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch > > > I came across a case where tgt.getEndTime() was returned null and it resulted > in an NPE, this observation was popped out of a test suite execution on a > cluster. The reason for logging the {{IOException}} is that it helps to > troubleshoot what caused the exception, as it can come from two different > calls from the try-catch. > I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from > logging the fact that the ticket's {{endDate}} was null, we have not logged > the exception at all. > With the current code, the exception is swallowed and the thread terminates > in case the ticket's {{endDate}} is null. > As this can happen with OpenJDK for example, it is required to print the > exception (stack trace, message) to the log. > The code should be updated here: > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603308#comment-16603308 ] Szilard Nemeth commented on HADOOP-15717: - Tests are not added; hence the red build. > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch
[jira] [Updated] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15717: Status: Patch Available (was: In Progress) > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch
[jira] [Updated] (HADOOP-15717) TGT renewal thread does not log IOException
[ https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated HADOOP-15717: Description: The reason for logging the IOException is that it helps in troubleshooting what caused the exception, as it can come from two different calls within the try-catch. I came across a case where tgt.getEndTime() returned null and it resulted in an NPE; this was observed during a test suite execution on a cluster. I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from logging the fact that the ticket's {{endDate}} was null, we have not logged the exception at all. With the current code, the exception is swallowed and the thread terminates when the ticket's {{endDate}} is null. As this can happen with OpenJDK, for example, the exception (stack trace and message) should be printed to the log. The code should be updated here: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918 was: The reason for logging the IOException is that it helps in troubleshooting what caused the exception, as it can come from two different calls within the try-catch. I came across a case where tgt.getEndTime() returned null and it resulted in an NPE. I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from logging the fact that the ticket's {{endDate}} was null, we have not logged the exception at all. With the current code, the exception is swallowed and the thread terminates when the ticket's {{endDate}} is null. As this can happen with OpenJDK, for example, the exception (stack trace and message) should be printed to the log. > TGT renewal thread does not log IOException > --- > > Key: HADOOP-15717 > URL: https://issues.apache.org/jira/browse/HADOOP-15717 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Szilard Nemeth >Priority: Major > Attachments: HADOOP-15717.001.patch