[jira] [Updated] (HADOOP-18991) Remove commons-beanutils dependency from Hadoop 3

2023-12-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18991:

Summary: Remove commons-beanutils dependency from Hadoop 3  (was: Remove 
commons-benautils dependency from Hadoop 3)

> Remove commons-beanutils dependency from Hadoop 3
> -
>
> Key: HADOOP-18991
> URL: https://issues.apache.org/jira/browse/HADOOP-18991
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Reporter: Istvan Toth
>Priority: Major
>
> Hadoop doesn't actually use it, and it pollutes the classpath of dependent 
> projects.
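
Until the removal lands, a downstream project can keep commons-beanutils off its classpath with a Maven exclusion. A hedged sketch of such a pom.xml fragment (the hadoop-common version shown is illustrative, not tied to this issue):

```xml
<!-- Hypothetical downstream pom.xml fragment: exclude the transitive
     commons-beanutils dependency pulled in via hadoop-common -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>3.3.6</version> <!-- illustrative version -->
  <exclusions>
    <exclusion>
      <groupId>commons-beanutils</groupId>
      <artifactId>commons-beanutils</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```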



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18870) CURATOR-599 change broke functionality introduced in HADOOP-18139 and HADOOP-18709

2023-09-06 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth resolved HADOOP-18870.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> CURATOR-599 change broke functionality introduced in HADOOP-18139 and 
> HADOOP-18709
> --
>
> Key: HADOOP-18870
> URL: https://issues.apache.org/jira/browse/HADOOP-18870
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Ferenc Erdelyi
>Assignee: Ferenc Erdelyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> [Curator PR#391 
> |https://github.com/apache/curator/pull/391/files#diff-687a4ed1252bfb4f56b3aeeb28bee4413b7df9bec4b969b72215587158ac875dR59]
>  introduced a default method in the ZooKeeperFactory interface, so the 
> override of the 4-parameter newZooKeeper method in the HadoopZookeeperFactory 
> class no longer takes effect. 
> Proposing to route the 4-parameter method to a 5-parameter method that 
> receives the ZKConfiguration as the 5th parameter. This is a non-breaking 
> change, as the ZKConfiguration is currently instantiated within the method.
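
The shadowing problem and the shape of the proposed fix can be sketched with simplified stand-ins (these are NOT the real Curator/ZooKeeper types; names, parameters, and return values are illustrative only):

```java
// Simplified stand-ins (not the real Curator API): String stands in for the
// ZooKeeper handle, and the 5th config parameter is a plain String.
interface ZookeeperFactory {
    String newZooKeeper(String connectString, int sessionTimeout,
                        String watcher, boolean canBeReadOnly);

    // CURATOR-599-style addition: the framework now calls this 5-parameter
    // variant, and its default body does not delegate to the 4-parameter one.
    default String newZooKeeper(String connectString, int sessionTimeout,
                                String watcher, boolean canBeReadOnly,
                                String zkClientConfig) {
        return "default-client";
    }
}

class BrokenHadoopFactory implements ZookeeperFactory {
    // Only the 4-parameter method is overridden, so a 5-parameter call from
    // the framework never reaches this customization.
    @Override
    public String newZooKeeper(String cs, int timeout, String w, boolean ro) {
        return "hadoop-client";
    }
}

class FixedHadoopFactory implements ZookeeperFactory {
    // Shape of the proposed fix: route the 4-parameter method to the
    // 5-parameter one and override the 5-parameter variant as well.
    @Override
    public String newZooKeeper(String cs, int timeout, String w, boolean ro) {
        return newZooKeeper(cs, timeout, w, ro, "default-config");
    }

    @Override
    public String newZooKeeper(String cs, int timeout, String w, boolean ro,
                               String zkClientConfig) {
        return "hadoop-client";
    }
}

public class Main {
    public static void main(String[] args) {
        // The framework invokes the 5-parameter variant in both cases:
        System.out.println(new BrokenHadoopFactory()
                .newZooKeeper("zk:2181", 3000, "w", false, "cfg"));  // default-client
        System.out.println(new FixedHadoopFactory()
                .newZooKeeper("zk:2181", 3000, "w", false, "cfg"));  // hadoop-client
    }
}
```

Because the customization lives in the 5-parameter override, it applies whether the caller uses the 4- or 5-parameter entry point, which is why the routing is non-breaking.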






[jira] [Commented] (HADOOP-18870) CURATOR-599 change broke functionality introduced in HADOOP-18139 and HADOOP-18709

2023-09-06 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762534#comment-17762534
 ] 

Szilard Nemeth commented on HADOOP-18870:
-

[~bender]

{quote}
Proposing routing the 4-parameter method to a 5-parameter method, which 
instantiates the ZKConfiguration as the 5th parameter. This is a non-breaking 
change, as the ZKConfiguration is currently instantiated within the method.
{quote}

Am I missing something, or did you mean {{ZKClientConfig}} as the 5th parameter?
Checking the linked Curator PR 
(https://github.com/apache/curator/pull/391/files#diff-687a4ed1252bfb4f56b3aeeb28bee4413b7df9bec4b969b72215587158ac875dR59)
shows ZKClientConfig as the 5th parameter there.
Can you fix the description of the jira?

> CURATOR-599 change broke functionality introduced in HADOOP-18139 and 
> HADOOP-18709
> --
>
> Key: HADOOP-18870
> URL: https://issues.apache.org/jira/browse/HADOOP-18870
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Ferenc Erdelyi
>Assignee: Ferenc Erdelyi
>Priority: Major
>  Labels: pull-request-available
>
> [Curator PR#391 
> |https://github.com/apache/curator/pull/391/files#diff-687a4ed1252bfb4f56b3aeeb28bee4413b7df9bec4b969b72215587158ac875dR59]
>  introduced a default method in the ZooKeeperFactory interface, so the 
> override of the 4-parameter newZooKeeper method in the HadoopZookeeperFactory 
> class no longer takes effect. 
> Proposing to route the 4-parameter method to a 5-parameter method that 
> receives the ZKConfiguration as the 5th parameter. This is a non-breaking 
> change, as the ZKConfiguration is currently instantiated within the method.






[jira] [Updated] (HADOOP-18709) Add curator based ZooKeeper communication support over SSL/TLS into the common library

2023-06-04 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18709:

Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> Add curator based ZooKeeper communication support over SSL/TLS into the 
> common library
> --
>
> Key: HADOOP-18709
> URL: https://issues.apache.org/jira/browse/HADOOP-18709
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ferenc Erdelyi
>Assignee: Ferenc Erdelyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With HADOOP-16579 the ZooKeeper client is capable of securing communication 
> with SSL. 
> To follow the convention introduced in HADOOP-14741, proposing to add to the 
> core-default.xml the following configurations, as the groundwork for the 
> components to enable encrypted communication between the individual 
> components and ZooKeeper:
>  * hadoop.zk.ssl.keystore.location
>  * hadoop.zk.ssl.keystore.password
>  * hadoop.zk.ssl.truststore.location
>  * hadoop.zk.ssl.truststore.password
> These parameters along with the component-specific ssl.client.enable option 
> (e.g. yarn.zookeeper.ssl.client.enable) should be passed to the 
> ZKCuratorManager to build the CuratorFramework. The ZKCuratorManager needs a 
> new overloaded start() method to build the encrypted communication.
>  * The secured ZK Client uses Netty, hence the dependency is included in the 
> pom.xml. Added netty-handler and netty-transport-native-epoll dependency to 
> the pom.xml based on ZOOKEEPER-3494 - "No need to depend on netty-all (SSL)".
>  * The change was tested exclusively with a unit test, which is effectively an 
> integration test: a ZK server was brought up and communication between the 
> client and the server was verified.
>  * This code change is in the common code base and there is no component 
> calling it yet. Once YARN-11468 - "Zookeeper SSL/TLS support" is implemented, 
> we can test it in a real cluster environment.
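
For illustration, the proposed properties could be set in core-site.xml along these lines (paths and passwords are placeholders, and the component-specific flag name follows the YARN example above):

```xml
<!-- Illustrative values only; property names as proposed above -->
<property>
  <name>hadoop.zk.ssl.keystore.location</name>
  <value>/etc/hadoop/conf/zk-keystore.jks</value>
</property>
<property>
  <name>hadoop.zk.ssl.keystore.password</name>
  <value>keystore-secret</value>
</property>
<property>
  <name>hadoop.zk.ssl.truststore.location</name>
  <value>/etc/hadoop/conf/zk-truststore.jks</value>
</property>
<property>
  <name>hadoop.zk.ssl.truststore.password</name>
  <value>truststore-secret</value>
</property>
<!-- Component-specific switch, e.g. for YARN -->
<property>
  <name>yarn.zookeeper.ssl.client.enable</name>
  <value>true</value>
</property>
```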






[jira] [Updated] (HADOOP-18709) Add curator based ZooKeeper communication support over SSL/TLS into the common library

2023-06-04 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18709:

Fix Version/s: 3.4.0

> Add curator based ZooKeeper communication support over SSL/TLS into the 
> common library
> --
>
> Key: HADOOP-18709
> URL: https://issues.apache.org/jira/browse/HADOOP-18709
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ferenc Erdelyi
>Assignee: Ferenc Erdelyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With HADOOP-16579 the ZooKeeper client is capable of securing communication 
> with SSL. 
> To follow the convention introduced in HADOOP-14741, proposing to add to the 
> core-default.xml the following configurations, as the groundwork for the 
> components to enable encrypted communication between the individual 
> components and ZooKeeper:
>  * hadoop.zk.ssl.keystore.location
>  * hadoop.zk.ssl.keystore.password
>  * hadoop.zk.ssl.truststore.location
>  * hadoop.zk.ssl.truststore.password
> These parameters along with the component-specific ssl.client.enable option 
> (e.g. yarn.zookeeper.ssl.client.enable) should be passed to the 
> ZKCuratorManager to build the CuratorFramework. The ZKCuratorManager needs a 
> new overloaded start() method to build the encrypted communication.
>  * The secured ZK Client uses Netty, hence the dependency is included in the 
> pom.xml. Added netty-handler and netty-transport-native-epoll dependency to 
> the pom.xml based on ZOOKEEPER-3494 - "No need to depend on netty-all (SSL)".
>  * The change was tested exclusively with a unit test, which is effectively an 
> integration test: a ZK server was brought up and communication between the 
> client and the server was verified.
>  * This code change is in the common code base and there is no component 
> calling it yet. Once YARN-11468 - "Zookeeper SSL/TLS support" is implemented, 
> we can test it in a real cluster environment.






[jira] [Updated] (HADOOP-18709) Add curator based ZooKeeper communication support over SSL/TLS into the common library

2023-06-04 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18709:

Status: Patch Available  (was: Open)

> Add curator based ZooKeeper communication support over SSL/TLS into the 
> common library
> --
>
> Key: HADOOP-18709
> URL: https://issues.apache.org/jira/browse/HADOOP-18709
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ferenc Erdelyi
>Assignee: Ferenc Erdelyi
>Priority: Major
>  Labels: pull-request-available
>
> With HADOOP-16579 the ZooKeeper client is capable of securing communication 
> with SSL. 
> To follow the convention introduced in HADOOP-14741, proposing to add to the 
> core-default.xml the following configurations, as the groundwork for the 
> components to enable encrypted communication between the individual 
> components and ZooKeeper:
>  * hadoop.zk.ssl.keystore.location
>  * hadoop.zk.ssl.keystore.password
>  * hadoop.zk.ssl.truststore.location
>  * hadoop.zk.ssl.truststore.password
> These parameters along with the component-specific ssl.client.enable option 
> (e.g. yarn.zookeeper.ssl.client.enable) should be passed to the 
> ZKCuratorManager to build the CuratorFramework. The ZKCuratorManager needs a 
> new overloaded start() method to build the encrypted communication.
>  * The secured ZK Client uses Netty, hence the dependency is included in the 
> pom.xml. Added netty-handler and netty-transport-native-epoll dependency to 
> the pom.xml based on ZOOKEEPER-3494 - "No need to depend on netty-all (SSL)".
>  * The change was tested exclusively with a unit test, which is effectively an 
> integration test: a ZK server was brought up and communication between the 
> client and the server was verified.
>  * This code change is in the common code base and there is no component 
> calling it yet. Once YARN-11468 - "Zookeeper SSL/TLS support" is implemented, 
> we can test it in a real cluster environment.






[jira] [Updated] (HADOOP-18732) Exclude Jettison from jersey-json artifact in hadoop-yarn-common's pom.xml

2023-05-05 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18732:

Summary: Exclude Jettison from jersey-json artifact in hadoop-yarn-common's 
pom.xml  (was: Exclude Jettison from jersery-json artifact in 
hadoop-yarn-common's pom.xml)

> Exclude Jettison from jersey-json artifact in hadoop-yarn-common's pom.xml
> --
>
> Key: HADOOP-18732
> URL: https://issues.apache.org/jira/browse/HADOOP-18732
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Reporter: Devaspati Krishnatri
>Priority: Major
>  Labels: pull-request-available
>
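
A sketch of the kind of exclusion this likely entails in hadoop-yarn-common's pom.xml (coordinates are the usual Maven Central ones for these artifacts; the version is assumed to be managed elsewhere in the build):

```xml
<dependency>
  <groupId>com.sun.jersey</groupId>
  <artifactId>jersey-json</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.codehaus.jettison</groupId>
      <artifactId>jettison</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```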







[jira] [Updated] (HADOOP-18602) Remove netty3 dependency

2023-01-27 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18602:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Remove netty3 dependency
> 
>
> Key: HADOOP-18602
> URL: https://issues.apache.org/jira/browse/HADOOP-18602
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> AFAIK netty3 is no longer in use so it can be removed from the dependencies.






[jira] [Updated] (HADOOP-18602) Remove netty3 dependency

2023-01-26 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18602:

Status: Patch Available  (was: Open)

> Remove netty3 dependency
> 
>
> Key: HADOOP-18602
> URL: https://issues.apache.org/jira/browse/HADOOP-18602
> Project: Hadoop Common
>  Issue Type: Task
>  Components: build
>Affects Versions: 3.4.0
>Reporter: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>
> AFAIK netty3 is no longer in use so it can be removed from the dependencies.






[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2022-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631029#comment-17631029
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 11/9/22 12:07 PM:
---

Hi,
CC: [~gandras], [~shuzirra], [~weichiu]

Let me summarize what kind of testing I performed to make sure this change 
won't cause any regression.
The project that helped me very much with the testing is called 
[Hades|https://github.com/9uapaw/hades].
Kudos to [~gandras] for the initial work on the Hades project.
h1. TL;DR

*Hades was the framework I used to run my testcases.*
*All testcases passed both with the trunk version of Hadoop (which is not 
surprising) and with the deployed Hadoop version carrying my Netty upgrade patch.*
*See the attached test logs for details.*  [^hades-results-20221108.zip] 
*Also see the details below about what Hades is, how I tested, why I chose 
certain configurations for the testcases, and more.*
*Now I'm fairly confident that this patch won't break anything, so I'm waiting 
for reviewers.*

h1. HADES IN GENERAL
h2. What is Hades?

Hades is a CLI tool that provides a common interface across various Hadoop 
distributions. It is a collection of the commands most frequently used by 
developers of Hadoop components.

Hades supports [Hadock|https://github.com/9uapaw/docker-hadoop-dev], [Cloudera 
Data Platform|https://www.cloudera.com/products/cloudera-data-platform.html] 
and the standard upstream distribution.
h2. Basic features of Hades
 - Discover cluster: Stores where individual YARN / HDFS daemons are running.
 - Distribute files on certain nodes
 - Get config: Prints configuration of selected roles
 - Read logs of Hadoop roles
 - Restart: Restarting of certain roles
 - Run an application on the defined cluster
 - Status: Prints the status of the cluster
 - Update config: Update properties on a config file for selected roles
 - YARN specific commands
 - Run script: Runs user-defined custom scripts against the cluster.

h1. CLUSTER + HADES SETUP
h2. Run Hades with the Netty testing script against a cluster

First of all, I created a standard cluster and deployed Hadoop to it.
Side note: later on, the installation steps that deploy Hadoop on the cluster 
could become part of Hades as well.

It is worth mentioning that I have a [PR with netty-related 
changes|https://github.com/9uapaw/hades/pull/6] against the Hades repo.
The branch of this PR is 
[this|https://github.com/szilard-nemeth/hades/tree/netty4-finish].

[Here are the 
instructions|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/README.md#set-up-hades-on-a-cluster-and-run-the-netty-script]
 for how to set up and run Hades with the Netty testing script.
h1. THE NETTY TESTING SCRIPT

The Netty testing script [lives 
here|https://github.com/szilard-nemeth/hades/blob/netty4-finish/script/netty4.py].
As you can see in the code, quite a lot of work has gone into making sure the 
Netty 4 upgrade won't break anything or cause any regression, as the shuffle 
handler is a crucial part of MapReduce.
h2. CONCEPTS
h3. Test context

Class: Netty4TestContext

The test context provides a way to encapsulate a base branch and a patch file 
(if any) applied on top of the base branch.
The context can enable or disable Maven compilation.
The context can also have certain ways to ensure that the compilation and the 
deployment of new jars were successful on the cluster.
Now, it can verify that certain logs are appearing in the daemon logs, making 
sure the deployment was okay.
The main purpose of the context is to compare it with results of other contexts.
For the Netty testing, it was evident that I needed to make sure that the trunk 
version and my version with the patch applied on top of trunk work the same, 
i.e. there is no regression.
For this, I created the context.
h3. Testcase

Class: Netty4Testcase

In general, a testcase can have a name, a simple name, some config changes 
(dictionary of string keys, string values) and one MR application.
h3. Test config: Config options for running the tests

Class: Netty4TestConfig

These are the main config options for the Netty testing.
I won't go into too much detail, as I defined a ton of options along the way.
You can check all the config options 
[here|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/script/netty4.py#L655-L687].
h3. Compiler

As mentioned above, Hades can compile Hadoop with Maven and replace the changed 
jars / Maven modules on the cluster.
This is particularly useful for the Netty testing: since I was interested in 
whether the patch causes any issues, I had to compile Hadoop with my Netty 
patch, deploy the jars on the cluster, run all the tests, and see all of them 
pass.
h2. TESTCASES

The testcases are defined with the help of the Netty4TestcasesBuilder. You can 
find all the testcases 

[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2022-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631029#comment-17631029
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Hi,
CC: [~gandras], [~shuzirra], [~weichiu]

Let me summarize what kind of testing I performed to make sure this change 
won't cause any regression.
The project that helped me very much with the testing is called 
[Hades|https://github.com/9uapaw/hades].
Kudos to [~gandras] for the initial work on the Hades project.
h1. TL;DR

*Hades was the framework I used to run my testcases.*
*All testcases passed both with the trunk version of Hadoop (which is not 
surprising) and with the deployed Hadoop version carrying my Netty upgrade patch.*
*See the attached test logs for details.*
*Also see the details below about what Hades is, how I tested, why I chose 
certain configurations for the testcases, and more.*
*Now I'm fairly confident that this patch won't break anything, so I'm waiting 
for reviewers.*

h1. HADES IN GENERAL
h2. What is Hades?

Hades is a CLI tool that provides a common interface across various Hadoop 
distributions. It is a collection of the commands most frequently used by 
developers of Hadoop components.

Hades supports [Hadock|https://github.com/9uapaw/docker-hadoop-dev], [Cloudera 
Data Platform|https://www.cloudera.com/products/cloudera-data-platform.html] 
and the standard upstream distribution.
h2. Basic features of Hades
 - Discover cluster: Stores where individual YARN / HDFS daemons are running.
 - Distribute files on certain nodes
 - Get config: Prints configuration of selected roles
 - Read logs of Hadoop roles
 - Restart: Restarting of certain roles
 - Run an application on the defined cluster
 - Status: Prints the status of the cluster
 - Update config: Update properties on a config file for selected roles
 - YARN specific commands
 - Run script: Runs user-defined custom scripts against the cluster.

h1. CLUSTER + HADES SETUP
h2. Run Hades with the Netty testing script against a cluster

First of all, I created a standard cluster and deployed Hadoop to it.
Side note: later on, the installation steps that deploy Hadoop on the cluster 
could become part of Hades as well.

It is worth mentioning that I have a [PR with netty-related 
changes|https://github.com/9uapaw/hades/pull/6] against the Hades repo.
The branch of this PR is 
[this|https://github.com/szilard-nemeth/hades/tree/netty4-finish].

[Here are the 
instructions|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/README.md#set-up-hades-on-a-cluster-and-run-the-netty-script]
 for how to set up and run Hades with the Netty testing script.
h1. THE NETTY TESTING SCRIPT

The Netty testing script [lives 
here|https://github.com/szilard-nemeth/hades/blob/netty4-finish/script/netty4.py].
As you can see in the code, quite a lot of work has gone into making sure the 
Netty 4 upgrade won't break anything or cause any regression, as the shuffle 
handler is a crucial part of MapReduce.
h2. CONCEPTS
h3. Test context

Class: Netty4TestContext

The test context provides a way to encapsulate a base branch and a patch file 
(if any) applied on top of the base branch.
The context can enable or disable Maven compilation.
The context can also have certain ways to ensure that the compilation and the 
deployment of new jars were successful on the cluster.
Now, it can verify that certain logs are appearing in the daemon logs, making 
sure the deployment was okay.
The main purpose of the context is to compare it with results of other contexts.
For the Netty testing, it was evident that I needed to make sure that the trunk 
version and my version with the patch applied on top of trunk work the same, 
i.e. there is no regression.
For this, I created the context.
h3. Testcase

Class: Netty4Testcase

In general, a testcase can have a name, a simple name, some config changes 
(dictionary of string keys, string values) and one MR application.
h3. Test config: Config options for running the tests

Class: Netty4TestConfig

These are the main config options for the Netty testing.
I won't go into too much detail, as I defined a ton of options along the way.
You can check all the config options 
[here|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/script/netty4.py#L655-L687].
h3. Compiler

As mentioned above, Hades can compile Hadoop with Maven and replace the changed 
jars / Maven modules on the cluster.
This is particularly useful for the Netty testing: since I was interested in 
whether the patch causes any issues, I had to compile Hadoop with my Netty 
patch, deploy the jars on the cluster, run all the tests, and see all of them 
pass.
h2. TESTCASES

The testcases are defined with the help of the Netty4TestcasesBuilder. You can 
find all the testcases 

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2022-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17631029#comment-17631029
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 11/9/22 12:05 PM:
---

Hi,
CC: [~gandras], [~shuzirra], [~weichiu]

Let me summarize what kind of testing I performed to make sure this change 
won't cause any regression.
The project that helped me very much with the testing is called 
[Hades|https://github.com/9uapaw/hades].
Kudos to [~gandras] for the initial work on the Hades project.
h1. TL;DR

*Hades was the framework I used to run my testcases.*
*All testcases passed both with the trunk version of Hadoop (which is not 
surprising) and with the deployed Hadoop version carrying my Netty upgrade patch.*
*See the attached test logs for details.*
*Also see the details below about what Hades is, how I tested, why I chose 
certain configurations for the testcases, and more.*
*Now I'm fairly confident that this patch won't break anything, so I'm waiting 
for reviewers.*

h1. HADES IN GENERAL
h2. What is Hades?

Hades is a CLI tool that provides a common interface across various Hadoop 
distributions. It is a collection of the commands most frequently used by 
developers of Hadoop components.

Hades supports [Hadock|https://github.com/9uapaw/docker-hadoop-dev], [Cloudera 
Data Platform|https://www.cloudera.com/products/cloudera-data-platform.html] 
and the standard upstream distribution.
h2. Basic features of Hades
 - Discover cluster: Stores where individual YARN / HDFS daemons are running.
 - Distribute files on certain nodes
 - Get config: Prints configuration of selected roles
 - Read logs of Hadoop roles
 - Restart: Restarting of certain roles
 - Run an application on the defined cluster
 - Status: Prints the status of the cluster
 - Update config: Update properties on a config file for selected roles
 - YARN specific commands
 - Run script: Runs user-defined custom scripts against the cluster.

h1. CLUSTER + HADES SETUP
h2. Run Hades with the Netty testing script against a cluster

First of all, I created a standard cluster and deployed Hadoop to it.
Side note: later on, the installation steps that deploy Hadoop on the cluster 
could become part of Hades as well.

It is worth mentioning that I have a [PR with netty-related 
changes|https://github.com/9uapaw/hades/pull/6] against the Hades repo.
The branch of this PR is 
[this|https://github.com/szilard-nemeth/hades/tree/netty4-finish].

[Here are the 
instructions|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/README.md#set-up-hades-on-a-cluster-and-run-the-netty-script]
 for how to set up and run Hades with the Netty testing script.
h1. THE NETTY TESTING SCRIPT

The Netty testing script [lives 
here|https://github.com/szilard-nemeth/hades/blob/netty4-finish/script/netty4.py].
As you can see in the code, quite a lot of work has gone into making sure the 
Netty 4 upgrade won't break anything or cause any regression, as the shuffle 
handler is a crucial part of MapReduce.
h2. CONCEPTS
h3. Test context

Class: Netty4TestContext

The test context provides a way to encapsulate a base branch and a patch file 
(if any) applied on top of the base branch.
The context can enable or disable Maven compilation.
The context can also have certain ways to ensure that the compilation and the 
deployment of new jars were successful on the cluster.
Now, it can verify that certain logs are appearing in the daemon logs, making 
sure the deployment was okay.
The main purpose of the context is to compare it with results of other contexts.
For the Netty testing, it was evident that I needed to make sure that the trunk 
version and my version with the patch applied on top of trunk work the same, 
i.e. there is no regression.
For this, I created the context.
h3. Testcase

Class: Netty4Testcase

In general, a testcase can have a name, a simple name, some config changes 
(dictionary of string keys, string values) and one MR application.
h3. Test config: Config options for running the tests

Class: Netty4TestConfig

These are the main config options for the Netty testing.
I won't go into too much detail, as I defined a ton of options along the way.
You can check all the config options 
[here|https://github.com/szilard-nemeth/hades/blob/c16e95393ecf3e787e125c58d88ec2dc6a44b9e0/script/netty4.py#L655-L687].
h3. Compiler

As mentioned above, Hades can compile Hadoop with Maven and replace the changed 
jars / Maven modules on the cluster.
This is particularly useful for the Netty testing: since I was interested in 
whether the patch causes any issues, I had to compile Hadoop with my Netty 
patch, deploy the jars on the cluster, run all the tests, and see all of them 
pass.
h2. TESTCASES

The testcases are defined with the help of the Netty4TestcasesBuilder. You can 
find all the testcases 

[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2022-11-09 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: hades-results-20221108.zip

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> hades-results-20221108.zip, testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)






[jira] [Updated] (HADOOP-18229) Fix Hadoop Common Java Doc Error

2022-05-10 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18229:

Description: 
I found that when hadoop-multibranch compiled PR-4266, some javadoc errors 
popped up; I tried to fix them.

The compilation errors are as follows; I attempted to fix them:


{code:java}
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:432:
 error: exception not thrown: java.io.IOException
[ERROR]* @throws IOException
[ERROR]  ^
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885:
 error: unknown tag: username
[ERROR]*  E.g. link: ^/user/(?\\w+) => 
s3://$user.apache.com/_${user}
[ERROR]   ^
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885:
 error: bad use of '>'
[ERROR]*  E.g. link: ^/user/(?\\w+) => 
s3://$user.apache.com/_${user}
[ERROR]^
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:910:
 error: unknown tag: username
[ERROR]* .linkRegex.replaceresolveddstpath:_:-#.^/user/(?\w+)
{code}
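The usual fix for such errors is to keep javadoc from parsing regex text as HTML. Below is a minimal illustration (a hypothetical class, not the actual InodeTree code) of wrapping the offending pattern in {@code ...} so the named group and the bare '>' no longer trip the doclint checks:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Resolves links of the form
 * {@code ^/user/(?<username>\w+) => s3://bucket/${username}}.
 * Inside an {@code ...} inline tag, javadoc does not interpret the
 * named group as an HTML tag, and the bare '>' is not an error.
 */
public final class LinkRegexExample {
    // Named-group pattern; fine in code, problematic in raw javadoc text.
    private static final Pattern LINK =
        Pattern.compile("^/user/(?<username>\\w+)");

    /** Returns the captured user name, or {@code null} if no match. */
    public static String userOf(String path) {
        Matcher m = LINK.matcher(path);
        return m.matches() ? m.group("username") : null;
    }
}
```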


  was:
I found that when hadoop-multibranch compiled PR-4266, some errors would pop 
up, I tried to solve it

The wrong compilation information is as follows, I try to fix the Error 
information

[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:432:
 error: exception not thrown: java.io.IOException
[ERROR]* @throws IOException
[ERROR]  ^
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885:
 error: unknown tag: username
[ERROR]*  E.g. link: ^/user/(?\\w+) => 
s3://$user.apache.com/_${user}
[ERROR]   ^
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885:
 error: bad use of '>'
[ERROR]*  E.g. link: ^/user/(?\\w+) => 
s3://$user.apache.com/_${user}
[ERROR]^
[ERROR] 
/home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:910:
 error: unknown tag: username
[ERROR]* .linkRegex.replaceresolveddstpath:_:-#.^/user/(?\w+)


> Fix Hadoop Common Java Doc Error
> 
>
> Key: HADOOP-18229
> URL: https://issues.apache.org/jira/browse/HADOOP-18229
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> I found that when hadoop-multibranch compiled PR-4266, some errors popped 
> up, and I tried to resolve them.
> The erroneous compilation output is as follows; I tried to fix these errors:
> {code:java}
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:432:
>  error: exception not thrown: java.io.IOException
> [ERROR]* @throws IOException
> [ERROR]  ^
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885:
>  error: unknown tag: username
> [ERROR]*  E.g. link: ^/user/(?\\w+) => 
> s3://$user.apache.com/_${user}
> [ERROR]   ^
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:885:
>  error: bad use of '>'
> [ERROR]*  E.g. link: ^/user/(?\\w+) => 
> s3://$user.apache.com/_${user}
> [ERROR]^
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-4266/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/InodeTree.java:910:
>  

[jira] [Updated] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times

2022-05-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-18222:

Status: Patch Available  (was: Open)

> Prevent DelegationTokenSecretManagerMetrics from registering multiple times 
> 
>
> Key: HADOOP-18222
> URL: https://issues.apache.org/jira/browse/HADOOP-18222
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After committing HADOOP-18167, we received reports of the following error 
> when ResourceManager is initialized:
> {noformat}
> Caused by: java.io.IOException: Problem starting http server
> at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475)
> ... 4 more
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
> DelegationTokenSecretManagerMetrics already exists!
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at 
> org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.(AbstractDelegationTokenSecretManager.java:152)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.(DelegationTokenManager.java:72)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.(DelegationTokenManager.java:122)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180)
> at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat}
> This can happen if MetricsSystemImpl#init is called and multiple metrics are 
> registered with the same name. A proposed solution is to declare the metrics 
> in AbstractDelegationTokenSecretManager as a singleton, which would prevent 
> multiple instances of DelegationTokenSecretManagerMetrics from being registered.
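The singleton proposal can be sketched roughly as follows (illustrative class names, not the actual Hadoop code): create() hands back one shared metrics instance, so constructing a second secret manager never registers the source name twice.

```java
import java.util.HashSet;
import java.util.Set;

// Stand-in for a metrics system that rejects duplicate source names,
// as DefaultMetricsSystem does.
final class MetricsRegistry {
    private final Set<String> names = new HashSet<>();

    synchronized void register(String name) {
        if (!names.add(name)) {
            throw new IllegalStateException(
                "Metrics source " + name + " already exists!");
        }
    }
}

// Singleton metrics holder: the source is registered exactly once, no
// matter how many secret managers are constructed afterwards.
final class SecretManagerMetrics {
    private static SecretManagerMetrics instance;

    static synchronized SecretManagerMetrics create(MetricsRegistry registry) {
        if (instance == null) {
            registry.register("DelegationTokenSecretManagerMetrics");
            instance = new SecretManagerMetrics();
        }
        return instance;
    }
}
```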






[jira] [Assigned] (HADOOP-18222) Prevent DelegationTokenSecretManagerMetrics from registering multiple times

2022-05-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned HADOOP-18222:
---

Assignee: Hector Sandoval Chaverri

> Prevent DelegationTokenSecretManagerMetrics from registering multiple times 
> 
>
> Key: HADOOP-18222
> URL: https://issues.apache.org/jira/browse/HADOOP-18222
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After committing HADOOP-18167, we received reports of the following error 
> when ResourceManager is initialized:
> {noformat}
> Caused by: java.io.IOException: Problem starting http server
> at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1389)
> at 
> org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:475)
> ... 4 more
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics source 
> DelegationTokenSecretManagerMetrics already exists!
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at 
> org.apache.hadoop.metrics2.MetricsSystem.register(MetricsSystem.java:71)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$DelegationTokenSecretManagerMetrics.create(AbstractDelegationTokenSecretManager.java:878)
> at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.(AbstractDelegationTokenSecretManager.java:152)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$DelegationTokenSecretManager.(DelegationTokenManager.java:72)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.(DelegationTokenManager.java:122)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.initTokenManager(DelegationTokenAuthenticationHandler.java:161)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.init(DelegationTokenAuthenticationHandler.java:130)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.initializeAuthHandler(AuthenticationFilter.java:194)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.initializeAuthHandler(DelegationTokenAuthenticationFilter.java:214)
> at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(AuthenticationFilter.java:180)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.init(DelegationTokenAuthenticationFilter.java:180)
> at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.init(RMAuthenticationFilter.java:53){noformat}
> This can happen if MetricsSystemImpl#init is called and multiple metrics are 
> registered with the same name. A proposed solution is to declare the metrics 
> in AbstractDelegationTokenSecretManager as a singleton, which would prevent 
> multiple instances of DelegationTokenSecretManagerMetrics from being registered.






[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-10-18 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430075#comment-17430075
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 10/18/21, 3:59 PM:


Hi [~zhenshan.wen] ,

I'm planning to fix the maven shading issue in the coming weeks, as soon as 
possible.

Also, I'd appreciate it if you could help me find out what to fix to get rid of 
the Maven shading issue.

Same thing I asked for here: 
[https://github.com/apache/hadoop/pull/3259#issuecomment-945923248]


was (Author: snemeth):
Hi [~zhenshan.wen] ,

I'm planning to fix the maven shading issue in the coming weeks, as soon as 
possible.

Also, I'd appreciate if you could help me to find out what to fix to get rid of 
the Maven shading issue. 

Same thing I asked here: 
https://github.com/apache/hadoop/pull/3259#issuecomment-945923248

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-10-18 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430075#comment-17430075
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 10/18/21, 3:59 PM:


Hi [~zhenshan.wen] ,

I'm planning to fix the maven shading issue in the coming weeks, as soon as 
possible.

Also, I'd appreciate it if you could help me find out what to fix to get rid of 
the Maven shading issue.

Same thing I asked here: 
https://github.com/apache/hadoop/pull/3259#issuecomment-945923248


was (Author: snemeth):
Hi [~zhenshan.wen] ,

I'm planning to fix the maven shading issue in the coming weeks, as soon as 
possible.

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-10-18 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430075#comment-17430075
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Hi [~zhenshan.wen] ,

I'm planning to fix the maven shading issue in the coming weeks, as soon as 
possible.

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Updated] (HADOOP-17919) Fix command line example in Hadoop Cluster Setup documentation

2021-09-19 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-17919:

Description: 
About Hadoop cluster setup documentation 
([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html])

The option  is specified in the following example, but the HDFS 
command ignores it.
{noformat}
`[hdfs]$ $HADOOP_HOME/bin/hdfs namenode -format `
{noformat}

  was:
About Hdoop cluster setup documentation 
([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html])

The option  is specified in the following example, but HDFS 
command ignores it.
{noformat}
`[hdfs]$ $HADOOP_HOME/bin/hdfs namenode -format `
{noformat}


> Fix command line example in Hadoop Cluster Setup documentation
> --
>
> Key: HADOOP-17919
> URL: https://issues.apache.org/jira/browse/HADOOP-17919
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Rintaro Ikeda
>Assignee: Rintaro Ikeda
>Priority: Minor
>  Labels: docuentation, pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> About Hadoop cluster setup documentation 
> ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html])
> The option  is specified in the following example, but the HDFS 
> command ignores it.
> {noformat}
> `[hdfs]$ $HADOOP_HOME/bin/hdfs namenode -format `
> {noformat}






[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412011#comment-17412011
 ] 

Szilard Nemeth commented on HADOOP-17857:
-

Hi [~epayne],
It just came to my mind that there's no documentation update for this change in 
the commit, but I have already committed it. 
Would you mind filing a follow-up jira for the doc changes? 

> Check real user ACLs in addition to proxied user ACLs
> -
>
> Key: HADOOP-17857
> URL: https://issues.apache.org/jira/browse/HADOOP-17857
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.2.2, 2.10.1, 3.3.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch
>
>
> In a secure cluster, it is possible to configure the services to allow a 
> super-user to proxy to a regular user and perform actions on behalf of the 
> proxied user (see [Proxy user - Superusers Acting On Behalf Of Other 
> Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]).
> This is useful for automating server access for multiple different users in a 
> multi-tenant cluster. For example, this can be used by a super user 
> submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie 
> workflows, etc, which will then execute the service as the proxied user.
> Usually when these services check ACLs to determine if the user has access to 
> the requested resources, the service only needs to check the ACLs for the 
> proxied user. However, it is sometimes desirable to allow the proxied user to 
> have access to the resources when only the real user has open ACLs.
> For instance, let's say the user {{adm}} is the only user with submit ACLs to 
> the {{dataload}} queue, and the {{adm}} user wants to submit apps to the 
> {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In 
> addition, we want to be able to bill {{headless1}} and {{headless2}} 
> separately for the YARN resources used in the {{dataload}} queue. In order to 
> do this, the apps need to run in the {{dataload}} queue as the respective 
> headless users. We could open up the ACLs to the {{dataload}} queue to allow 
> {{headless1}} and {{headless2}} to submit apps. But this would allow those 
> users to submit any app to that queue, and not be limited to just the data 
> loading apps, and we don't trust the {{headless1}} and {{headless2}} owners 
> to honor that restriction.
> This JIRA proposes that we define a way to set up ACLs to restrict a 
> resource's access to a  super-user, but when the access happens, run it as 
> the proxied user.
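The proposal above can be sketched roughly as follows (hypothetical classes, not the actual YARN ACL code): access is granted when either the proxied user or the real user behind the proxy passes the queue's submit ACL.

```java
import java.util.Map;
import java.util.Set;

// Sketch of the proposed check: a proxied submission is allowed if either
// the proxied (effective) user or the real (super) user is in the ACL.
final class QueueAcl {
    private final Map<String, Set<String>> submitAcls; // queue -> allowed users

    QueueAcl(Map<String, Set<String>> submitAcls) {
        this.submitAcls = submitAcls;
    }

    boolean canSubmit(String queue, String proxiedUser, String realUser) {
        Set<String> allowed = submitAcls.getOrDefault(queue, Set.of());
        // Check the proxied user first (the existing behavior), then fall
        // back to the real user, as this JIRA proposes.
        return allowed.contains(proxiedUser)
            || (realUser != null && allowed.contains(realUser));
    }
}
```

With only {{adm}} in the {{dataload}} ACL, {{headless1}} proxied by {{adm}} is accepted, while a direct submission by {{headless1}} (no real user) is still rejected.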






[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-17857:

Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> Check real user ACLs in addition to proxied user ACLs
> -
>
> Key: HADOOP-17857
> URL: https://issues.apache.org/jira/browse/HADOOP-17857
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.2.2, 2.10.1, 3.3.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch
>
>
> In a secure cluster, it is possible to configure the services to allow a 
> super-user to proxy to a regular user and perform actions on behalf of the 
> proxied user (see [Proxy user - Superusers Acting On Behalf Of Other 
> Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]).
> This is useful for automating server access for multiple different users in a 
> multi-tenant cluster. For example, this can be used by a super user 
> submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie 
> workflows, etc, which will then execute the service as the proxied user.
> Usually when these services check ACLs to determine if the user has access to 
> the requested resources, the service only needs to check the ACLs for the 
> proxied user. However, it is sometimes desirable to allow the proxied user to 
> have access to the resources when only the real user has open ACLs.
> For instance, let's say the user {{adm}} is the only user with submit ACLs to 
> the {{dataload}} queue, and the {{adm}} user wants to submit apps to the 
> {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In 
> addition, we want to be able to bill {{headless1}} and {{headless2}} 
> separately for the YARN resources used in the {{dataload}} queue. In order to 
> do this, the apps need to run in the {{dataload}} queue as the respective 
> headless users. We could open up the ACLs to the {{dataload}} queue to allow 
> {{headless1}} and {{headless2}} to submit apps. But this would allow those 
> users to submit any app to that queue, and not be limited to just the data 
> loading apps, and we don't trust the {{headless1}} and {{headless2}} owners 
> to honor that restriction.
> This JIRA proposes that we define a way to set up ACLs to restrict a 
> resource's access to a  super-user, but when the access happens, run it as 
> the proxied user.






[jira] [Updated] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-09-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-17857:

Fix Version/s: 3.4.0

> Check real user ACLs in addition to proxied user ACLs
> -
>
> Key: HADOOP-17857
> URL: https://issues.apache.org/jira/browse/HADOOP-17857
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.2.2, 2.10.1, 3.3.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch
>
>
> In a secure cluster, it is possible to configure the services to allow a 
> super-user to proxy to a regular user and perform actions on behalf of the 
> proxied user (see [Proxy user - Superusers Acting On Behalf Of Other 
> Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]).
> This is useful for automating server access for multiple different users in a 
> multi-tenant cluster. For example, this can be used by a super user 
> submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie 
> workflows, etc, which will then execute the service as the proxied user.
> Usually when these services check ACLs to determine if the user has access to 
> the requested resources, the service only needs to check the ACLs for the 
> proxied user. However, it is sometimes desirable to allow the proxied user to 
> have access to the resources when only the real user has open ACLs.
> For instance, let's say the user {{adm}} is the only user with submit ACLs to 
> the {{dataload}} queue, and the {{adm}} user wants to submit apps to the 
> {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In 
> addition, we want to be able to bill {{headless1}} and {{headless2}} 
> separately for the YARN resources used in the {{dataload}} queue. In order to 
> do this, the apps need to run in the {{dataload}} queue as the respective 
> headless users. We could open up the ACLs to the {{dataload}} queue to allow 
> {{headless1}} and {{headless2}} to submit apps. But this would allow those 
> users to submit any app to that queue, and not be limited to just the data 
> loading apps, and we don't trust the {{headless1}} and {{headless2}} owners 
> to honor that restriction.
> This JIRA proposes that we define a way to set up ACLs to restrict a 
> resource's access to a  super-user, but when the access happens, run it as 
> the proxied user.






[jira] [Commented] (HADOOP-17857) Check real user ACLs in addition to proxied user ACLs

2021-09-08 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412007#comment-17412007
 ] 

Szilard Nemeth commented on HADOOP-17857:
-

Thanks [~epayne] for working on this.
I just read through the description and comments; everything is clear to me and 
I like the simple way of solving this problem.
It's also reassuring that you have been running with this change in production 
for over a year.
So, the latest patch looks good to me, and I committed patch002 to trunk.

Resolving this jira; if you want to backport to older branches (3.3 or even 
3.2), please reopen.
Thanks.

> Check real user ACLs in addition to proxied user ACLs
> -
>
> Key: HADOOP-17857
> URL: https://issues.apache.org/jira/browse/HADOOP-17857
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.2.2, 2.10.1, 3.3.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: HADOOP-17857.001.patch, HADOOP-17857.002.patch
>
>
> In a secure cluster, it is possible to configure the services to allow a 
> super-user to proxy to a regular user and perform actions on behalf of the 
> proxied user (see [Proxy user - Superusers Acting On Behalf Of Other 
> Users|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]).
> This is useful for automating server access for multiple different users in a 
> multi-tenant cluster. For example, this can be used by a super user 
> submitting jobs to a YARN queue, accessing HDFS files, scheduling Oozie 
> workflows, etc, which will then execute the service as the proxied user.
> Usually when these services check ACLs to determine if the user has access to 
> the requested resources, the service only needs to check the ACLs for the 
> proxied user. However, it is sometimes desirable to allow the proxied user to 
> have access to the resources when only the real user has open ACLs.
> For instance, let's say the user {{adm}} is the only user with submit ACLs to 
> the {{dataload}} queue, and the {{adm}} user wants to submit apps to the 
> {{dataload}} queue on behalf of users {{headless1}} and {{headless2}}. In 
> addition, we want to be able to bill {{headless1}} and {{headless2}} 
> separately for the YARN resources used in the {{dataload}} queue. In order to 
> do this, the apps need to run in the {{dataload}} queue as the respective 
> headless users. We could open up the ACLs to the {{dataload}} queue to allow 
> {{headless1}} and {{headless2}} to submit apps. But this would allow those 
> users to submit any app to that queue, and not be limited to just the data 
> loading apps, and we don't trust the {{headless1}} and {{headless2}} owners 
> to honor that restriction.
> This JIRA proposes that we define a way to set up ACLs to restrict a 
> resource's access to a  super-user, but when the access happens, run it as 
> the proxied user.






[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-08-04 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393023#comment-17393023
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Converted to a PR; no more patches will be uploaded to this jira.

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)






[jira] [Created] (HADOOP-17791) TestActivitiesManager is flaky

2021-07-05 Thread Szilard Nemeth (Jira)
Szilard Nemeth created HADOOP-17791:
---

 Summary: TestActivitiesManager is flaky
 Key: HADOOP-17791
 URL: https://issues.apache.org/jira/browse/HADOOP-17791
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Szilard Nemeth


I noticed in our internal testing environment that  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.TestActivitiesManager.testAppActivitiesTTL
 failed a couple of times, quite randomly.

By checking the Jira and searching for the name of the class, there are some 
results from this year as well: 
[https://issues.apache.org/jira/issues/?jql=text%20~%20TestActivitiesManager%20ORDER%20BY%20updated%20DESC]

I don't know exactly how to reproduce this though.

I tried to run the whole test class 60 times and it didn't fail.
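That brute-force approach can be sketched with a generic helper (illustrative code, not the actual test): run the test body repeatedly and report the first failing iteration, a common first step when chasing a flaky test.

```java
// Hypothetical flaky-test repro helper: rerun a test body many times to
// surface an intermittent failure, mirroring the 60-run attempt above.
public final class FlakyRepro {
    interface TestBody {
        void run() throws Exception;
    }

    // Returns the 1-based iteration of the first failure, or 0 if all pass.
    static int repeat(int times, TestBody body) {
        for (int i = 1; i <= times; i++) {
            try {
                body.run();
            } catch (Exception | AssertionError e) {
                return i; // first failing iteration
            }
        }
        return 0;
    }
}
```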

 






[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-07-01 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327.005.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:23 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since my last comment 
(https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367)
 and there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what has changed, commit by commit. I will skip a bunch of 
trivial ones like code cleanup, added comments and the like.
 *I will cover the test failures surfaced by the Jenkins build / unit test results:*
 - Build #1: 
https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456
 - Build #2: 
https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This improves the way testcases can assert on this data.
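A minimal standalone sketch of such a result holder (the method names readFully/asString and the buffer size are my own illustration; the actual class in the patch may differ):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a holder like InputStreamReadResult: keeps both the
// raw bytes and the byte count, so tests can assert on either.
public class InputStreamReadResult {
    private final byte[] data;
    private final int bytesRead;

    public InputStreamReadResult(byte[] data, int bytesRead) {
        this.data = data;
        this.bytesRead = bytesRead;
    }

    public int getBytesRead() {
        return bytesRead;
    }

    public String asString() {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Drain the response stream fully, capturing the bytes and their count.
    public static InputStreamReadResult readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        int total = 0;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
            total += n;
        }
        return new InputStreamReadResult(buf.toByteArray(), total);
    }

    public static void main(String[] args) throws IOException {
        InputStreamReadResult r =
            readFully(new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(r.getBytesRead() + " " + r.asString());
    }
}
```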

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to be modified to use 
a certain fixed port. Here, I added a constant to store the port number, so when 
I had to debug I only needed to change it in one single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, use a proxy when required
 - Increase the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only other difference is in the code of the testcases: they create 
all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
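A sketch of what such an openConnection helper could look like, assuming the proxy is a local HTTP debugging proxy (the host/port values and field names are my own illustration, not taken from the patch):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical sketch of the TestExecution helper described above: it either
// opens a direct connection or routes it through a local debugging proxy.
public class TestExecution {
    private static final boolean USE_PROXY = false;
    // Typical local proxy endpoint (e.g. Charles/Fiddler); an assumption here.
    private static final String PROXY_HOST = "127.0.0.1";
    private static final int PROXY_PORT = 8888;

    public HttpURLConnection openConnection(URL url) throws IOException {
        if (USE_PROXY) {
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress(PROXY_HOST, PROXY_PORT));
            // Requests will fail if no proxy server is actually listening.
            return (HttpURLConnection) url.openConnection(proxy);
        }
        return (HttpURLConnection) url.openConnection();
    }
}
```

Note that URL.openConnection only constructs the connection object; nothing is sent over the network until the connection is used.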
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a specific 
value; otherwise it is set to 0, meaning that the port will be chosen dynamically.
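The port-0 convention relies on standard socket semantics: binding to port 0 asks the OS for a free ephemeral port. A minimal standalone illustration (the class and constant names are my own, not from the patch):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Demonstrates the port-selection convention described above: a fixed port
// for DEBUG runs, port 0 (OS-chosen ephemeral port) otherwise.
public class PortSelectionDemo {
    static final boolean DEBUG_MODE = false;
    static final int DEBUG_PORT = 13562; // an assumption; any fixed value works

    static int choosePort() {
        return DEBUG_MODE ? DEBUG_PORT : 0;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = new ServerSocket(choosePort())) {
            // With port 0, the kernel assigns a free port, visible afterwards:
            System.out.println("bound to port " + socket.getLocalPort());
        }
    }
}
```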

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, I realized that I had 
forgotten to add the LoggingHttpResponseEncoder to the pipeline. The most trivial 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only if NativeIO is enabled, as SecureIOUtils needs it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors, so I eventually gave up and 
tweaked the test so I could run it locally. This wasn't complex and I don't 
think it's worth going into the details: I had to comment out some test code 
that used the Native library, and that was all.
 From the Jenkins results I had this:
{code:java}
[INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ 
hadoop-mapreduce-client-shuffle ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 s 
- in org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Running org.apache.hadoop.mapred.TestShuffleHandler
{code}

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:18 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since my last comment 
(https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367)
 and there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what've changed commit by commit. I will skip a bunch of 
trivial ones like code cleanup, added comments and the like.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - Build #1: 
https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456
 - Build #2: 
https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This improves the way testcases can assert on these data.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to modified to use 
a certain fixed port. Here, I added a constant to store the port number so when 
I had to debug I only needed to change it in one single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, use a proxy when required
 - Increase the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only difference on top of these is in the code of testcases: They create 
all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, just realized that I 
forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only in NativeIO is enabled as SecureIOUtils need it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors so I eventually gave up and 
tweaked the test to be able to run it locally. This wasn't such a complex 
thing, I don't think it's worth to go into the details, had to comment out some 
test code that used the Native library and that was all.
 From the Jenkins results I had this:
{code:java}
[INFO] --- 

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:17 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since my last comment 
(https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367)
 and there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what've changed commit by commit. I will skip a bunch of 
trivial ones like code cleanup, added comments and the like.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - Build #1: 
https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456
 - Build #2: 
https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This improves the way testcases can assert on these data.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to modified to use 
a certain fixed port. Here, I added a constant to store the port number so when 
I had to debug I only needed to change it in one single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, use a proxy when required
 - Increase the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only difference on top of these is in the code of testcases: They create 
all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, just realized that I 
forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only in NativeIO is enabled as SecureIOUtils need it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors so I eventually gave up and 
tweaked the test to be able to run it locally. This wasn't such a complex 
thing, I don't think it's worth to go into the details, had to comment out some 
test code that used the Native library and that was all.
 From the Jenkins results I had this:
{code:java}
[INFO] --- 

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:16 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since my last comment and 
there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what've changed commit by commit. I will skip a bunch of 
trivial ones like code cleanup, added comments and the like.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - Build #1
 - Build #2

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This improves the way testcases can assert on these data.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to modified to use 
a certain fixed port. Here, I added a constant to store the port number so when 
I had to debug I only needed to change it in one single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, use a proxy when required
 - Increase the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only difference on top of these is in the code of testcases: They create 
all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, just realized that I 
forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only in NativeIO is enabled as SecureIOUtils need it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors so I eventually gave up and 
tweaked the test to be able to run it locally. This wasn't such a complex 
thing, I don't think it's worth to go into the details, had to comment out some 
test code that used the Native library and that was all.
 From the Jenkins results I had this:
{code:java}
[INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ 
hadoop-mapreduce-client-shuffle ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 s 
- in org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Running org.apache.hadoop.mapred.TestShuffleHandler

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:15 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since [my last 
comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367]
 and there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what've changed commit by commit. I will skip a bunch of 
trivial ones like code cleanup, added comments and the like.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - [Build 
#1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456]
 - [Build 
#2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928]

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This improves the way testcases can assert on these data.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to modified to use 
a certain fixed port. Here, I added a constant to store the port number so when 
I had to debug I only needed to change it in one single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, use a proxy when required
 - Increase the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only difference on top of these is in the code of testcases: They create 
all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, just realized that I 
forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only in NativeIO is enabled as SecureIOUtils need it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors so I eventually gave up and 
tweaked the test to be able to run it locally. This wasn't such a complex 
thing, I don't think it's worth to go into the details, had to comment out some 
test code that used the Native library and that was all.
 From the Jenkins results I had this:
{code:java}
[INFO] --- 

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:14 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since [my last 
comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367]
 and there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what've changed commit by commit. I will skip a bunch of 
trivial ones like code cleanup, added comments and the like.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - [Build 
#1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456]
 - [Build 
#2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928]

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This improves the way testcases can assert on these data.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to modified to use 
a certain fixed port. Here, I added a constant to store the port number so when 
I had to debug I only needed to change it in one single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, use a proxy when required
 - Increase the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only difference on top of these is in the code of testcases: They create 
all HTTP connections with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, just realized that I 
forgot to add the LoggingHttpResponseEncoder to the pipeline. The most trivial 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only in NativeIO is enabled as SecureIOUtils need it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors so I eventually gave up and 
tweaked the test to be able to run it locally. This wasn't such a complex 
thing, I don't think it's worth to go into the details, had to comment out some 
test code that used the Native library and that was all.
 From the Jenkins results I had this:
{code:java}
[INFO] --- 

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/29/21, 3:13 PM:
---

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since [my last 
comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362367=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362367]
 and there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what has changed, commit by commit. I will skip a bunch of 
trivial ones, such as code cleanups and added comments.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - [Build 
#1|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17362456=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17362456]
 - [Build 
#2|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17363928=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17363928]

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This makes it easier for testcases to assert on both the content and the size 
of a response.
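
Such a helper can be sketched in a few lines. The field and method names below 
are illustrative guesses, not necessarily those used in the actual patch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch of a result holder like the one described above; names are
// illustrative, not necessarily those of the actual patch.
class InputStreamReadResult {
    private final byte[] bytes;       // raw response bytes
    private final int totalBytesRead; // number of bytes read from the stream

    InputStreamReadResult(byte[] bytes, int totalBytesRead) {
        this.bytes = bytes;
        this.totalBytesRead = totalBytesRead;
    }

    int getTotalBytesRead() { return totalBytesRead; }

    String asString() { return new String(bytes, StandardCharsets.UTF_8); }

    // Drain an InputStream and record both the data and the byte count,
    // so tests can assert on either.
    static InputStreamReadResult readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        int total = 0;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return new InputStreamReadResult(out.toByteArray(), total);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
            "shuffle-ok".getBytes(StandardCharsets.UTF_8));
        InputStreamReadResult result = readFully(in);
        // A testcase can now assert on both the payload and its size.
        assert result.getTotalBytesRead() == 10;
        assert result.asString().equals("shuffle-ok");
        System.out.println("read " + result.getTotalBytesRead() + " bytes");
    }
}
```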

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to be modified to 
use a certain fixed port. Here, I added a constant to store the port number, so 
when debugging I only need to change it in a single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, using a proxy when required
 - Increasing the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only change on top of this in the testcase code is that all HTTP 
connections are created with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.
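
Points 3 and 4 together can be sketched with plain java.net, roughly as below. 
The DEBUG_MODE and USE_PROXY flags mirror the snippet above, while the proxy 
address, the fixed debug port value, and the method names are illustrative 
assumptions, not the actual helper:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.URL;

// Minimal stand-in for the TestExecution helper described above.
// DEBUG_MODE / USE_PROXY mirror the flags quoted earlier; the proxy
// address and the fixed debug port are illustrative assumptions.
class TestExecution {
    private static final boolean DEBUG_MODE = false;
    private static final boolean USE_PROXY = false;
    private static final int FIXED_DEBUG_PORT = 13562;       // assumed value
    private static final int DEBUG_KEEPALIVE_SECONDS = 1000; // assumed value

    // DEBUG mode pins the port so a debugger or proxy can target it;
    // otherwise 0 tells the server to bind an ephemeral port.
    int shuffleHandlerPort() {
        return DEBUG_MODE ? FIXED_DEBUG_PORT : 0;
    }

    int keepAliveSeconds(int defaultSeconds) {
        return DEBUG_MODE ? DEBUG_KEEPALIVE_SECONDS : defaultSeconds;
    }

    // All testcases open connections through this single entry point.
    HttpURLConnection openConnection(URL url) throws IOException {
        if (USE_PROXY) {
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("localhost", 8888)); // assumed proxy
            return (HttpURLConnection) url.openConnection(proxy);
        }
        return (HttpURLConnection) url.openConnection();
    }

    public static void main(String[] args) throws IOException {
        TestExecution te = new TestExecution();
        assert te.shuffleHandlerPort() == 0; // DEBUG_MODE is off here
        // Port 0 means "pick any free port", as a plain ServerSocket shows:
        try (ServerSocket s = new ServerSocket(0)) {
            System.out.println("ephemeral port: " + s.getLocalPort());
            assert s.getLocalPort() > 0;
        }
    }
}
```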

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, I realized that I had 
forgotten to add the LoggingHttpResponseEncoder to the pipeline. The simplest 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only if NativeIO is enabled, as SecureIOUtils needs it
assumeTrue(NativeIO.isAvailable());
{code}
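
For context, assumeTrue() makes a test count as skipped rather than failed when 
its precondition does not hold. The stand-in below is not JUnit's actual 
implementation, and it uses a stubbed probe instead of the real 
NativeIO.isAvailable(); it only illustrates the semantics:

```java
// Pure-Java stand-in for JUnit's Assume mechanism, to illustrate why the
// guard above skips (rather than fails) the test when native code is absent.
class AssumeSketch {
    static class SkippedException extends RuntimeException {
        SkippedException(String msg) { super(msg); }
    }

    static void assumeTrue(String reason, boolean condition) {
        if (!condition) {
            throw new SkippedException(reason);
        }
    }

    // Stand-in probe; the real test asks NativeIO.isAvailable().
    static boolean nativeIoAvailable() {
        return false; // pretend the native library is not on java.library.path
    }

    public static void main(String[] args) {
        try {
            assumeTrue("NativeIO not available", nativeIoAvailable());
            System.out.println("test body runs");
        } catch (SkippedException e) {
            // A JUnit runner would report the test as skipped, not failed.
            System.out.println("skipped: " + e.getMessage());
        }
    }
}
```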
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors, so I eventually gave up and 
tweaked the test so that it can run locally. This wasn't complex and isn't 
worth detailing: I only had to comment out some test code that used the native 
library.
 From the Jenkins results I had this:
{code:java}
[INFO] --- 

[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371468#comment-17371468
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Just uploaded a new patch: [^HADOOP-15327.005.patch]

I have been (almost) exclusively working on this since my last comment and 
there are a couple of things to add again.
 The last commit that was discussed is this: 
[https://github.com/szilard-nemeth/hadoop/commit/f149be8de28baafc64eed1c47e788f5beb215e62]
 Let me explain what has changed, commit by commit. I will skip a bunch of 
trivial ones, such as code cleanups and added comments.
 *I will cover the test failures surfaced by Jenkins build / unit test results:*
 - Build #1
 - Build #2

*1. TestShuffleHandler: Introduced InputStreamReadResult that stores response 
as string + total bytes read: 
[https://github.com/szilard-nemeth/hadoop/commit/a57de573c97fe12c9071dd3450df8f450bf075ea]*
 Here, I added a new class called 'InputStreamReadResult' that stores the bytes 
read (byte[]) and the number of bytes read from a response InputStream.
 This makes it easier for testcases to assert on both the content and the size 
of a response.

*2. TestShuffleHandler: Use DEFAULT_PORT for all shuffle handler port configs: 
[https://github.com/szilard-nemeth/hadoop/commit/78b1166866c85cab6860407f8fe4a4ddc3168fae]*
 It was a common pitfall while debugging that the tests had to be modified to 
use a certain fixed port. Here, I added a constant to store the port number, so 
when debugging I only need to change it in a single place.

*3. Create class: TestExecution: Configure proxy, keep alive connection 
timeout: 
[https://github.com/szilard-nemeth/hadoop/commit/fa5bb32ae4eb737077a165b3b1fba5069c982243]*
 In order to debug the HTTP responses, I found it convenient to add a helper 
class that is responsible for the following:
 - Configuring the HTTP connections, using a proxy when required
 - Increasing the keepalive timeout when using DEBUG mode

TEST_EXECUTION is a static instance of TestExecution, initialized with a JUnit 
test setup method.
 There are 2 flags that control the behaviour of this object:
{code:java}
//Control test execution properties with these flags
private static final boolean DEBUG_MODE = true;
//If this is set to true and proxy server is not running, tests will fail!
private static final boolean USE_PROXY = false; 
{code}
The only change on top of this in the testcase code is that all HTTP 
connections are created with:
{code:java}
TEST_EXECUTION.openConnection(url)
{code}
*4. TestExecution: Configure port: 
[https://github.com/szilard-nemeth/hadoop/commit/4a5c035695be1099bff4a633cd605b9f8146d841]*
 One addition to 3. is to include the port used by ShuffleHandler in the 
TestExecution object. When using DEBUG mode, the port is fixed to a value, 
otherwise it is set to 0, meaning that the port will be dynamically chosen.

*5. Add logging response encoder to TestShuffleHandler.testMapFileAccess: 
[https://github.com/szilard-nemeth/hadoop/commit/64686b47d2fed4e923c1c9c0169a06aba3e339be]*
 While debugging TestShuffleHandler#testMapFileAccess, I realized that I had 
forgotten to add the LoggingHttpResponseEncoder to the pipeline. The simplest 
way was to modify the pipeline when the channel is activated.

*6. TestShuffleHandler.testMapFileAccess: Modify to be able to run it locally + 
reproduce jenkins UT failure: 
[https://github.com/szilard-nemeth/hadoop/commit/bb0fcbbd7dcbe3fa7efd1b6a8c2eb8a9055c5ecd]*
 Here's where the fun begins. The problem with 
TestShuffleHandler#testMapFileAccess is that it requires the NativeIO module:
{code:java}
// This will run only if NativeIO is enabled, as SecureIOUtils needs it
assumeTrue(NativeIO.isAvailable());
{code}
I tried to compile the Hadoop Native libraries on my Mac according to these 
resources:
 - Native libraries: 
[https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html]
 - Followed this guide: 
[https://dev.to/zejnilovic/building-hadoop-native-libraries-on-mac-in-2019-1iee]

Unfortunately, I still had compilation errors, so I eventually gave up and 
tweaked the test so that it can run locally. This wasn't complex and isn't 
worth detailing: I only had to comment out some test code that used the native 
library.
 From the Jenkins results I had this:
{code:java}
[INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ 
hadoop-mapreduce-client-shuffle ---
[INFO] 
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.493 s 
- in org.apache.hadoop.mapred.TestFadvisedFileRegion
[INFO] Running org.apache.hadoop.mapred.TestShuffleHandler
[ERROR] Tests run: 15, Failures: 1, Errors: 0, 

[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: testfailure-testReduceFromPartialMem.zip
testfailure-testMapFileAccess-emptyresponse.zip

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log, 
> testfailure-testMapFileAccess-emptyresponse.zip, 
> testfailure-testReduceFromPartialMem.zip
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-29 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327.005.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, HADOOP-15327.005.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)






[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-24 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367912#comment-17367912
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/24/21, 7:52 PM:
---

Hey [~weichiu],
Thanks for putting the excerpt here. This could be fixed in parallel; I would 
be glad if you could point me to the config that needs to be changed.
Currently, I'm working on the test issues produced by the build that ran 
against patch003:
hadoop.mapred.TestReduceFetchFromPartialMem
hadoop.mapred.TestReduceFetch
There are Jiras related to these tests, but I checked the logs, saw some very 
suspicious things, and they pointed me to a code defect.
I will upload the next patch soon, along with an explanation of what has 
changed since patch004.
Hopefully, this can be the last one and I can finally start testing on a 
cluster.
I will also make sure to create proper manual testing documentation and to 
collect the test evidence.
I wouldn't expect any production issues (fingers crossed), as test coverage is 
quite good, and while fixing the tests I gained a lot of code knowledge; I am 
now familiar with the ShuffleHandler almost inside and out.



> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)







[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-23 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367912#comment-17367912
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Hey [~weichiu],
Thanks for putting the excerpt here. This could be fixed in parallel; I would 
be glad if you could point me to the config that needs to be changed.
Currently, I'm working on the test issues produced by the build that ran 
against patch003:
hadoop.mapred.TestReduceFetchFromPartialMem
hadoop.mapred.TestReduceFetch
There are Jiras related to these tests, but I checked the logs, saw some very 
suspicious things, and they pointed me to a code defect.
I will upload the next patch soon, along with an explanation of what has 
changed since patch004.
Hopefully, this can be the last one and I can finally start testing on a 
cluster. I wouldn't expect any production issues (fingers crossed), as test 
coverage is quite good, and while fixing the tests I gained a lot of code 
knowledge; I am now familiar with the ShuffleHandler almost inside and out.

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)






[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-15 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327.004.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, HADOOP-15327.004.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)






[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362368#comment-17362368
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

*Remaining TODO items that I can make progress with:*
 - Fix failing unit tests
 - Testing on cluster

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)






[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327.003.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> HADOOP-15327.003.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362367#comment-17362367
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

The latest patch contains commits from this branch: 
[https://github.com/szilard-nemeth/hadoop/commits/HADOOP-15327-snemeth]
 There are a couple of commits, so I will approach this by explaining the 
reasons behind the changes commit by commit.
 Not all commits are listed; I left out a few trivial ones.
 Unfortunately, this task was a bit tricky: every time I touched something in 
the test, I found another bug or odd behaviour, so it took a great deal 
of time to discover and solve everything.

*1. ShuffleHandler: ch.isOpen() --> ch.isActive(): 
[https://github.com/szilard-nemeth/hadoop/commit/e703adb57f66da8579baa26257ca9aaed2bf1db5]*
 This was already mentioned in my previous, lengthier comment.

*2. TestShuffleHandler: Fix mocking in testSendMapCount + replace ch.write() 
with ch.writeAndFlush(): 
[https://github.com/szilard-nemeth/hadoop/commit/07fbfee5cae85e8e374b53c303e794c19c620efc]*
 This is about two things:
 - Replacing channel.write calls with channel.writeAndFlush
 - Fixing bad mocking in 
org.apache.hadoop.mapred.TestShuffleHandler#testSendMapCount

*3. TestShuffleHandler.testMaxConnections: Rewrite test + production code: 
accepted connection handling: 
[https://github.com/szilard-nemeth/hadoop/commit/def0059982ef8f0e2f19d385b1a1fcdca8639f9d]*
 *Changes in production code:*
 - ShuffleHandler#channelActive added the channel to the channel group (the 
field called 'accepted') before the if statement that enforces the maximum 
number of open connections. This was the old, incorrect piece of code:
{code:java}
 super.channelActive(ctx);
  LOG.debug("accepted connections={}", accepted.size());

  if ((maxShuffleConnections > 0) && (accepted.size() >= 
maxShuffleConnections)) {
{code}

 - Also, counting the number of open channels via the channel group was 
unreliable, so I introduced a new AtomicInteger field called 
'acceptedConnections' to track the open channels / connections.
 - There was another issue: when channels were accepted, the counter of 
open channels was increased, but when channels became inactive, no code 
maintained (decremented) the value.
 This was mitigated by adding 
org.apache.hadoop.mapred.ShuffleHandler.Shuffle#channelInactive, which logs the 
channel-inactivated event and decrements the open-connections counter:
{code:java}
@Override
public void channelInactive(ChannelHandlerContext ctx) throws Exception {
  super.channelInactive(ctx);
  acceptedConnections.decrementAndGet();
  LOG.debug("New value of Accepted number of connections={}",
  acceptedConnections.get());
}
{code}
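The increment/decrement pairing above can be illustrated with a minimal, Netty-free sketch. This is a hypothetical simulation of the handler's bookkeeping, not the actual ShuffleHandler code; the names (acceptedConnections, maxShuffleConnections) merely mirror the fields described above:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the handler's connection bookkeeping:
// "channelActive" increments the counter only when the connection is
// accepted, and "channelInactive" decrements it, so the counter always
// reflects the number of currently open channels.
public class ConnectionCounterSketch {
  private final AtomicInteger acceptedConnections = new AtomicInteger();
  private final int maxShuffleConnections;

  public ConnectionCounterSketch(int maxShuffleConnections) {
    this.maxShuffleConnections = maxShuffleConnections;
  }

  // Returns true if the connection is accepted, false if it is rejected.
  public boolean channelActive() {
    if (maxShuffleConnections > 0
        && acceptedConnections.get() >= maxShuffleConnections) {
      return false; // limit reached: reject before counting the channel
    }
    acceptedConnections.incrementAndGet();
    return true;
  }

  public void channelInactive() {
    acceptedConnections.decrementAndGet();
  }

  public int openConnections() {
    return acceptedConnections.get();
  }

  public static void main(String[] args) {
    ConnectionCounterSketch counter = new ConnectionCounterSketch(2);
    System.out.println(counter.channelActive());   // true: 1 open
    System.out.println(counter.channelActive());   // true: 2 open
    System.out.println(counter.channelActive());   // false: limit reached
    counter.channelInactive();                     // one channel closes
    System.out.println(counter.openConnections()); // 1
  }
}
```

The key point the sketch shows: the limit check happens before the count is incremented, and every close event decrements the same counter, so the count cannot drift upward over time.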

*Changes in test code:*
 - org.apache.hadoop.mapred.TestShuffleHandler#testMaxConnections: Fixed the 
testcase. The issue was pointed out correctly by [~weichiu]: the connections 
are accepted in parallel, so we should not rely on their order in the test. I 
rewrote the test by introducing a map that groups the HttpURLConnection 
objects by their HTTP response code.
 Then I check that only 200 OK and 429 TOO MANY REQUESTS occur, that the 
number of 200 OK connections is 2, and that there is exactly one rejected 
connection.
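The order-independent check can be sketched as follows. This is a hypothetical, self-contained stand-in: plain integer response codes replace the real HttpURLConnection objects the test groups:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of the rewritten assertion: group the observed HTTP
// response codes and check counts, without relying on the order in which
// the connections were accepted.
public class ResponseGroupingSketch {
  static final int OK = 200;
  static final int TOO_MANY_REQUESTS = 429;

  static Map<Integer, Long> groupByCode(List<Integer> responseCodes) {
    return responseCodes.stream()
        .collect(Collectors.groupingBy(code -> code, Collectors.counting()));
  }

  public static void main(String[] args) {
    // Three parallel connections against a limit of 2; arrival order is arbitrary.
    Map<Integer, Long> byCode = groupByCode(Arrays.asList(429, 200, 200));

    boolean onlyExpectedCodes = byCode.keySet().stream()
        .allMatch(code -> code == OK || code == TOO_MANY_REQUESTS);
    System.out.println(onlyExpectedCodes);             // true
    System.out.println(byCode.get(OK));                // 2 accepted
    System.out.println(byCode.get(TOO_MANY_REQUESTS)); // 1 rejected
  }
}
```

In the real testMaxConnections, the grouping key would come from each connection's getResponseCode() call rather than a pre-built list.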

*4. increase netty version to 4.1.65.Final: 
[https://github.com/szilard-nemeth/hadoop/commit/4f4589063b579a93389b1e188c29bd895ae507fc]*
 This is a simple commit that increases the Netty version to the latest stable 
4.x release.
 See this page: [https://netty.io/downloads.html]
 It states: "netty-4.1.65.Final.tar.gz ‐ 19-May-2021 (Stable, Recommended)"

*5. ShuffleHandler: Fix keepalive test + writing HTTP response properly to 
channel: 
[https://github.com/szilard-nemeth/hadoop/commit/1aad4eaace28cfff4a9a9152f7535d70cc6e3734]*
 This is where things get more interesting. A testcase, 
org.apache.hadoop.mapred.TestShuffleHandler#testKeepAlive, caught an issue 
that surfaced because Netty 4.x handles HTTP responses written to the same 
channel differently than Netty 3.x.
 See details below.

Production code changes:
 - Added some logging to be able to track what happens when HTTP connection 
keep-alive is in use.
 - Added a ChannelOutboundHandlerAdapter that handles exceptions thrown 
during outbound message construction. By default Netty does not log these, 
and this was the only trick I found to catch such events:
{code:java}
  pipeline.addLast("outboundExcHandler", new 
ChannelOutboundHandlerAdapter() {
@Override
public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise 
promise) throws Exception {
  promise.addListener(ChannelFutureListener.FIRE_EXCEPTION_ON_FAILURE);
  super.write(ctx, msg, promise);
}
  });
{code}
This solution is described here: 

[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: 
getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch, 
> getMapOutputInfo_BlockingOperationException_awaitUninterruptibly.log
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327.002.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327-snemeth.002.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-12 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: (was: HADOOP-15327-snemeth.002.patch)

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch, HADOOP-15327.002.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/11/21, 8:35 PM:
---

Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the Maven shading config, so I'm expecting a Maven error from 
Jenkins, as I had it locally.
 Referring back to [your 
comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17356433=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17356433]:
 I'm quite a beginner with shading, i.e. I have no idea what to touch to fix 
the current shading issues. Can you or anyone else help me out with this?

*Remaining TODO items that I can make progress with:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes: a lengthier comment 
will follow :)


was (Author: snemeth):
Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the Maven shading config, so I'm expecting a Maven error from 
Jenkins, as I had it locally.
 Referring back to [your 
comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17356433=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17356433]:
 I'm quite a beginner with shading. Can you or anyone else help me out with 
this?

*Remaining TODO items:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes.
 So a lengthier comment will follow this :)

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/11/21, 8:33 PM:
---

Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the Maven shading config, so I'm expecting a Maven error from 
Jenkins, as I had it locally.
 Referring back to your comment here: I'm quite a beginner with shading. Can 
you or anyone else help me out with this?

*Remaining TODO items:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes.
 So a lengthier comment will follow this :)


was (Author: snemeth):
Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the shading config, so I'm expecting a Maven error from Jenkins.
 Referring back to your comment here: I'm quite a beginner with shading. Can 
you or anyone else help me out with this?

*Remaining TODO items:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes.
 So a lengthier comment will follow this :)

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/11/21, 8:33 PM:
---

Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the Maven shading config, so I'm expecting a Maven error from 
Jenkins, as I had it locally.
 Referring back to [your 
comment|https://issues.apache.org/jira/browse/HADOOP-15327?focusedCommentId=17356433=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17356433]:
 I'm quite a beginner with shading. Can you or anyone else help me out with 
this?

*Remaining TODO items:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes.
 So a lengthier comment will follow this :)


was (Author: snemeth):
Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the Maven shading config, so I'm expecting a Maven error from 
Jenkins, as I had it locally.
 Referring back to your comment here: I'm quite a beginner with shading. Can 
you or anyone else help me out with this?

*Remaining TODO items:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes.
 So a lengthier comment will follow this :)

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362029#comment-17362029
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Thanks [~weichiu] for your help.
 Added a preliminary patch to kick off Jenkins.
 Haven't touched the shading config, so I'm expecting a Maven error from Jenkins.
 Referring back to your comment here: I'm quite a beginner with shading. Can 
you or anyone else help me out with this?

*Remaining TODO items:*
 - Testing on cluster
 - Adding explanatory comments for the new code changes.
 So a lengthier comment will follow this :)

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Status: Patch Available  (was: In Progress)

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-11 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15327:

Attachment: HADOOP-15327.001.patch

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15327.001.patch
>
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358861#comment-17358861
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Let me list the differences introduced by the migration from Netty 3.x to 4.x.
 There is a migration guide that mentions most (but not all) of the changes: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html]
 Please note that the code changes below are based on Wei-Chiu's branch: 
[https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4]
h2. CHANGES IN ShuffleHandler
h3. *I will list the changes mostly from ShuffleHandler, as it covers almost 
all types of changes in the other classes as well.*

*In TestShuffleHandler, the test code was changed according to the 
justifications listed below.*
h3. Change category #1: General API changes / non-configuration getters:

Details: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#general-api-changes]
{quote}Non-configuration getters have no get- prefix anymore. (e.g. 
Channel.getRemoteAddress() → Channel.remoteAddress())
 Boolean properties are still prefixed with is- to avoid confusion (e.g. 
'empty' is both an adjective and a verb, so empty() can have two meanings.)
{quote}
I'm just listing all the changes without additional context (i.e. in which 
method they were changed), separated by three dots, as they are simply method 
renamings:
{code:java}
-future.getChannel().close();
+future.channel().closeFuture().awaitUninterruptibly();
...
...
-  ChannelPipeline pipeline = future.getChannel().getPipeline();
+  ChannelPipeline pipeline = future.channel().pipeline();
...
...
-port = ((InetSocketAddress)ch.getLocalAddress()).getPort();
+port = ((InetSocketAddress)ch.localAddress()).getPort();
...
...
-  if (e.getState() == IdleState.WRITER_IDLE && enabledTimeout) {
-e.getChannel().close();
+  if (e.state() == IdleState.WRITER_IDLE && enabledTimeout) {
+ctx.channel().close();
...
...
-  accepted.add(evt.getChannel());
+  accepted.add(ctx.channel());
...
...
-new QueryStringDecoder(request.getUri()).getParameters();
+new QueryStringDecoder(request.getUri()).parameters(); //getUri was 
not changed, see this later
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  reduceContext.getCtx().getChannel(),
+  reduceContext.getCtx().channel(),
...
...
-  if (ch.getPipeline().get(SslHandler.class) == null) {
+  if (ch.pipeline().get(SslHandler.class) == null) {
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  
ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE);
+  ctx.channel().write(response).addListener(ChannelFutureListener.CLOSE);
...
...
-  Channel ch = e.getChannel();
-  Throwable cause = e.getCause();
+  Channel ch = ctx.channel();
{code}
h3. Change category #2: General API changes / Method signature changes.

*2.1: SimpleChannelUpstreamHandler was renamed to ChannelInboundHandlerAdapter.*
 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#upstream--inbound-downstream--outbound]
{quote}The terms 'upstream' and 'downstream' were pretty confusing to 
beginners. 4.0 uses 'inbound' and 'outbound' wherever possible.
{quote}
{code:java}
-  class Shuffle extends SimpleChannelUpstreamHandler {
+  @ChannelHandler.Sharable
+  class Shuffle extends ChannelInboundHandlerAdapter {
{code}
*2.2: Simplified channel state model: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#simplified-channel-state-model]*
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
*2.2.1 Changes in class: Shuffle*
{code:java}
 @Override
-public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt) 
+public void channelActive(ChannelHandlerContext ctx)
 throws Exception {
-  super.channelOpen(ctx, evt);
+  super.channelActive(ctx);
{code}
*2.2.2 Changes in 
org.apache.hadoop.mapred.ShuffleHandler.Shuffle#exceptionCaught:* 
 Quoting the change again:
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
{code:java}
   LOG.error("Shuffle error: ", cause);
-  if (ch.isConnected()) {
-LOG.error("Shuffle error " + e);
+   

[jira] [Issue Comment Deleted] (HADOOP-11219) [Umbrella] Upgrade to netty 4

2021-06-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-11219:

Comment: was deleted

(was: Let me list the differences introduced by the migration from 
Netty 3.x to 4.x.
 There is a migration guide that mentions most (but not all) of the changes: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html]
 Please note that the below code changes are based on Wei-Chiu's branch: 
[https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4]
h2. CHANGES IN ShuffleHandler
h3. *I will list the changes mostly from ShuffleHandler, as it covers almost 
all types of changes in the other classes as well.*

*In TestShuffleHandler, the test code was changed according to the 
justifications listed below.*
h3. Change category #1: General API changes / non-configuration getters:

Details: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#general-api-changes]
{quote}Non-configuration getters have no get- prefix anymore. (e.g. 
Channel.getRemoteAddress() → Channel.remoteAddress())
 Boolean properties are still prefixed with is- to avoid confusion (e.g. 
'empty' is both an adjective and a verb, so empty() can have two meanings.)
{quote}
I'm just listing all the changes without additional context (i.e. in which 
method they were changed), separated by three dots, as they are simply method 
renamings:
{code:java}
-future.getChannel().close();
+future.channel().closeFuture().awaitUninterruptibly();
...
...
-  ChannelPipeline pipeline = future.getChannel().getPipeline();
+  ChannelPipeline pipeline = future.channel().pipeline();
...
...
-port = ((InetSocketAddress)ch.getLocalAddress()).getPort();
+port = ((InetSocketAddress)ch.localAddress()).getPort();
...
...
-  if (e.getState() == IdleState.WRITER_IDLE && enabledTimeout) {
-e.getChannel().close();
+  if (e.state() == IdleState.WRITER_IDLE && enabledTimeout) {
+ctx.channel().close();
...
...
-  accepted.add(evt.getChannel());
+  accepted.add(ctx.channel());
...
...
-new QueryStringDecoder(request.getUri()).getParameters();
+new QueryStringDecoder(request.getUri()).parameters(); //getUri was 
not changed, see this later
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  reduceContext.getCtx().getChannel(),
+  reduceContext.getCtx().channel(),
...
...
-  if (ch.getPipeline().get(SslHandler.class) == null) {
+  if (ch.pipeline().get(SslHandler.class) == null) {
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  
ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE);
+  ctx.channel().write(response).addListener(ChannelFutureListener.CLOSE);
...
...
-  Channel ch = e.getChannel();
-  Throwable cause = e.getCause();
+  Channel ch = ctx.channel();
{code}
h3. Change category #2: General API changes / Method signature changes.

*2.1: SimpleChannelUpstreamHandler was renamed to ChannelInboundHandlerAdapter.*
 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#upstream--inbound-downstream--outbound]
{quote}The terms 'upstream' and 'downstream' were pretty confusing to 
beginners. 4.0 uses 'inbound' and 'outbound' wherever possible.
{quote}
{code:java}
-  class Shuffle extends SimpleChannelUpstreamHandler {
+  @ChannelHandler.Sharable
+  class Shuffle extends ChannelInboundHandlerAdapter {
{code}
*2.2: Simplified channel state model: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#simplified-channel-state-model]*
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
*2.2.1 Changes in class: Shuffle*
{code:java}
 @Override
-public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt) 
+public void channelActive(ChannelHandlerContext ctx)
 throws Exception {
-  super.channelOpen(ctx, evt);
+  super.channelActive(ctx);
{code}
*2.2.2 Changes in 
org.apache.hadoop.mapred.ShuffleHandler.Shuffle#exceptionCaught:* 
 Quoting the change again:
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
{code:java}
   LOG.error("Shuffle error: ", cause);
-  if (ch.isConnected()) {
-LOG.error("Shuffle error " + e);
+  if 

[jira] [Comment Edited] (HADOOP-11219) [Umbrella] Upgrade to netty 4

2021-06-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358850#comment-17358850
 ] 

Szilard Nemeth edited comment on HADOOP-11219 at 6/7/21, 8:40 PM:
--

Let me list the differences introduced by the migration from Netty 3.x 
to 4.x.
 There is a migration guide that mentions most (but not all) of the changes: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html]
 Please note that the below code changes are based on Wei-Chiu's branch: 
[https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4]
h2. CHANGES IN ShuffleHandler
h3. *I will list the changes mostly from ShuffleHandler, as it covers almost 
all types of changes in the other classes as well.*

*In TestShuffleHandler, the test code was changed according to the 
justifications listed below.*
h3. Change category #1: General API changes / non-configuration getters:

Details: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#general-api-changes]
{quote}Non-configuration getters have no get- prefix anymore. (e.g. 
Channel.getRemoteAddress() → Channel.remoteAddress())
 Boolean properties are still prefixed with is- to avoid confusion (e.g. 
'empty' is both an adjective and a verb, so empty() can have two meanings.)
{quote}
I'm just listing all the changes without additional context (i.e. in which 
method they were changed), separated by three dots, as they are simply method 
renamings:
{code:java}
-future.getChannel().close();
+future.channel().closeFuture().awaitUninterruptibly();
...
...
-  ChannelPipeline pipeline = future.getChannel().getPipeline();
+  ChannelPipeline pipeline = future.channel().pipeline();
...
...
-port = ((InetSocketAddress)ch.getLocalAddress()).getPort();
+port = ((InetSocketAddress)ch.localAddress()).getPort();
...
...
-  if (e.getState() == IdleState.WRITER_IDLE && enabledTimeout) {
-e.getChannel().close();
+  if (e.state() == IdleState.WRITER_IDLE && enabledTimeout) {
+ctx.channel().close();
...
...
-  accepted.add(evt.getChannel());
+  accepted.add(ctx.channel());
...
...
-new QueryStringDecoder(request.getUri()).getParameters();
+new QueryStringDecoder(request.getUri()).parameters(); //getUri was 
not changed, see this later
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  reduceContext.getCtx().getChannel(),
+  reduceContext.getCtx().channel(),
...
...
-  if (ch.getPipeline().get(SslHandler.class) == null) {
+  if (ch.pipeline().get(SslHandler.class) == null) {
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  
ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE);
+  ctx.channel().write(response).addListener(ChannelFutureListener.CLOSE);
...
...
-  Channel ch = e.getChannel();
-  Throwable cause = e.getCause();
+  Channel ch = ctx.channel();
{code}
h3. Change category #2: General API changes / Method signature changes.

*2.1: SimpleChannelUpstreamHandler was renamed to ChannelInboundHandlerAdapter.*
 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#upstream--inbound-downstream--outbound]
{quote}The terms 'upstream' and 'downstream' were pretty confusing to 
beginners. 4.0 uses 'inbound' and 'outbound' wherever possible.
{quote}
{code:java}
-  class Shuffle extends SimpleChannelUpstreamHandler {
+  @ChannelHandler.Sharable
+  class Shuffle extends ChannelInboundHandlerAdapter {
{code}
*2.2: Simplified channel state model: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#simplified-channel-state-model]*
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
*2.2.1 Changes in class: Shuffle*
{code:java}
 @Override
-public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt) 
+public void channelActive(ChannelHandlerContext ctx)
 throws Exception {
-  super.channelOpen(ctx, evt);
+  super.channelActive(ctx);
{code}
*2.2.2 Changes in 
org.apache.hadoop.mapred.ShuffleHandler.Shuffle#exceptionCaught:* 
 Quoting the change again:
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
{code:java}
   LOG.error("Shuffle error: ", cause);
-  if 

[jira] [Comment Edited] (HADOOP-11219) [Umbrella] Upgrade to netty 4

2021-06-07 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358850#comment-17358850
 ] 

Szilard Nemeth edited comment on HADOOP-11219 at 6/7/21, 8:38 PM:
--

Let me list the differences introduced by the migration from Netty 3.x to 4.x.
 There is a migration guide that covers most (but not all) of the changes: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html]
 Please note that the code changes below are based on Wei-Chiu's branch: 
[https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4]
h2. CHANGES IN ShuffleHandler
h3. *I will list the changes mostly from ShuffleHandler, as it covers almost all 
types of changes in the other classes as well.*

*In TestShuffleHandler, the test code was changed for one or more of the reasons 
listed below.*
h3. Change category #1: General API changes / non-configuration getters:

Details: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#general-api-changes]
{quote}Non-configuration getters have no get- prefix anymore. (e.g. 
Channel.getRemoteAddress() → Channel.remoteAddress())
 Boolean properties are still prefixed with is- to avoid confusion (e.g. 
'empty' is both an adjective and a verb, so empty() can have two meanings.)
{quote}
I'm listing all the changes without additional context (i.e. the method in which 
they occur), separated by three dots, as they are simple method renamings:
{code:java}
-future.getChannel().close();
+future.channel().closeFuture().awaitUninterruptibly();
...
...
-  ChannelPipeline pipeline = future.getChannel().getPipeline();
+  ChannelPipeline pipeline = future.channel().pipeline();
...
...
-port = ((InetSocketAddress)ch.getLocalAddress()).getPort();
+port = ((InetSocketAddress)ch.localAddress()).getPort();
...
...
-  if (e.getState() == IdleState.WRITER_IDLE && enabledTimeout) {
-e.getChannel().close();
+  if (e.state() == IdleState.WRITER_IDLE && enabledTimeout) {
+ctx.channel().close();
...
...
-  accepted.add(evt.getChannel());
+  accepted.add(ctx.channel());
...
...
-new QueryStringDecoder(request.getUri()).getParameters();
+new QueryStringDecoder(request.getUri()).parameters(); // getUri was not changed; see later
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  reduceContext.getCtx().getChannel(),
+  reduceContext.getCtx().channel(),
...
...
-  if (ch.getPipeline().get(SslHandler.class) == null) {
+  if (ch.pipeline().get(SslHandler.class) == null) {
...
...
-  Channel ch = evt.getChannel();
-  ChannelPipeline pipeline = ch.getPipeline();
+  Channel ch = ctx.channel();
+  ChannelPipeline pipeline = ch.pipeline();
...
...
-  ctx.getChannel().write(response).addListener(ChannelFutureListener.CLOSE);
+  ctx.channel().write(response).addListener(ChannelFutureListener.CLOSE);
...
...
-  Channel ch = e.getChannel();
-  Throwable cause = e.getCause();
+  Channel ch = ctx.channel();
{code}
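To see several of these renamings together in context, here is a minimal sketch of a handler method written against the Netty 4 API. This is a hypothetical illustration, not code from the patch; the class and method names are invented, and it assumes the Netty 4.x jar on the classpath:
{code:java}
// Hypothetical illustration of the Netty 4 accessor names; not from the patch.
// Requires io.netty:netty-all 4.x on the classpath.
import io.netty.channel.Channel;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.ssl.SslHandler;
import io.netty.handler.codec.http.QueryStringDecoder;

import java.util.List;
import java.util.Map;

class GetterRenamingSketch {
  void handle(ChannelHandlerContext ctx, FullHttpRequest request,
              FullHttpResponse response) {
    Channel ch = ctx.channel();                  // Netty 3: evt.getChannel()
    ChannelPipeline pipeline = ch.pipeline();    // Netty 3: ch.getPipeline()
    if (pipeline.get(SslHandler.class) == null) {
      // plaintext connection; Netty 3: ch.getPipeline().get(...)
    }
    Map<String, List<String>> params =
        new QueryStringDecoder(request.getUri()).parameters(); // Netty 3: getParameters()
    ch.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
  }
}
{code}
Note that the boolean accessors keep their is- prefix, so only the plain getters are affected by this category.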
h3. Change category #2: General API changes / Method signature changes.

*2.1: SimpleChannelUpstreamHandler was renamed to ChannelInboundHandlerAdapter.*
 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#upstream--inbound-downstream--outbound]
{quote}The terms 'upstream' and 'downstream' were pretty confusing to 
beginners. 4.0 uses 'inbound' and 'outbound' wherever possible.
{quote}
{code:java}
-  class Shuffle extends SimpleChannelUpstreamHandler {
+  @ChannelHandler.Sharable
+  class Shuffle extends ChannelInboundHandlerAdapter {
{code}
*2.2: Simplified channel state model: 
[https://netty.io/wiki/new-and-noteworthy-in-4.0.html#simplified-channel-state-model]*
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
*2.2.1 Changes in class: Shuffle*
{code:java}
 @Override
-public void channelOpen(ChannelHandlerContext ctx, ChannelStateEvent evt) 
+public void channelActive(ChannelHandlerContext ctx)
 throws Exception {
-  super.channelOpen(ctx, evt);
+  super.channelActive(ctx);
{code}
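Combining 2.1 and 2.2.1, a migrated handler skeleton might look like the following. This is an illustrative sketch under the same assumption (Netty 4.x on the classpath), not the actual patch; the class name is hypothetical:
{code:java}
// Hypothetical skeleton of a Netty 4 inbound handler; not from the patch.
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// In Netty 4, one handler instance may be added to several pipelines
// only if it is marked @Sharable.
@ChannelHandler.Sharable
class ShuffleLikeHandler extends ChannelInboundHandlerAdapter {

  @Override
  public void channelActive(ChannelHandlerContext ctx) throws Exception {
    // Netty 3's channelOpen/channelBound/channelConnected are merged here.
    super.channelActive(ctx);
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    // Netty 3's channelDisconnected/channelUnbound/channelClosed are merged here.
    super.channelInactive(ctx);
  }
}
{code}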
*2.2.2 Changes in 
org.apache.hadoop.mapred.ShuffleHandler.Shuffle#exceptionCaught:* 
 Quoting the change again:
{quote}channelOpen, channelBound, and channelConnected have been merged to 
channelActive. channelDisconnected, channelUnbound, and channelClosed have been 
merged to channelInactive. Likewise, Channel.isBound() and isConnected() have 
been merged to isActive().
{quote}
{code:java}
   LOG.error("Shuffle error: ", cause);
-  if (ch.isConnected()) {
-LOG.error("Shuffle error " + e);
{code}

[jira] [Comment Edited] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-04 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357627#comment-17357627
 ] 

Szilard Nemeth edited comment on HADOOP-15327 at 6/4/21, 9:04 PM:
--

Thanks [~weichiu],
Will use skipShade temporarily, then check how to resolve the shading issue 
once all the code issues are fixed and in place.
Also thanks for your testing recommendations.


was (Author: snemeth):
Thanks [~weichiu],
Will use skipShade temporarily then will check how to resolve the shading issue 
once all code issues are fixed and in place.

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>
> This way, we can remove the dependencies on the netty3 (jboss.netty)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org




[jira] [Work started] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-03 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-15327 started by Szilard Nemeth.
---
> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-03 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356401#comment-17356401
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Hmm, I can't attach the file; it's probably a permission issue. I don't want to 
paste 2000+ lines here.
Uploaded the file to my personal Google Drive: 
https://drive.google.com/file/d/1-ovH8snqTS73oLNsxtwgrvaDVynOBIP7/view?usp=sharing

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Assigned] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-03 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth reassigned HADOOP-15327:
---

Assignee: Szilard Nemeth

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Assignee: Szilard Nemeth
>Priority: Major
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Commented] (HADOOP-15327) Upgrade MR ShuffleHandler to use Netty4

2021-06-03 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356377#comment-17356377
 ] 

Szilard Nemeth commented on HADOOP-15327:
-

Hi [~weichiu],

I took over this Jira as it was unassigned; I hope that's not a problem.

The plan is to continue the work based on your code changes until all the UT 
issues are fixed.

Did you have any plan for testing? Is it enough to test basic MR-based jobs 
with shuffling?

One more thing: The latest version on your branch 
([https://github.com/jojochuang/hadoop/commits/shuffle_handler_netty4)] 
produced Maven enforcer issues for me.

The command I'm using to build Hadoop (from its root): 
{code}
mvn clean install -Pdist -DskipTests  -Dmaven.javadoc.skip=true -e | tee 
/tmp/maven_out
{code}
Please see the attached output file. 

Did you encounter similar build issues? I can see the latest commit is from 
March 2021, which isn't that old, so I don't assume the build system or the 
enforcer rules have changed since then.

> Upgrade MR ShuffleHandler to use Netty4
> ---
>
> Key: HADOOP-15327
> URL: https://issues.apache.org/jira/browse/HADOOP-15327
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Xiaoyu Yao
>Priority: Major
>
> This way, we can remove the dependency on netty3 (jboss.netty).






[jira] [Updated] (HADOOP-11219) [Umbrella] Upgrade to netty 4

2021-04-22 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-11219:

Summary: [Umbrella] Upgrade to netty 4  (was: Upgrade to netty 4)

> [Umbrella] Upgrade to netty 4
> -
>
> Key: HADOOP-11219
> URL: https://issues.apache.org/jira/browse/HADOOP-11219
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Major
>
> This is an umbrella jira to track the effort of upgrading to Netty 4.






[jira] [Updated] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException

2020-01-14 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-16683:

Fix Version/s: 3.2.2
   3.1.4
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped 
> AccessControlException
> --
>
> Key: HADOOP-16683
> URL: https://issues.apache.org/jira/browse/HADOOP-16683
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, 
> HADOOP-16683.003.patch, HADOOP-16683.branch-3.1.001.patch, 
> HADOOP-16683.branch-3.2.001.patch, HADOOP-16683.branch-3.2.001.patch
>
>
> Follow up patch on HADOOP-16580.
> We successfully disabled the retry in case of an AccessControlException which 
> has resolved some of the cases, but in other cases AccessControlException is 
> wrapped inside another IOException and you can only get the original 
> exception by calling getCause().
> Let's add this extra case as well.
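The cause-chain unwrapping described in the quoted issue can be sketched as follows. This is a minimal illustration, not the actual Hadoop retry-policy code; `hasAccessControlCause` is a hypothetical helper name:

```java
import java.io.IOException;
import java.security.AccessControlException;

public class UnwrapSketch {
    // Hypothetical helper: walk the cause chain via getCause() to find an
    // AccessControlException wrapped inside another exception.
    public static boolean hasAccessControlCause(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof AccessControlException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An AccessControlException wrapped inside an IOException:
        IOException wrapped = new IOException("RPC failed",
                new AccessControlException("client lacks Kerberos credentials"));
        System.out.println(hasAccessControlCause(wrapped)); // true
        System.out.println(hasAccessControlCause(new IOException("plain I/O error"))); // false
    }
}
```

A retry policy could call such a check on the caught exception and return a fail action instead of retrying when it matches.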






[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException

2020-01-14 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014992#comment-17014992
 ] 

Szilard Nemeth commented on HADOOP-16683:
-

Thanks [~adam.antal],
Committed your patches to branch-3.2 and branch-3.1
Resolving jira.

> Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped 
> AccessControlException
> --
>
> Key: HADOOP-16683
> URL: https://issues.apache.org/jira/browse/HADOOP-16683
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, 
> HADOOP-16683.003.patch, HADOOP-16683.branch-3.1.001.patch, 
> HADOOP-16683.branch-3.2.001.patch, HADOOP-16683.branch-3.2.001.patch
>
>
> Follow up patch on HADOOP-16580.
> We successfully disabled the retry in case of an AccessControlException which 
> has resolved some of the cases, but in other cases AccessControlException is 
> wrapped inside another IOException and you can only get the original 
> exception by calling getCause().
> Let's add this extra case as well.






[jira] [Updated] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException

2019-11-09 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-16683:

Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped 
> AccessControlException
> --
>
> Key: HADOOP-16683
> URL: https://issues.apache.org/jira/browse/HADOOP-16683
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, 
> HADOOP-16683.003.patch
>
>
> Follow up patch on HADOOP-16580.
> We successfully disabled the retry in case of an AccessControlException which 
> has resolved some of the cases, but in other cases AccessControlException is 
> wrapped inside another IOException and you can only get the original 
> exception by calling getCause().
> Let's add this extra case as well.






[jira] [Commented] (HADOOP-16683) Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped AccessControlException

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970825#comment-16970825
 ] 

Szilard Nemeth commented on HADOOP-16683:
-

Thanks [~adam.antal] for this patch and [~pbacsko] for the review!
Just committed to trunk! Closing this jira as I don't think we need backports 
to other branches.
[~adam.antal]: If you think differently, please reopen this jira and set 
appropriate target versions. 
Thanks!

> Disable retry of FailoverOnNetworkExceptionRetry in case of wrapped 
> AccessControlException
> --
>
> Key: HADOOP-16683
> URL: https://issues.apache.org/jira/browse/HADOOP-16683
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HADOOP-16683.001.patch, HADOOP-16683.002.patch, 
> HADOOP-16683.003.patch
>
>
> Follow up patch on HADOOP-16580.
> We successfully disabled the retry in case of an AccessControlException which 
> has resolved some of the cases, but in other cases AccessControlException is 
> wrapped inside another IOException and you can only get the original 
> exception by calling getCause().
> Let's add this extra case as well.






[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException

2019-11-09 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970824#comment-16970824
 ] 

Szilard Nemeth commented on HADOOP-16580:
-

Hi [~adam.antal]! 
Thanks for the update and for the link to the newly filed jira.

> Disable retry of FailoverOnNetworkExceptionRetry in case of 
> AccessControlException
> --
>
> Key: HADOOP-16580
> URL: https://issues.apache.org/jira/browse/HADOOP-16580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, 
> HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch
>
>
> HADOOP-14982 handled the case where a SaslException is thrown. The issue 
> still persists, since the exception that is thrown is an 
> *AccessControlException* because the user has no Kerberos credentials. 
> My suggestion is that we should add this case as well to 
> {{FailoverOnNetworkExceptionRetry}}.






[jira] [Commented] (HADOOP-16510) [hadoop-common] Fix order of actual and expected expression in assert statements

2019-11-05 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967401#comment-16967401
 ] 

Szilard Nemeth commented on HADOOP-16510:
-

Sure [~adam.antal]!

> [hadoop-common] Fix order of actual and expected expression in assert 
> statements
> 
>
> Key: HADOOP-16510
> URL: https://issues.apache.org/jira/browse/HADOOP-16510
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16510.001.patch, HADOOP-16510.002.patch, 
> HADOOP-16510.003.patch
>
>
> Fix the order of the actual and expected expressions in assert statements, 
> which gives a misleading message when a test case fails. The attached file 
> lists some of the places where the order is wrong.
> {code:java}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> {code}
> In the long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be 
> used for new test cases, which avoids such mistakes.
> This is a follow-up jira for the hadoop-common project.
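The pitfall in the quoted description can be reproduced with a minimal sketch. The `failureMessage` helper below is a stand-in modeled on how JUnit builds its `assertEquals(message, expected, actual)` failure text, just to show how swapped arguments produce a misleading message:

```java
public class AssertOrderSketch {
    // Builds a JUnit-style failure message from the expected and actual values.
    public static String failureMessage(String message, Object expected, Object actual) {
        return message + " expected:<" + expected + "> but was:<" + actual + ">";
    }

    public static void main(String[] args) {
        int shutdownNodes = 1; // the actual value observed by the test

        // Correct order (expected first): the message matches reality.
        System.out.println(failureMessage("Shutdown nodes should be 0 now", 0, shutdownNodes));
        // -> Shutdown nodes should be 0 now expected:<0> but was:<1>

        // Swapped order (actual first): the same failure now reads backwards,
        // matching the misleading message quoted in the description.
        System.out.println(failureMessage("Shutdown nodes should be 0 now", shutdownNodes, 0));
        // -> Shutdown nodes should be 0 now expected:<1> but was:<0>
    }
}
```

AssertJ sidesteps the problem because its fluent form, `assertThat(actual).isEqualTo(expected)`, makes the roles of the two values unambiguous.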






[jira] [Updated] (HADOOP-16510) [hadoop-common] Fix order of actual and expected expression in assert statements

2019-10-31 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-16510:

Fix Version/s: 3.3.0

> [hadoop-common] Fix order of actual and expected expression in assert 
> statements
> 
>
> Key: HADOOP-16510
> URL: https://issues.apache.org/jira/browse/HADOOP-16510
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-16510.001.patch, HADOOP-16510.002.patch, 
> HADOOP-16510.003.patch
>
>
> Fix the order of the actual and expected expressions in assert statements, 
> which gives a misleading message when a test case fails. The attached file 
> lists some of the places where the order is wrong.
> {code:java}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> {code}
> In the long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be 
> used for new test cases, which avoids such mistakes.
> This is a follow-up jira for the hadoop-common project.






[jira] [Commented] (HADOOP-16510) [hadoop-common] Fix order of actual and expected expression in assert statements

2019-10-31 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964016#comment-16964016
 ] 

Szilard Nemeth commented on HADOOP-16510:
-

Hi [~adam.antal]!
Thanks for this patch, very good job!

I found only one nit: I removed a commented-out line from the imports of 
TestProtoBufRpc:

{code:java}
// import org.junit.Assert;
{code}

+1, committing this to trunk!

[~adam.antal]: What about backporting this to branch-3.2 / branch-3.1?

Thanks!


> [hadoop-common] Fix order of actual and expected expression in assert 
> statements
> 
>
> Key: HADOOP-16510
> URL: https://issues.apache.org/jira/browse/HADOOP-16510
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HADOOP-16510.001.patch, HADOOP-16510.002.patch, 
> HADOOP-16510.003.patch
>
>
> Fix the order of the actual and expected expressions in assert statements, 
> which gives a misleading message when a test case fails. The attached file 
> lists some of the places where the order is wrong.
> {code:java}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> {code}
> In the long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can be 
> used for new test cases, which avoids such mistakes.
> This is a follow-up jira for the hadoop-common project.






[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException

2019-10-16 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952779#comment-16952779
 ] 

Szilard Nemeth commented on HADOOP-16580:
-

Hi [~adam.antal]!
Latest patch looks good, +1. Committed to trunk, branch-3.2 / branch-3.1.
Thanks [~pbacsko] and [~shuzirra] for the reviews!

> Disable retry of FailoverOnNetworkExceptionRetry in case of 
> AccessControlException
> --
>
> Key: HADOOP-16580
> URL: https://issues.apache.org/jira/browse/HADOOP-16580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, 
> HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch
>
>
> HADOOP-14982 handled the case where a SaslException is thrown. The issue 
> still persists, since the exception that is thrown is an 
> *AccessControlException* because the user has no Kerberos credentials. 
> My suggestion is that we should add this case as well to 
> {{FailoverOnNetworkExceptionRetry}}.






[jira] [Updated] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException

2019-10-16 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-16580:

Hadoop Flags: Reviewed

> Disable retry of FailoverOnNetworkExceptionRetry in case of 
> AccessControlException
> --
>
> Key: HADOOP-16580
> URL: https://issues.apache.org/jira/browse/HADOOP-16580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, 
> HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch
>
>
> HADOOP-14982 handled the case where a SaslException is thrown. The issue 
> still persists, since the exception that is thrown is an 
> *AccessControlException* because the user has no Kerberos credentials. 
> My suggestion is that we should add this case as well to 
> {{FailoverOnNetworkExceptionRetry}}.






[jira] [Updated] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException

2019-10-16 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-16580:

Fix Version/s: 3.2.2
   3.1.4
   3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Disable retry of FailoverOnNetworkExceptionRetry in case of 
> AccessControlException
> --
>
> Key: HADOOP-16580
> URL: https://issues.apache.org/jira/browse/HADOOP-16580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, 
> HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch
>
>
> HADOOP-14982 handled the case where a SaslException is thrown. The issue 
> still persists, since the exception that is thrown is an 
> *AccessControlException* because the user has no Kerberos credentials. 
> My suggestion is that we should add this case as well to 
> {{FailoverOnNetworkExceptionRetry}}.






[jira] [Updated] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException

2019-10-16 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-16580:

Attachment: HADOOP-16580.branch-3.2.001.patch

> Disable retry of FailoverOnNetworkExceptionRetry in case of 
> AccessControlException
> --
>
> Key: HADOOP-16580
> URL: https://issues.apache.org/jira/browse/HADOOP-16580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch, 
> HADOOP-16580.003.patch, HADOOP-16580.branch-3.2.001.patch
>
>
> HADOOP-14982 handled the case where a SaslException is thrown. The issue 
> still persists, since the exception that is thrown is an 
> *AccessControlException* because the user has no Kerberos credentials. 
> My suggestion is that we should add this case as well to 
> {{FailoverOnNetworkExceptionRetry}}.






[jira] [Commented] (HADOOP-16580) Disable retry of FailoverOnNetworkExceptionRetry in case of AccessControlException

2019-10-11 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949408#comment-16949408
 ] 

Szilard Nemeth commented on HADOOP-16580:
-

Hi [~adam.antal]!

Thanks for the patch! Actually, I'm with [~shuzirra] on this one:

Without your excellent explanation, I wouldn't understand why the method is 
called failsWithAccessControlExceptionEightTimes.

As you mentioned: could you please incorporate your explanation into the 
javadoc, as much as possible? I don't mean only the above method, but any other 
part of the code you feel needs some explanation.

Apart from that, I can give a +1 once the javadocs are in place.

Thanks!

> Disable retry of FailoverOnNetworkExceptionRetry in case of 
> AccessControlException
> --
>
> Key: HADOOP-16580
> URL: https://issues.apache.org/jira/browse/HADOOP-16580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: HADOOP-16580.001.patch, HADOOP-16580.002.patch
>
>
> HADOOP-14982 handled the case where a SaslException is thrown. The issue 
> still persists, since the exception that is thrown is an 
> *AccessControlException* because the user has no Kerberos credentials. 
> My suggestion is that we should add this case as well to 
> {{FailoverOnNetworkExceptionRetry}}.






[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException

2018-10-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647135#comment-16647135
 ] 

Szilard Nemeth commented on HADOOP-15717:
-

Thanks [~rkanter]!

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HADOOP-15717.001.patch, HADOOP-15717.002.patch
>
>
> I came across a case where tgt.getEndTime() returned null, resulting in an 
> NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls from the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918
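The fix described in the quoted text (log the swallowed exception before the renewal thread gives up) can be sketched roughly as follows. This is a minimal, self-contained sketch using `java.util.logging`, not the actual UserGroupInformation code (which uses SLF4J); `renewOnce` and `tryRenew` are hypothetical stand-ins for the renewal work:

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RenewalLoggingSketch {
    private static final Logger LOG = Logger.getLogger("tgt-renewal");

    // Hypothetical stand-in for the renewal step that can fail, e.g. when the
    // ticket's endDate is null.
    static void renewOnce() throws IOException {
        throw new IOException("ticket endDate was null");
    }

    // Returns true if renewal succeeded; on failure it logs the message AND the
    // throwable, so the stack trace is not swallowed when the thread terminates.
    static boolean tryRenew() {
        try {
            renewOnce();
            return true;
        } catch (IOException ioe) {
            LOG.log(Level.WARNING, "Terminating TGT renewal thread", ioe);
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRenew()); // false: the failure is logged, not lost
    }
}
```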



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15676) Cleanup TestSSLHttpServer

2018-10-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647102#comment-16647102
 ] 

Szilard Nemeth commented on HADOOP-15676:
-

Thanks [~xiaochen]!

> Cleanup TestSSLHttpServer
> -
>
> Key: HADOOP-15676
> URL: https://issues.apache.org/jira/browse/HADOOP-15676
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.6.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, 
> HADOOP-15676.003.patch, HADOOP-15676.004.patch, HADOOP-15676.005.patch
>
>
> This issue will fix: 
> * Several typos in this class
> * Code that is not very readable in some places.






[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException

2018-10-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646871#comment-16646871
 ] 

Szilard Nemeth commented on HADOOP-15717:
-

Hi [~xiaochen]!
I checked the API in my IDE and on the website as well; I still don't see a 
signature that combines parameterized logging of the arguments with properly 
logging the exception.
The API provides either: 
1. 
https://www.slf4j.org/apidocs/org/slf4j/Logger.html#error(java.lang.String,%20java.lang.Object...)
or
2. 
https://www.slf4j.org/apidocs/org/slf4j/Logger.html#error(java.lang.String,%20java.lang.Throwable)

Option 1 is for parameterized logging; option 2 logs a string plus the 
exception. There is no method that combines the two.

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch, HADOOP-15717.002.patch
>
>
> I came across a case where tgt.getEndTime() returned null, resulting in an 
> NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls from the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918






[jira] [Updated] (HADOOP-15676) Cleanup TestSSLHttpServer

2018-10-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15676:

Attachment: HADOOP-15676.005.patch

> Cleanup TestSSLHttpServer
> -
>
> Key: HADOOP-15676
> URL: https://issues.apache.org/jira/browse/HADOOP-15676
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.6.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, 
> HADOOP-15676.003.patch, HADOOP-15676.004.patch, HADOOP-15676.005.patch
>
>
> This issue will fix: 
> * Several typos in this class
> * Code that is not very readable in some places.






[jira] [Commented] (HADOOP-15676) Cleanup TestSSLHttpServer

2018-10-11 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646865#comment-16646865
 ] 

Szilard Nemeth commented on HADOOP-15676:
-

Hi [~xiaochen]!
Thanks for the explanation; it is indeed better to just let the exception be 
thrown out of the method so we have more information in the logs.
I'm with you on improving the code as we touch it; since this is a cleanup 
jira, I think changing the things you mentioned makes sense.
See the new patch with the fixes!
Thanks!

> Cleanup TestSSLHttpServer
> -
>
> Key: HADOOP-15676
> URL: https://issues.apache.org/jira/browse/HADOOP-15676
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.6.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, 
> HADOOP-15676.003.patch, HADOOP-15676.004.patch, HADOOP-15676.005.patch
>
>
> This issue will fix: 
> * Several typos in this class
> * Code that is not very readable in some places.






[jira] [Updated] (HADOOP-15676) Cleanup TestSSLHttpServer

2018-10-11 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15676:

Attachment: HADOOP-15676.004.patch

> Cleanup TestSSLHttpServer
> -
>
> Key: HADOOP-15676
> URL: https://issues.apache.org/jira/browse/HADOOP-15676
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.6.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, 
> HADOOP-15676.003.patch, HADOOP-15676.004.patch
>
>
> This issue will fix: 
> * Several typos in this class
> * Code that is not very readable in some places.






[jira] [Updated] (HADOOP-15717) TGT renewal thread does not log IOException

2018-10-09 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15717:

Attachment: HADOOP-15717.002.patch

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch, HADOOP-15717.002.patch
>
>
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls within the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918
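A minimal, self-contained sketch of the behavior described above — logging the IOException generically before handling the null end-time case — could look as follows. The {{Tgt}} stand-in and the method names here are hypothetical, used only for illustration; this is not the actual UserGroupInformation renewal loop, and java.util.logging is used purely to keep the sketch runnable without external dependencies.

```java
import java.io.IOException;
import java.util.Date;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TgtRenewalSketch {
    private static final Logger LOG =
        Logger.getLogger(TgtRenewalSketch.class.getName());

    // Hypothetical stand-in for javax.security.auth.kerberos.KerberosTicket,
    // whose getEndTime() may return null on some JDKs (e.g. OpenJDK).
    interface Tgt {
        Date getEndTime();
    }

    // Returns the next renewal time, or -1 to stop renewing. Without the
    // null check, tgt.getEndTime().getTime() throws an NPE and the renewal
    // thread dies with the underlying IOException never reaching the logs.
    static long nextRefresh(Tgt tgt, IOException cause) {
        // Log the exception first, in a generic fashion, so the stack trace
        // is preserved no matter which branch is taken below.
        LOG.log(Level.WARNING, "Exception encountered while renewing TGT", cause);
        Date endTime = tgt.getEndTime();
        if (endTime == null) {
            LOG.severe("TGT end time is null; terminating renewal thread");
            return -1;
        }
        return endTime.getTime();
    }

    public static void main(String[] args) {
        long next = nextRefresh(() -> null, new IOException("renewal failed"));
        System.out.println(next);
    }
}
```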






[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException

2018-10-09 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644197#comment-16644197
 ] 

Szilard Nemeth commented on HADOOP-15717:
-

Hi [~xiaochen], [~rkanter]!
Oh, I see what I had overlooked.
Removed the newly added error log and modified the 2 existing error logs to 
contain the exception.
Unfortunately, I had to use String.format, as there's no API in this version 
of log4j that supports object parameters and exception logging at the same 
time.
Actually, on line 945, the code's intention was to log the exception, but as 
the signature of the log4j API call is different, it was never logged. The call 
also had fewer format specifiers in the string (4 instead of 5).
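The workaround described above can be illustrated with a short sketch: since the logging API cannot mix positional message parameters with a Throwable, the message is pre-formatted with String.format and the exception is passed as the separate last argument. Note that java.util.logging is used here only as a self-contained stand-in for the log4j 1.x API mentioned in the comment, and the class and method names are hypothetical, not the actual UserGroupInformation code.

```java
import java.io.IOException;
import java.util.Date;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RenewalFailureLogging {
    private static final Logger LOG =
        Logger.getLogger(RenewalFailureLogging.class.getName());

    // Hypothetical helper: the message is built with String.format up front,
    // so the Throwable can still be handed to the logging call and its stack
    // trace preserved in the output.
    static String logRenewalFailure(String principal, Date tgtEndTime,
                                    IOException cause) {
        String msg = String.format(
            "Exception encountered while running the renewal command for %s "
                + "(TGT end time: %s)", principal, tgtEndTime);
        // Passing the cause as the last argument logs message AND stack trace.
        LOG.log(Level.SEVERE, msg, cause);
        return msg;
    }

    public static void main(String[] args) {
        String msg = logRenewalFailure("hdfs/host@EXAMPLE.COM", null,
            new IOException("kinit failed"));
        System.out.println(msg);
    }
}
```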

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch
>
>
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls within the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918






[jira] [Commented] (HADOOP-15676) Cleanup TestSSLHttpServer

2018-09-16 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616893#comment-16616893
 ] 

Szilard Nemeth commented on HADOOP-15676:
-

Thanks [~xiaochen] for your comments!
Uploaded a new patch that fixes the code duplication.
Regarding the try-catch block with the {{fail\(\)}} call, I haven't modified 
the original code.
I guess the intention was not only to fail when the {{SSLHandshakeException}} 
is thrown, but also to provide a more detailed error message (the 1st 
parameter to {{fail\(\)}}).
What idea do you have in mind to fix that?
Thanks!
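For context, the two failure-reporting styles under discussion can be contrasted with a small sketch. {{readFromServer\(\)}} here is a hypothetical stand-in for the HTTPS round-trip in TestSSLHttpServer, not the actual test code, and plain Java is used instead of JUnit to keep it self-contained.

```java
public class PropagateVsFail {
    // Hypothetical stand-in for the HTTPS read performed by the test.
    static void readFromServer() throws Exception {
        throw new Exception("javax.net.ssl.SSLHandshakeException: handshake failed");
    }

    // Style 1: catch and fail with a custom message. The descriptive message
    // survives, but the original stack trace is lost unless it is attached
    // explicitly. In JUnit this would be: fail("Should not have failed: " + e);
    static String catchAndFail() {
        try {
            readFromServer();
            return "ok";
        } catch (Exception e) {
            return "Should not have failed reading from server: " + e.getMessage();
        }
    }

    // Style 2: declare "throws Exception" and let the exception propagate,
    // so the test runner logs the full stack trace of the root cause.
    static void propagate() throws Exception {
        readFromServer();
    }

    public static void main(String[] args) {
        System.out.println(catchAndFail());
        try {
            propagate();
        } catch (Exception e) {
            // In a real test the runner would print the full stack trace here.
            System.out.println("runner sees: " + e.getMessage());
        }
    }
}
```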

> Cleanup TestSSLHttpServer
> -
>
> Key: HADOOP-15676
> URL: https://issues.apache.org/jira/browse/HADOOP-15676
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.6.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, 
> HADOOP-15676.003.patch
>
>
> This issue will fix: 
> * Several typos in this class
> * Code that is not very readable in some places.






[jira] [Updated] (HADOOP-15676) Cleanup TestSSLHttpServer

2018-09-16 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15676:

Attachment: HADOOP-15676.003.patch

> Cleanup TestSSLHttpServer
> -
>
> Key: HADOOP-15676
> URL: https://issues.apache.org/jira/browse/HADOOP-15676
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 2.6.0
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: HADOOP-15676.001.patch, HADOOP-15676.002.patch, 
> HADOOP-15676.003.patch
>
>
> This issue will fix: 
> * Several typos in this class
> * Code that is not very readable in some places.






[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException

2018-09-16 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16616891#comment-16616891
 ] 

Szilard Nemeth commented on HADOOP-15717:
-

Hi [~xiaochen]!
Thanks for your comment!
I'm not sure which 2 existing {{LOG.error}} statements you referred to.
If you meant the {{catch \(IOException ie\) {}} block in the {{run}} 
method, I would say those are not the best candidates for adding the exception 
as a logging parameter: the first {{LOG.error}} statement deals with the case 
when the {{tgt}} is destroyed, and the second handles 
possible NPEs coming from {{tgt.getEndTime\(\).getTime\(\)}}.
If you meant something different, please clarify!
I'm still voting for keeping my patch as the solution, i.e. logging the 
exception on the first line of the catch block in a generic fashion.

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch
>
>
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls within the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918






[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException

2018-09-06 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605874#comment-16605874
 ] 

Szilard Nemeth commented on HADOOP-15717:
-

Hi [~ste...@apache.org]!
Good point.
Unfortunately, I'm not sure how many times this could end up being logged. 
Given that, I would use the debug level instead. Do you agree?

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch
>
>
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls within the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918






[jira] [Commented] (HADOOP-15717) TGT renewal thread does not log IOException

2018-09-04 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603308#comment-16603308
 ] 

Szilard Nemeth commented on HADOOP-15717:
-

Tests are not added; that is why we have the red build.

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch
>
>
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls within the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918






[jira] [Updated] (HADOOP-15717) TGT renewal thread does not log IOException

2018-09-04 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15717:

Status: Patch Available  (was: In Progress)

> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch
>
>
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster. The reason for logging the {{IOException}} is that it helps to 
> troubleshoot what caused the exception, as it can come from two different 
> calls within the try-catch.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918






[jira] [Updated] (HADOOP-15717) TGT renewal thread does not log IOException

2018-09-04 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated HADOOP-15717:

Description: 
The reason for logging the IOException is that it helps to troubleshoot what 
caused the exception, as it can come from two different calls within the 
try-catch.
I came across a case where tgt.getEndTime() returned null and resulted 
in an NPE; this was observed during a test suite execution on a 
cluster.
I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
logging the fact that the ticket's {{endDate}} was null, we have not logged the 
exception at all.
With the current code, the exception is swallowed and the thread terminates in 
case the ticket's {{endDate}} is null. 
As this can happen with OpenJDK for example, it is required to print the 
exception (stack trace, message) to the log.
The code should be updated here: 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918


  was:
The reason for logging the IOException is that it helps to troubleshoot what 
caused the exception, as it can come from two different calls within the 
try-catch.
I came across a case where tgt.getEndTime() returned null and resulted 
in an NPE.
I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
logging the fact that the ticket's {{endDate}} was null, we have not logged the 
exception at all.
With the current code, the exception is swallowed and the thread terminates in 
case the ticket's {{endDate}} is null. 
As this can happen with OpenJDK for example, it is required to print the 
exception (stack trace, message) to the log.

The code should be updated here: 
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918



> TGT renewal thread does not log IOException
> ---
>
> Key: HADOOP-15717
> URL: https://issues.apache.org/jira/browse/HADOOP-15717
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: HADOOP-15717.001.patch
>
>
> The reason for logging the IOException is that it helps to troubleshoot what 
> caused the exception, as it can come from two different calls within the 
> try-catch.
> I came across a case where tgt.getEndTime() returned null and resulted 
> in an NPE; this was observed during a test suite execution on a 
> cluster.
> I can see that [~gabor.bota] handled this with HADOOP-15593, but apart from 
> logging the fact that the ticket's {{endDate}} was null, we have not logged 
> the exception at all.
> With the current code, the exception is swallowed and the thread terminates 
> in case the ticket's {{endDate}} is null. 
> As this can happen with OpenJDK for example, it is required to print the 
> exception (stack trace, message) to the log.
> The code should be updated here: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L918





