[jira] [Updated] (HADOOP-17052) NetUtils.connect() throws an exception that prevents any retries when hostname resolution fails

2020-05-29 Thread Dhiraj Hegde (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Hegde updated HADOOP-17052:
--
Attachment: read_failure.log

> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> --
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
>Reporter: Dhiraj Hegde
>Assignee: Dhiraj Hegde
>Priority: Major
> Attachments: read_failure.log, write_failure1.log, write_failure2.log
>
>
> Hadoop components are increasingly being deployed on VMs and containers. One
> aspect of this environment is that DNS is dynamic. Hostname records get
> modified (or deleted and recreated) as a container in Kubernetes (or even a
> VM) is created or recreated. In such dynamic environments, the initial DNS
> resolution request might briefly fail because the DNS client doesn't always
> get the latest records. This has been observed in Kubernetes in particular.
> In such cases NetUtils.connect() appears to throw
> java.nio.channels.UnresolvedAddressException. Much of the Hadoop code (like
> DFSInputStream and DFSOutputStream) is designed to retry on IOException.
> However, since UnresolvedAddressException is not a subclass of IOException
> (it is an unchecked exception that extends IllegalArgumentException), no
> retry happens and the code aborts immediately. It would be much better if
> NetUtils.connect() threw java.net.UnknownHostException, as that is derived
> from IOException and callers would treat it as a retryable error.
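
The proposed behavior can be illustrated with a minimal sketch. This is not
the Hadoop NetUtils code or the actual patch; the class name
ResolveBeforeConnect, the helper connect(), and the host name
"datanode.invalid" are assumptions made up for illustration. The sketch only
shows the idea from the description: surface an unresolved address as
java.net.UnknownHostException (an IOException) so that callers that retry on
IOException also retry in this case.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class ResolveBeforeConnect {

    /**
     * Hypothetical helper (not Hadoop's NetUtils.connect()): connects the
     * channel, but reports an unresolved host name as UnknownHostException,
     * which is a subclass of IOException.
     */
    static void connect(SocketChannel channel, InetSocketAddress endpoint)
            throws IOException {
        // An address whose host name could not be resolved is flagged by the JDK.
        if (endpoint.isUnresolved()) {
            throw new UnknownHostException(endpoint.getHostString());
        }
        try {
            channel.connect(endpoint);
        } catch (UnresolvedAddressException e) {
            // Defensive: translate the unchecked exception into a checked one.
            UnknownHostException uhe =
                    new UnknownHostException(endpoint.getHostString());
            uhe.initCause(e);
            throw uhe;
        }
    }

    public static void main(String[] args) throws IOException {
        // createUnresolved() performs no DNS lookup, so this simulates a
        // failed resolution without touching the network.
        InetSocketAddress endpoint =
                InetSocketAddress.createUnresolved("datanode.invalid", 9866);
        try (SocketChannel channel = SocketChannel.open()) {
            connect(channel, endpoint);
        } catch (IOException e) {
            // A caller that retries on IOException now reaches this branch.
            System.out.println("retryable failure: " + e);
        }
    }
}
```

With a plain channel.connect(endpoint) call, the same unresolved address would
raise UnresolvedAddressException, which a catch (IOException e) block does not
intercept.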



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17052) NetUtils.connect() throws an exception that prevents any retries when hostname resolution fails

2020-05-29 Thread Dhiraj Hegde (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Hegde updated HADOOP-17052:
--
Attachment: write_failure2.log

> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> --
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
>Reporter: Dhiraj Hegde
>Assignee: Dhiraj Hegde
>Priority: Major
> Attachments: read_failure.log, write_failure1.log, write_failure2.log






[jira] [Updated] (HADOOP-17052) NetUtils.connect() throws an exception that prevents any retries when hostname resolution fails

2020-05-29 Thread Dhiraj Hegde (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Hegde updated HADOOP-17052:
--
Attachment: write_failure1.log

> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> --
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
>Reporter: Dhiraj Hegde
>Assignee: Dhiraj Hegde
>Priority: Major
> Attachments: read_failure.log, write_failure1.log, write_failure2.log






[jira] [Updated] (HADOOP-17052) NetUtils.connect() throws an exception that prevents any retries when hostname resolution fails

2020-05-29 Thread Dhiraj Hegde (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Hegde updated HADOOP-17052:
--
Attachment: (was: stack_trace2)

> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> --
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
>Reporter: Dhiraj Hegde
>Assignee: Dhiraj Hegde
>Priority: Major






[jira] [Updated] (HADOOP-17052) NetUtils.connect() throws an exception that prevents any retries when hostname resolution fails

2020-05-26 Thread Dhiraj Hegde (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dhiraj Hegde updated HADOOP-17052:
--
Attachment: stack_trace2

> NetUtils.connect() throws an exception that prevents any retries when hostname
> resolution fails
> --
>
> Key: HADOOP-17052
> URL: https://issues.apache.org/jira/browse/HADOOP-17052
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.10.0, 2.9.2, 3.2.1, 3.1.3
>Reporter: Dhiraj Hegde
>Assignee: Dhiraj Hegde
>Priority: Major
> Attachments: stack_trace2


