[jira] [Commented] (HADOOP-14551) F3A init hangs if you try to connect while the system is offline

2017-06-21 Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057728#comment-16057728 ]

Steve Loughran commented on HADOOP-14551:
-

Part of the problem appears to be that the standard AWS retry policy in 
{{SDKDefaultRetryCondition}} says "retry all IOEs"; we should have a list of 
those which can't be retried (UnknownHostException, NoRouteToHostException, others?)
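A minimal sketch of such a "can't retry" list, assuming the check walks the exception's cause chain (the SDK normally wraps the underlying IOException in an `AmazonClientException` or a subclass). The class `NonRetryableNetworkErrors` and its `isRetryable` method are hypothetical, not part of the AWS SDK or Hadoop; in the v1 SDK, logic like this would be plugged in through a custom `RetryPolicy.RetryCondition`.

```java
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical helper (not an AWS SDK or Hadoop class): classifies failures
 * that no amount of sleeping and retrying will fix.
 */
final class NonRetryableNetworkErrors {

    // Assumed "can't retry" list from the comment above; extend as needed.
    private static final Class<?>[] FATAL = {
        UnknownHostException.class,   // DNS cannot resolve the endpoint
        NoRouteToHostException.class  // no network route to the endpoint
    };

    private NonRetryableNetworkErrors() {
    }

    /**
     * Returns false if the throwable, or anything in its cause chain, is on
     * the fatal list. Walking the chain matters because the SDK usually wraps
     * the original IOException in its own client exception.
     */
    static boolean isRetryable(Throwable t) {
        // Identity set guards against cyclic cause chains.
        Set<Throwable> seen =
            Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            for (Class<?> fatal : FATAL) {
                if (fatal.isInstance(cur)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

With this, `isRetryable(new UnknownHostException("s3.amazonaws.com"))` returns `false`, while a plain `SocketTimeoutException` is still treated as transient and retried.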


> F3A init hangs if you try to connect while the system is offline
> 
>
> Key: HADOOP-14551
> URL: https://issues.apache.org/jira/browse/HADOOP-14551
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Priority: Minor
>
> S3A init hangs if you try to connect while the system is offline (that is: 
> the host of the S3 endpoint is unknown)
> Assumption: an unknown host exception is considered recoverable, and the client is 
> spinning for a long time waiting for it to clear.
> I think we can conclude that unknown host is unrecoverable: if DNS is in 
> trouble, you are doomed.
> Proposed: quick lookup of endpoint addr, fail with our wiki diagnostics error 
> on any problem.
> I don't see any cost in doing this, as it will guarantee that the endpoint is 
> cached in the JVM ready for the AWS client connection. If it can't be found, 
> we'll fail within 20+s with something meaningful.
> Noticed during a test run: laptop wifi off; all NICs other than loopback are 
> inactive.
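A minimal sketch of the proposed fail-fast lookup, assuming it runs once during filesystem initialization, before the AWS client makes its first request. `EndpointCheck.verifyResolvable` is a hypothetical name, and the error message is only a placeholder for the wiki diagnostics text, which is not reproduced here. The failure is rethrown as an `UnknownHostException` so that any retry classifier still recognizes the type.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical fail-fast endpoint check; not Hadoop's S3A code. */
final class EndpointCheck {

    private EndpointCheck() {
    }

    /**
     * Resolves the endpoint host once. On failure it throws an
     * UnknownHostException (same type as the cause) whose message names the
     * host, instead of leaving the HTTP client to retry indefinitely.
     */
    static InetAddress verifyResolvable(String host) throws UnknownHostException {
        if (host == null || host.isEmpty()) {
            // getByName(null or "") would silently return loopback; reject instead.
            throw new IllegalArgumentException("No endpoint host configured");
        }
        try {
            // Blocks until the OS resolver answers or gives up
            // (roughly 20s in the offline run reported in this issue).
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            // Placeholder text; the proposal is to point at the wiki diagnostics page here.
            UnknownHostException wrapped = new UnknownHostException(
                "Cannot resolve S3 endpoint \"" + host
                + "\": is the network down or DNS unavailable?");
            wrapped.initCause(e);
            throw wrapped;
        }
    }
}
```

A successful lookup also populates the JVM's address cache (for as long as `networkaddress.cache.ttl` allows), which is the "no cost" argument in the description: the AWS client's own connection should then reuse the cached address.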



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org




2017-06-20 Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055488#comment-16055488 ]

Steve Loughran commented on HADOOP-14551:
-

The OS gives up after about 20s (18s in the run below). Notably, it could have 
failed faster; is mDNS trying to ask around?
{code}
time nslookup s3.amaazon.com
;; connection timed out; no servers could be reached

   18.09 real 0.04 user 0.02 sys
{code}

No active NICs. 

{code}
> ifconfig
lo0: flags=8049 mtu 16384
	options=1203
	inet 127.0.0.1 netmask 0xff00
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	nd6 options=201
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8823 mtu 1500
	ether f4:5c:89:a3:82:99
	nd6 options=201
	media: autoselect ()
	status: inactive
en1: flags=963 mtu 1500
	options=60
	ether 6a:00:01:c6:4d:00
	media: autoselect
	status: inactive
en2: flags=963 mtu 1500
	options=60
	ether 6a:00:01:c6:4d:01
	media: autoselect
	status: inactive
p2p0: flags=8802 mtu 2304
	ether 06:5c:89:a3:82:99
	media: autoselect
	status: inactive
bridge0: flags=8863 mtu 1500
	options=63
	ether 6a:00:01:c6:4d:00
	Configuration:
		id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
		maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
		root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
		ipfilter disabled flags 0x2
	member: en1 flags=3
	        ifmaxaddr 0 port 5 priority 0 path cost 0
	member: en2 flags=3
	        ifmaxaddr 0 port 6 priority 0 path cost 0
	nd6 options=201
	media:
	status: inactive
awdl0: flags=8902 mtu 1484
	ether ba:d6:00:b9:42:05
	nd6 options=201
	media: autoselect
	status: inactive
utun0: flags=8051 mtu 2000
	inet6 fe80::bffe:5ddf:7506:15ca%utun0 prefixlen 64 scopeid 0xa
	nd6 options=201

{code}

Meanwhile, the S3 client keeps retrying and never seems to fail the test. I got 
bored eventually; this is where it was waiting:

{code}
	at java.lang.Thread.run(Thread.java:745)

"Thread-0" #10 prio=5 os_prio=31 tid=0x7fc783b79800 nid=0x5703 waiting on condition [0x7ca4b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doPauseBeforeRetry(AmazonHttpClient.java:1656)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.pauseBeforeRetry(AmazonHttpClient.java:1630)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1143)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
	at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1306)
	at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1263)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:305)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3258)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3307)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3275)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
	at