[jira] [Commented] (HADOOP-14551) S3A init hangs if you try to connect while the system is offline
[ https://issues.apache.org/jira/browse/HADOOP-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057728#comment-16057728 ]

Steve Loughran commented on HADOOP-14551:
-----------------------------------------

Part of the problem appears to be that the standard AWS retry policy in {{SDKDefaultRetryCondition}} says "retry all IOEs"; we should have a list of those which cannot be fixed by retrying (UnknownHost, NoRouteToHost, others?)

> S3A init hangs if you try to connect while the system is offline
> ----------------------------------------------------------------
>
>                 Key: HADOOP-14551
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14551
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> S3A init hangs if you try to connect while the system is offline (that is:
> the host of the s3 endpoint is unknown)
> Assumption: unknown host exception is considered recoverable & client is
> spinning for a long time waiting for it.
> I think we can conclude that unknown host is unrecoverable: if DNS is in
> trouble, you are doomed.
> Proposed: quick lookup of endpoint addr, fail with our wiki diagnostics error
> on any problem.
> I don't see any cost in doing this, as it will guarantee that the endpoint is
> cached in the JVM ready for the AWS client connection. If it can't be found,
> we'll fail within 20+s with something meaningful.
> Noticed during a test run: laptop wifi off; all NICs other than loopback are
> inactive.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
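The two ideas in this message (a list of exceptions that retrying cannot fix, and a quick lookup of the endpoint address before the AWS client is used) could be sketched roughly as below. This is only an illustration under stated assumptions: the class {{EndpointPreflight}} and its methods are hypothetical names, not part of Hadoop or the AWS SDK; the exception list is limited to the two types named above; and the error text is a placeholder, not the actual wiki diagnostics message. How such a check would be wired into the SDK retry policy or into S3A initialization is not shown.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.NoRouteToHostException;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not part of Hadoop or the AWS SDK. */
final class EndpointPreflight {

  private EndpointPreflight() {
  }

  /**
   * Returns true if the exception, or any exception in its cause chain,
   * is of a type that retrying is unlikely to fix. The list is an
   * assumption taken from the discussion: UnknownHost and NoRouteToHost.
   */
  static boolean isUnrecoverable(Throwable t) {
    // Bounded walk so that a cyclic cause chain cannot loop forever.
    for (int depth = 0; t != null && depth < 16; depth++, t = t.getCause()) {
      if (t instanceof UnknownHostException
          || t instanceof NoRouteToHostException) {
        return true;
      }
    }
    return false;
  }

  /**
   * Resolves the endpoint host once, up front. On success the address is
   * also held in the JVM's address cache (subject to the JVM's cache
   * settings). On failure, the lookup error is rethrown with the host
   * name in the message; the wording is a placeholder.
   */
  static InetAddress resolveEndpoint(String host) throws IOException {
    try {
      return InetAddress.getByName(host);
    } catch (UnknownHostException e) {
      UnknownHostException wrapped = new UnknownHostException(
          "Cannot resolve S3 endpoint " + host + ": " + e.getMessage());
      wrapped.initCause(e);
      throw wrapped;
    }
  }
}
```

A caller would invoke {{EndpointPreflight.resolveEndpoint(host)}} before issuing the first request, so that an offline system fails once the OS resolver gives up instead of entering the retry loop. The lookup itself can still block for as long as the OS resolver takes.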
[jira] [Commented] (HADOOP-14551) S3A init hangs if you try to connect while the system is offline
[ https://issues.apache.org/jira/browse/HADOOP-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055488#comment-16055488 ]

Steve Loughran commented on HADOOP-14551:
-----------------------------------------

OS gives up after 20s. (Notable: it could have failed faster. mDNS trying to ask around?)
{code}
time nslookup s3.amaazon.com
;; connection timed out; no servers could be reached
       18.09 real         0.04 user         0.02 sys
{code}
No active NICs.
{code}
> ifconfig
lo0: flags=8049 mtu 16384
    options=1203
    inet 127.0.0.1 netmask 0xff00
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
    nd6 options=201
gif0: flags=8010 mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8823 mtu 1500
    ether f4:5c:89:a3:82:99
    nd6 options=201
    media: autoselect ()
    status: inactive
en1: flags=963 mtu 1500
    options=60
    ether 6a:00:01:c6:4d:00
    media: autoselect
    status: inactive
en2: flags=963 mtu 1500
    options=60
    ether 6a:00:01:c6:4d:01
    media: autoselect
    status: inactive
p2p0: flags=8802 mtu 2304
    ether 06:5c:89:a3:82:99
    media: autoselect
    status: inactive
bridge0: flags=8863 mtu 1500
    options=63
    ether 6a:00:01:c6:4d:00
    Configuration:
        id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
        maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
        root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
        ipfilter disabled flags 0x2
    member: en1 flags=3
        ifmaxaddr 0 port 5 priority 0 path cost 0
    member: en2 flags=3
        ifmaxaddr 0 port 6 priority 0 path cost 0
    nd6 options=201
    media:
    status: inactive
awdl0: flags=8902 mtu 1484
    ether ba:d6:00:b9:42:05
    nd6 options=201
    media: autoselect
    status: inactive
utun0: flags=8051 mtu 2000
    inet6 fe80::bffe:5ddf:7506:15ca%utun0 prefixlen 64 scopeid 0xa
    nd6 options=201
{code}
Meanwhile, the S3 client retries, without seeming to fail the test.
I got bored eventually
{code}
    at java.lang.Thread.run(Thread.java:745)

"Thread-0" #10 prio=5 os_prio=31 tid=0x7fc783b79800 nid=0x5703 waiting on condition [0x7ca4b000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doPauseBeforeRetry(AmazonHttpClient.java:1656)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.pauseBeforeRetry(AmazonHttpClient.java:1630)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1143)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
    at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1306)
    at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1263)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:305)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3258)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:123)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3307)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3275)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:476)
    at