[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
    [ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174816#comment-16174816 ]

Steve Loughran commented on HADOOP-14770:
-----------------------------------------

Good to hear. You should find IO numbers much better too.

> S3A http connection in s3a driver not reuse in Spark application
> ----------------------------------------------------------------
>
> Key: HADOOP-14770
> URL: https://issues.apache.org/jira/browse/HADOOP-14770
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Yonger
> Assignee: Yonger
> Priority: Minor
>
> I print out connection stats every 2 s when running a Spark application against
> S3-compatible storage:
> {code}
> ESTAB 0 0 :::10.0.2.36:6 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44454 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44374 :::10.0.2.254:80
> ESTAB 159724 0 :::10.0.2.36:44436 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:8 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44338 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44438 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44414 :::10.0.2.254:80
> ESTAB 0 480 :::10.0.2.36:44450 :::10.0.2.254:80 timer:(on,170ms,0)
> ESTAB 0 0 :::10.0.2.36:2 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44390 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44326 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44452 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44394 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:4 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44456 :::10.0.2.254:80
> ==
> ESTAB 0 0 :::10.0.2.36:44508 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44476 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44524 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44374 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44500 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44504 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44512 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44506 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44464 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44518 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44510 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:2 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44526 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44472 :::10.0.2.254:80
> ESTAB 0 0 :::10.0.2.36:44466 :::10.0.2.254:80
> {code}
> The connections listed above and below the "==" separator changed all the time.
> This has not been seen in an MR application.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
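The reporter's observation, that the set of established connections differs between two snapshots taken 2 s apart, can be quantified with a small script. The following is only a sketch: the function names `local_ports` and `reuse_ratio` are hypothetical, the sample rows are a subset of the listing quoted in the issue, and it assumes each socket sits on one line with the local address as the fourth whitespace-separated field.

```python
def local_ports(snapshot: str) -> set[str]:
    """Return the local ports of ESTAB sockets found in one snapshot."""
    ports = set()
    for line in snapshot.splitlines():
        fields = line.split()
        # Assumed layout: state, recv-q, send-q, local addr:port, peer addr:port
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # The port is whatever follows the last ':' of the local address.
        ports.add(fields[3].rsplit(":", 1)[1])
    return ports


def reuse_ratio(before: str, after: str) -> float:
    """Fraction of sockets in `after` whose local port was already in `before`."""
    old, new = local_ports(before), local_ports(after)
    if not new:
        return 0.0
    return len(old & new) / len(new)


# Sample rows copied from the two halves of the quoted listing.
BEFORE = """\
ESTAB 0 0 :::10.0.2.36:44454 :::10.0.2.254:80
ESTAB 0 0 :::10.0.2.36:44374 :::10.0.2.254:80
ESTAB 159724 0 :::10.0.2.36:44436 :::10.0.2.254:80
ESTAB 0 0 :::10.0.2.36:44338 :::10.0.2.254:80
"""

AFTER = """\
ESTAB 0 0 :::10.0.2.36:44508 :::10.0.2.254:80
ESTAB 0 0 :::10.0.2.36:44476 :::10.0.2.254:80
ESTAB 0 0 :::10.0.2.36:44524 :::10.0.2.254:80
ESTAB 0 0 :::10.0.2.36:44374 :::10.0.2.254:80
"""

if __name__ == "__main__":
    print(f"reused: {reuse_ratio(BEFORE, AFTER):.0%}")
```

A ratio that stays low across consecutive snapshots would be consistent with connections being closed and reopened rather than reused; a ratio near 1 would suggest the pool is holding its connections.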
[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
    [ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127218#comment-16127218 ]

Yonger commented on HADOOP-14770:
---------------------------------

Sorry, not yet. I am working with multiple partners on our big data cluster, so it's not easy to move to 2.8. But I will complete it ASAP.
[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
    [ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127058#comment-16127058 ]

Steve Loughran commented on HADOOP-14770:
-----------------------------------------

Does moving to 2.8 fix this? If so, close as a duplicate of HADOOP-13202, thanks
[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
    [ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125536#comment-16125536 ]

Yonger commented on HADOOP-14770:
---------------------------------

Thanks Steve. The application is running on Hadoop 2.7.3 and reads ORC-format files. I will upgrade to Hadoop 2.8.0 to verify.
[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
    [ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125526#comment-16125526 ]

Steve Loughran commented on HADOOP-14770:
-----------------------------------------

Also, remember to set the component to fs/s3. We need this categorisation to know whether to begin looking at the problem and where (i.e. if it's just Hadoop 2.7, the fix is to upgrade; if it's 2.8+ then it's a real issue).
[jira] [Commented] (HADOOP-14770) S3A http connection in s3a driver not reuse in Spark application
    [ https://issues.apache.org/jira/browse/HADOOP-14770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125524#comment-16125524 ]

Steve Loughran commented on HADOOP-14770:
-----------------------------------------

# Add the Hadoop version to the JIRA, thanks.
# What is the file format? Simple or columnar (ORC, Parquet)?
# It looks like the connection is being closed on every seek, which is a sign of HADOOP-13203 not engaging (random IO) or, on a sequential read, of forward reads aborting and reopening rather than skipping forward.

Make sure you are using the Hadoop 2.8.x JARs, then:

For columnar data, enable random IO:
{code}
spark.hadoop.fs.s3a.experimental.fadvise=random
{code}

For sequential data with big forward skips:
{code}
spark.hadoop.fs.s3a.readahead.range = 768K
{code}

If this fixes it, close as a duplicate of HADOOP-13203.

If this doesn't fix it, you can print both the input stream and the s3a FS, as their toString() ops print all their stats.

Oh, one more possible cause: split calculation isn't getting it right. Look at your s3a block size, and the format itself.
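The two settings suggested in the comment above depend on the file format, so the choice can be wrapped in a small helper. This is a sketch under stated assumptions: the names `s3a_read_conf` and `to_submit_args` are hypothetical, the property names and values are the ones given in the comment, and the settings are only expected to take effect with the Hadoop 2.8.x JARs on the classpath.

```python
# Formats the comment classifies as columnar (random IO access pattern).
COLUMNAR_FORMATS = {"orc", "parquet"}


def s3a_read_conf(file_format: str) -> dict[str, str]:
    """Pick the S3A read setting suggested in the comment for a file format."""
    if file_format.lower() in COLUMNAR_FORMATS:
        # Columnar data: enable random IO.
        return {"spark.hadoop.fs.s3a.experimental.fadvise": "random"}
    # Sequential data with big forward skips: widen the readahead range.
    return {"spark.hadoop.fs.s3a.readahead.range": "768K"}


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render the settings as `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(s3a_read_conf("ORC"))))
```

For the ORC workload described in this thread, the helper selects the random IO setting; any other format name falls through to the readahead setting, which is a simplification rather than a recommendation.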