[jira] [Updated] (HADOOP-19074) Transitive dependencies with CVEs in Hadoop distro

2024-02-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19074:

Summary: Transitive dependencies with CVEs in Hadoop distro  (was: Long 
Standing High Risk CVE in Hadoop)

> Transitive dependencies with CVEs in Hadoop distro
> --
>
> Key: HADOOP-19074
> URL: https://issues.apache.org/jira/browse/HADOOP-19074
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prathap Sagar S
>Priority: Major
> Attachments: HADOOP_CVE_LIST.xlsx
>
>
> Our ongoing security scans are turning up several long-standing CVEs, even in 
> the most recent version of Hadoop, which is making it difficult for us to use 
> Hadoop in our ecosystem. A comprehensive list of all the long-standing CVEs 
> and the JARs holding them is attached. I'm asking for community assistance to 
> address these high-risk vulnerabilities as soon as possible.
>  
> |Vulnerability ID|Severity|Package name|Package version|Package type|Package 
> path|Package suggested fix|
> |CVE-2023-2976|High|com.google.guava:guava|30.1.1-jre|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-guava-1.1.1.jar|v32.0.0-android|
> |CVE-2023-2976|High|com.google.guava:guava|30.1.1-jre|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v32.0.0-android|
> |CVE-2023-2976|High|com.google.guava:guava|12.0.1|java|/hadoop-3.4.0/share/hadoop/yarn/timelineservice/lib/guava-12.0.1.jar|v32.0.0-android|
> |CVE-2023-2976|High|com.google.guava:guava|27.0-jre|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/guava-27.0-jre.jar|v32.0.0-android|
> |CVE-2023-2976|High|com.google.guava:guava|27.0-jre|java|/hadoop-3.4.0/share/hadoop/common/lib/guava-27.0-jre.jar|v32.0.0-android|
> |CVE-2023-2976|High|com.google.guava:guava|30.1.1-jre|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-guava-1.1.1.jar|v32.0.0-android|
> |CVE-2022-25647|High|com.google.code.gson:gson|2.8.5|java|/hadoop-3.4.0/share/hadoop/yarn/timelineservice/lib/hbase-shaded-gson-3.0.0.jar|v2.8.9|
> |CVE-2022-3171|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3|
> |CVE-2022-3171|High|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3|
> |CVE-2022-3171|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-guava-1.1.1.jar|v3.16.3|
> |CVE-2022-3171|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2022-3509|High|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3|
> |CVE-2022-3509|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3|
> |CVE-2022-3509|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2022-3509|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2022-3510|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2022-3510|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2022-3510|High|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3|
> |CVE-2022-3510|High|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3|
> |CVE-2023-39410|High|org.apache.avro:avro|1.9.2|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/avro-1.9.2.jar|v1.11.3|
> |CVE-2023-39410|High|org.apache.avro:avro|1.9.2|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v1.11.3|
> |CVE-2023-39410|High|org.apache.avro:avro|1.9.2|java|/hadoop-3.4.0/share/hadoop/common/lib/avro-1.9.2.jar|v1.11.3|
> |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/client/hadoop-client-runtime-3.4.0-SNAPSHOT.jar|v3.16.3|
> |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|2.5.0|java|/hadoop-3.4.0/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar|v3.16.3|
> |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2021-22570|Medium|com.google.protobuf:protobuf-java|3.7.1|java|/hadoop-3.4.0/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.1.1.jar|v3.16.3|
> |CVE-2021-22569|Medi

[jira] [Updated] (HADOOP-19074) Transitive dependencies with CVEs in Hadoop distro

2024-02-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19074:

Component/s: build

> Transitive dependencies with CVEs in Hadoop distro
> --
>
> Key: HADOOP-19074
> URL: https://issues.apache.org/jira/browse/HADOOP-19074
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.0
>Reporter: Prathap Sagar S
>Priority: Major
> Attachments: HADOOP_CVE_LIST.xlsx
>
>
> Our ongoing security scans are turning up several long-standing CVEs, even in 
> the most recent version of Hadoop, which is making it difficult for us to use 
> Hadoop in our ecosystem. A comprehensive list of all the long-standing CVEs 
> and the JARs holding them is attached. I'm asking for community assistance to 
> address these high-risk vulnerabilities as soon as possible.

[jira] [Moved] (HADOOP-19074) Long Standing High Risk CVE in Hadoop

2024-02-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HDFS-17377 to HADOOP-19074:


  Key: HADOOP-19074  (was: HDFS-17377)
 Target Version/s: 3.4.1  (was: 3.4.0, Fine-Grained Locking)
Affects Version/s: 3.4.0
   (was: 3.4.0)
  Project: Hadoop Common  (was: Hadoop HDFS)

> Long Standing High Risk CVE in Hadoop
> -
>
> Key: HADOOP-19074
> URL: https://issues.apache.org/jira/browse/HADOOP-19074
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Prathap Sagar S
>Priority: Major
> Attachments: HADOOP_CVE_LIST.xlsx
>
>
> Our ongoing security scans are turning up several long-standing CVEs, even in 
> the most recent version of Hadoop, which is making it difficult for us to use 
> Hadoop in our ecosystem. A comprehensive list of all the long-standing CVEs 
> and the JARs holding them is attached. I'm asking for community assistance to 
> address these high-risk vulnerabilities as soon as possible.

[jira] [Updated] (HADOOP-19073) WASB: Fix connection leak in FolderRenamePending

2024-02-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19073:

Summary: WASB: Fix connection leak in FolderRenamePending  (was: Fix 
connection leak in FolderRenamePending)

> WASB: Fix connection leak in FolderRenamePending
> 
>
> Key: HADOOP-19073
> URL: https://issues.apache.org/jira/browse/HADOOP-19073
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.6
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> Fix connection leak in FolderRenamePending when getting bytes.






[jira] [Assigned] (HADOOP-19073) Fix connection leak in FolderRenamePending

2024-02-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19073:
---

  Key: HADOOP-19073  (was: HDFS-17373)
Affects Version/s: 3.3.6
   (was: 3.3.6)
 Assignee: (was: xy)
   Issue Type: Bug  (was: Improvement)
  Project: Hadoop Common  (was: Hadoop HDFS)

> Fix connection leak in FolderRenamePending
> --
>
> Key: HADOOP-19073
> URL: https://issues.apache.org/jira/browse/HADOOP-19073
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> Fix connection leak in FolderRenamePending when getting bytes.






[jira] [Commented] (HADOOP-18980) S3A credential provider remapping: make extensible

2024-02-09 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816085#comment-17816085
 ] 

Steve Loughran commented on HADOOP-18980:
-

as commented in the backport PR, there are some tests of the k=v splitting we 
need, and maybe some new policy there

# duplicate entries {{key=val1,key=val2}}: should the parser fail or just 
return the latest (as is done today)?
# empty entries: {{,,}}
# {{=val}}
# {{key=}}

The last two should fail; the other ones, well, I see no problem with them 
passing. A sketch of these rules follows below.
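A minimal sketch of that splitting policy, assuming a standalone parser (the 
class and exact behaviour here are illustrative, not the actual hadoop-aws 
code):

{code}
// Hypothetical sketch of the k=v splitting policy discussed above;
// not the real hadoop-aws parser.
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueListParser {

  /** Parse "k1=v1,k2=v2"; later duplicate keys overwrite earlier ones. */
  public static Map<String, String> parse(String list) {
    Map<String, String> map = new LinkedHashMap<>();
    for (String entry : list.split(",")) {
      if (entry.trim().isEmpty()) {
        continue; // empty entries ",," are ignored
      }
      int eq = entry.indexOf('=');
      String key = eq < 0 ? "" : entry.substring(0, eq).trim();
      String value = eq < 0 ? "" : entry.substring(eq + 1).trim();
      if (key.isEmpty() || value.isEmpty()) {
        // "=val" and "key=" both fail, per the policy above
        throw new IllegalArgumentException("Invalid mapping entry: " + entry);
      }
      map.put(key, value); // duplicate keys: latest wins
    }
    return map;
  }
}
{code}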

> S3A credential provider remapping: make extensible
> --
>
> Key: HADOOP-18980
> URL: https://issues.apache.org/jira/browse/HADOOP-18980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3afs will now remap the common com.amazonaws credential providers to 
> equivalents in the v2 sdk or in hadoop-aws.
> We could do the same for third-party credential providers by taking a 
> key=value list in a configuration property and adding it to the map.
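For illustration only, a sketch of how such a key=value remapping might be 
configured; the property name and provider classes below are assumptions for 
the sketch, not confirmed hadoop-aws configuration:

{code}
// Sketch: remapping a hypothetical third-party v1 credential provider to a
// v2 replacement via a key=value list; the property name is an assumption.
import org.apache.hadoop.conf.Configuration;

public class ProviderRemapExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.aws.credentials.provider.mapping",
        "com.example.auth.LegacyProvider=com.example.auth.V2Provider");
  }
}
{code}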






[jira] [Updated] (HADOOP-19073) Fix connection leak in FolderRenamePending

2024-02-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19073:

Component/s: fs/azure

> Fix connection leak in FolderRenamePending
> --
>
> Key: HADOOP-19073
> URL: https://issues.apache.org/jira/browse/HADOOP-19073
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.6
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> Fix connection leak in FolderRenamePending when getting bytes.






[jira] [Updated] (HADOOP-18980) S3A credential provider remapping: make extensible

2024-02-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18980:

Fix Version/s: 3.4.1

> S3A credential provider remapping: make extensible
> --
>
> Key: HADOOP-18980
> URL: https://issues.apache.org/jira/browse/HADOOP-18980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3afs will now remap the common com.amazonaws credential providers to 
> equivalents in the v2 sdk or in hadoop-aws.
> We could do the same for third-party credential providers by taking a 
> key=value list in a configuration property and adding it to the map.






[jira] [Updated] (HADOOP-19069) Use hadoop-thirdparty 1.2.0

2024-02-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19069:

Fix Version/s: 3.3.9

> Use hadoop-thirdparty 1.2.0
> ---
>
> Key: HADOOP-19069
> URL: https://issues.apache.org/jira/browse/HADOOP-19069
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: hadoop-thirdparty
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9, 3.5.0
>
>







[jira] [Resolved] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19059.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> update AWS SDK to support S3 Access Grants in S3A
> -
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access 
> Grants (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.






[jira] [Updated] (HADOOP-19059) S3A: update AWS SDK to 2.23.19 to support S3 Access Grants

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19059:

Summary: S3A: update AWS SDK to 2.23.19 to support S3 Access Grants  (was: 
update AWS SDK to support S3 Access Grants in S3A)

> S3A: update AWS SDK to 2.23.19 to support S3 Access Grants
> --
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access 
> Grants (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.






[jira] [Updated] (HADOOP-18830) S3A: Cut S3 Select

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18830:

Fix Version/s: 3.3.9

> S3A: Cut S3 Select
> --
>
> Key: HADOOP-18830
> URL: https://issues.apache.org/jira/browse/HADOOP-18830
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> getting s3 select to work with the v2 sdk is tricky; we need to add extra 
> libraries to the classpath beyond just bundle.jar. we can do this but
> * AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic 
> completely
> * CSV is a bad format
> * one-line JSON is more structured but also way less efficient
> ORC/Parquet benefit from vectored IO and work spanning the cluster.
> accordingly, I'm wondering what to do about s3 select
> # cut?
> # downgrade to optional and document the extra classes on the classpath
> Option #2 is straightforward and effectively the default. we can also declare 
> the feature deprecated.
> {code}
> [ERROR] 
> testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat)
>   Time elapsed: 147.958 s  <<< ERROR!
> java.io.IOException: java.lang.NoClassDefFoundError: 
> software/amazon/eventstream/MessageDecoder
> at 
> org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
> at 
> org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
> at 
> org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}






[jira] [Updated] (HADOOP-19071) Update maven-surefire-plugin from 3.0.0 to 3.2.2

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19071:

Component/s: build

> Update maven-surefire-plugin from 3.0.0 to 3.2.2  
> -
>
> Key: HADOOP-19071
> URL: https://issues.apache.org/jira/browse/HADOOP-19071
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, common
>Affects Versions: 3.5.0
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19057:
---

Assignee: Steve Loughran

> S3 public test bucket landsat-pds unreadable -needs replacement
> ---
>
> Key: HADOOP-19057
> URL: https://issues.apache.org/jira/browse/HADOOP-19057
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>  Labels: pull-request-available
>
> The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
> is no longer publicly accessible
> {code}
> java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
> landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended 
> Request ID: 
> O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
> {code}
> * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large 
> file for some reading tests
> * changing the default value disables s3 select tests on older releases
> * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
> will be skipped
> Proposed
> * we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
> bucket. All releases with HADOOP-18168 can use this
> * update 3.4.1 source to use this; document it
> * do something similar for 3.3.9 + maybe even cut s3 select there too.
> * document how to use it on older releases with requester-pays support
> * document how to completely disable it on older releases.
> h2. How to fix (most) landsat test failures on older releases
> add this to your auth-keys.xml file. Expect some failures in a few tests 
> with-hardcoded references to the bucket (assumed role delegation tokens)
> {code}
> <property>
>   <name>fs.s3a.scale.test.csvfile</name>
>   <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
>   <description>file used in scale tests</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
>   <value>us-east-1</value>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
>   <value>false</value>
>   <description>Don't try to purge uploads in the read-only bucket, as
>     it will only create log noise.</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
>   <value>0</value>
>   <description>Let's postpone existence checks to the first IO operation
>   </description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
>   <value>false</value>
>   <description>Do not add the referrer header</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
>   <value>128k</value>
>   <description>Use a small prefetch size so tests fetch multiple
>     blocks</description>
> </property>
> {code}






[jira] [Created] (HADOOP-19072) S3A: expand optimisations on stores with "fs.s3a.create.performance"

2024-02-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19072:
---

 Summary: S3A: expand optimisations on stores with 
"fs.s3a.create.performance"
 Key: HADOOP-19072
 URL: https://issues.apache.org/jira/browse/HADOOP-19072
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


on an s3a store with fs.s3a.create.performance set, speed up other operations

* mkdir to skip the parent directory check: just do a HEAD to see if there's a 
file at the target location (see the sketch below)
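A minimal sketch of enabling the existing flag (bucket and paths are 
illustrative; the mkdir short cut itself is the proposal above, not shipped 
behaviour):

{code}
// Sketch: enabling fs.s3a.create.performance; the mkdir optimisation
// described above is a proposal, not current behaviour.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreatePerformanceExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // trade safety checks for speed on create (and, per this issue, more ops)
    conf.setBoolean("fs.s3a.create.performance", true);
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
      fs.mkdirs(new Path("/output/dir"));
    }
  }
}
{code}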







[jira] [Updated] (HADOOP-18886) S3A: AWS SDK V2 Migration: stabilization and S3Express

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18886:

Target Version/s: 3.4.1  (was: 3.5.0)

> S3A: AWS SDK V2 Migration: stabilization and S3Express
> --
>
> Key: HADOOP-18886
> URL: https://issues.apache.org/jira/browse/HADOOP-18886
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: ahmar#1
>Priority: Major
>
> The final stabilisation changes to the V2 SDK migration; those moved off the 
> HADOOP-18073 JIRA so we can close that.
> Also adds support for Amazon S3 Express One Zone storage






[jira] [Updated] (HADOOP-18996) S3A to provide full support for S3 Express One Zone

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18996:

Release Note: Hadoop S3A connector has explicit awareness of and support 
for S3Express storage. A filesystem can now be probed for inconsistent 
directory listings through fs.hasPathCapability(path, 
"fs.capability.directory.listing.inconsistent"). If true, then treewalking code 
SHOULD NOT report a failure if, when walking into a subdirectory, a 
list/getFileStatus on that directory raises a FileNotFoundException.  (was: 
Hadoop S3A connector has explicit awareness of and support for S3Express 
storage. Hadoop-common and hadoop-mapreduce treewalking code (shell commands, 
FileInputFormat,...) can cope with paths with incomplete uploads being visible. 
)

> S3A to provide full support for S3 Express One Zone
> ---
>
> Key: HADOOP-18996
> URL: https://issues.apache.org/jira/browse/HADOOP-18996
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7-aws
>
>
> HADOOP-18995 upgrades the SDK version, which allows connecting to s3 express 
> one zone storage. 
> Complete support needs to be added to address tests that fail with s3 express 
> one zone, additional tests, documentation etc. 
> * hadoop-common path capability to indicate that treewalking may encounter 
> missing dirs
> * use this in treewalking code in shell, mapreduce FileInputFormat etc to not 
> fail during treewalks
> * extra path capability for s3express too.
> * tests for this
> * anything else
> A filesystem can now be probed for inconsistent directory listings through 
> {{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}
> If true, then treewalking code SHOULD NOT report a failure if, when walking 
> into a subdirectory, a list/getFileStatus on that directory raises a 
> FileNotFoundException.
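A small sketch of how treewalking code might use this probe; the walker is 
hypothetical, only the capability string comes from the issue above:

{code}
// Sketch: tolerant treewalk using the new path capability; the walker itself
// is illustrative, only the capability name is from this issue.
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TolerantTreewalk {
  public static void walk(FileSystem fs, Path dir) throws IOException {
    boolean inconsistentListings = fs.hasPathCapability(dir,
        "fs.capability.directory.listing.inconsistent");
    try {
      for (FileStatus st : fs.listStatus(dir)) {
        if (st.isDirectory()) {
          walk(fs, st.getPath());
        }
      }
    } catch (FileNotFoundException e) {
      // a directory seen in the parent listing may vanish on stores with
      // inconsistent listings (e.g. S3Express with incomplete uploads)
      if (!inconsistentListings) {
        throw e;
      }
    }
  }
}
{code}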






[jira] [Updated] (HADOOP-18996) S3A to provide full support for S3 Express One Zone

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18996:

Description: 
HADOOP-18995 upgrades the SDK version, which allows connecting to s3 express 
one zone storage. 

Complete support needs to be added to address tests that fail with s3 express 
one zone, additional tests, documentation etc. 

* hadoop-common path capability to indicate that treewalking may encounter 
missing dirs
* use this in treewalking code in shell, mapreduce FileInputFormat etc to not 
fail during treewalks
* extra path capability for s3express too.
* tests for this
* anything else

A filesystem can now be probed for inconsistent directory listings through 
{{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}

If true, then treewalking code SHOULD NOT report a failure if, when walking 
into a subdirectory, a list/getFileStatus on that directory raises a 
FileNotFoundException.


  was:
HADOOP-18995 upgrades the SDK version which allows connecting to a s3 express 
one zone support. 

Complete support needs to be added to address tests that fail with s3 express 
one zone, additional tests, documentation etc. 

* hadoop-common path capability to indicate that treewalking may encounter 
missing dirs
* use this in treewalking code in shell, mapreduce FileInputFormat etc to not 
fail during treewalks
* extra path capability for s3express too.
* tests for this
* anything else


> S3A to provide full support for S3 Express One Zone
> ---
>
> Key: HADOOP-18996
> URL: https://issues.apache.org/jira/browse/HADOOP-18996
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7-aws
>
>
> HADOOP-18995 upgrades the SDK version, which allows connecting to s3 express 
> one zone storage. 
> Complete support needs to be added to address tests that fail with s3 express 
> one zone, additional tests, documentation etc. 
> * hadoop-common path capability to indicate that treewalking may encounter 
> missing dirs
> * use this in treewalking code in shell, mapreduce FileInputFormat etc to not 
> fail during treewalks
> * extra path capability for s3express too.
> * tests for this
> * anything else
> A filesystem can now be probed for inconsistent directory listings through 
> {{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}
> If true, then treewalking code SHOULD NOT report a failure if, when walking 
> into a subdirectory, a list/getFileStatus on that directory raises a 
> FileNotFoundException.






[jira] [Updated] (HADOOP-19045) HADOOP-19045. S3A: CreateSession Timeout after 10 seconds

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19045:

Description: 
s3a client timeout settings are getting passed down to the http client, but not 
to the sdk timeouts, so you can't have a longer timeout than the default. This 
surfaces in the inability to tune the timeouts for CreateSession calls even now 
that the latest SDK does pick it up.

The default value of {{fs.s3a.connection.request.timeout}} is now 60s; if an S3 
store takes longer than this to return then the operation will be reported by 
the SDK as a timeout.

When cherrypicking, there are two patches
* change constants.java and add a test for passdown
* remove the core-default and test/core-site values of 0 and update docs. Without 
this (or a config override) the 10s timeout is maintained

  was:
s3a client timeout settings are getting down to http client, but not sdk 
timeouts, so you can't have a longer timeout than the default. This surfaces in 
the inability to tune the timeouts for CreateSession calls even now the latest 
SDK does pick it up

The default value of {{fs.s3a.connection.request.timeout}} is now 60s; if an S3 
store takes longer than this to return then the operation will be reported by 
the SDK as a timeout.


> HADOOP-19045. S3A: CreateSession Timeout after 10 seconds
> -
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are getting passed down to the http client, but 
> not to the sdk timeouts, so you can't have a longer timeout than the default. 
> This surfaces in the inability to tune the timeouts for CreateSession calls 
> even now that the latest SDK does pick it up
> The default value of {{fs.s3a.connection.request.timeout}} is now 60s; if an 
> S3 store takes longer than this to return then the operation will be reported 
> by the SDK as a timeout.
> When cherrypicking there are two patches
> * change constants.java and add test for passdown
> * remove core-default and test/core-site values of 0 and update docs. Without 
> this (or a config override) the 10s timeout is maintained
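For illustration, a minimal sketch of raising that timeout (the 120s value is 
an arbitrary example):

{code}
// Sketch: raising the request timeout passed down to the SDK; 120s is an
// arbitrary example value, not a recommendation.
import org.apache.hadoop.conf.Configuration;

public class RequestTimeoutExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // applies to all S3 requests, including CreateSession on S3Express stores
    conf.set("fs.s3a.connection.request.timeout", "120s");
  }
}
{code}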






[jira] [Updated] (HADOOP-19045) HADOOP-19045. S3A: CreateSession Timeout after 10 seconds

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19045:

Summary: HADOOP-19045. S3A: CreateSession Timeout after 10 seconds  (was: 
S3A: pass request timeouts down to sdk clients)

> HADOOP-19045. S3A: CreateSession Timeout after 10 seconds
> -
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are getting passed down to the http client, but 
> not to the sdk timeouts, so you can't have a longer timeout than the default. 
> This surfaces in the inability to tune the timeouts for CreateSession calls 
> even now that the latest SDK does pick it up
> The default value of {{fs.s3a.connection.request.timeout}} is now 60s; if an 
> S3 store takes longer than this to return then the operation will be reported 
> by the SDK as a timeout.






[jira] [Updated] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19045:

Description: 
s3a client timeout settings are getting passed down to the http client, but not 
to the sdk timeouts, so you can't have a longer timeout than the default. This 
surfaces in the inability to tune the timeouts for CreateSession calls even now 
that the latest SDK does pick it up.

The default value of {{fs.s3a.connection.request.timeout}} is now 60s; if an S3 
store takes longer than this to return then the operation will be reported by 
the SDK as a timeout.

  was:s3a client timeout settings are getting down to http client, but not sdk 
timeouts, so you can't have a longer timeout than the default. This surfaces in 
the inability to tune the timeouts for CreateSession calls even now the latest 
SDK does pick it up


> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are getting passed down to the http client, but 
> not to the sdk timeouts, so you can't have a longer timeout than the default. 
> This surfaces in the inability to tune the timeouts for CreateSession calls 
> even now that the latest SDK does pick it up
> The default value of {{fs.s3a.connection.request.timeout}} is now 60s; if an 
> S3 store takes longer than this to return then the operation will be reported 
> by the SDK as a timeout.






[jira] [Resolved] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19045.
-
Resolution: Fixed

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are getting passed down to the http client, but 
> not to the sdk timeouts, so you can't have a longer timeout than the default. 
> This surfaces in the inability to tune the timeouts for CreateSession calls 
> even now that the latest SDK does pick it up






[jira] [Updated] (HADOOP-18993) S3A: Add option fs.s3a.classloader.isolation (#6301)

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18993:

Description: 
In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
the same as the one that loaded S3AFileSystem. This makes it impossible for 
Spark applications to load third-party credential providers as user jars.


The option fs.s3a.classloader.isolation (default: true) can be set to false to 
disable s3a classloader isolation.

This can assist in using custom credential providers and other extension points.




  was:
In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
the same as the one that loaded S3AFileSystem. This leads to the impossibility 
in Spark applications to load third party credentials providers as user jars.

I propose to add a configuration key {{fs.s3a.extensions.isolated.classloader}} 
with a default value of {{true}} that if set to {{false}} will not perform the 
classloader set.


> S3A: Add option fs.s3a.classloader.isolation (#6301)
> 
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers as user jars.
> The option fs.s3a.classloader.isolation (default: true) can be set to false 
> to disable s3a classloader isolation.
> This can assist in using custom credential providers and other extension 
> points.
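A minimal sketch of the option in use (bucket and provider class are 
illustrative assumptions):

{code}
// Sketch: disabling S3A classloader isolation so a user-supplied credential
// provider (e.g. shipped as a Spark user jar) can load; names are examples.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ClassloaderIsolationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.classloader.isolation", false);
    conf.set("fs.s3a.aws.credentials.provider",
        "com.example.auth.MyCredentialsProvider"); // hypothetical user class
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
      System.out.println("filesystem: " + fs.getUri());
    }
  }
}
{code}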






[jira] [Updated] (HADOOP-18993) S3A: Add option fs.s3a.classloader.isolation (#6301)

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18993:

Component/s: fs/s3
 (was: hadoop-thirdparty)

> S3A: Add option fs.s3a.classloader.isolation (#6301)
> 
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers as user jars.
> The option fs.s3a.classloader.isolation (default: true) can be set to false 
> to disable s3a classloader isolation.
> This can assist in using custom credential providers and other extension 
> points.






[jira] [Updated] (HADOOP-18993) S3A: Add option fs.s3a.classloader.isolation (#6301)

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18993:

Summary: S3A: Add option fs.s3a.classloader.isolation (#6301)  (was: Allow 
to not isolate S3AFileSystem classloader when needed)

> S3A: Add option fs.s3a.classloader.isolation (#6301)
> 
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers as user jars.
> I propose to add a configuration key 
> {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} 
> that, if set to {{false}}, will not perform the classloader set.






[jira] [Updated] (HADOOP-18993) Allow to not isolate S3AFileSystem classloader when needed

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18993:

Fix Version/s: 3.4.1

> Allow to not isolate S3AFileSystem classloader when needed
> --
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers as user jars.
> I propose to add a configuration key 
> {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} 
> that, if set to {{false}}, will not perform the classloader set.






[jira] [Updated] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19057:

Description: 
The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
is no longer publicly accessible

{code}
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
(Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request 
ID: 
O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null

{code}

* Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large file 
for some reading tests
* changing the default value disables s3 select tests on older releases
* if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
will be skipped

Proposed
* we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
bucket. All releases with HADOOP-18168 can use this
* update 3.4.1 source to use this; document it
* do something similar for 3.3.9 + maybe even cut s3 select there too.
* document how to use it on older releases with requester-pays support
* document how to completely disable it on older releases.

h2. How to fix (most) landsat test failures on older releases

add this to your auth-keys.xml file. Expect some failures in a few tests 
with-hardcoded references to the bucket (assumed role delegation tokens)

{code}
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
  <description>file used in scale tests</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
  <value>us-east-1</value>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
  <value>false</value>
  <description>Don't try to purge uploads in the read-only bucket, as
    it will only create log noise.</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
  <value>0</value>
  <description>Let's postpone existence checks to the first IO operation
  </description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
  <value>false</value>
  <description>Do not add the referrer header</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
  <value>128k</value>
  <description>Use a small prefetch size so tests fetch multiple
    blocks</description>
</property>
{code}


  was:
The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
is no longer publicly accessible

{code}
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
(Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request 
ID: 
O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null

{code}

* Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large file 
for some reading tests
* changing the default value disables s3 select tests on older releases
* if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
will be skipped

Proposed
* we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
bucket . All releases with HADOOP-18168 can use this
* update 3.4.1 source to use this; document it
* do something similar for 3.3.9 + maybe even cut s3 select there too.
* document how to use it on older releases with requester-pays support
* document how to completely disable it on older releases.


> S3 public test bucket landsat-pds unreadable -needs replacement
> ---
>
> Key: HADOOP-19057
> URL: https://issues.apache.org/jira/browse/HADOOP-19057
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
>Reporter: Steve Loughran
>Priority: Critical
>  Labels: pull-request-available
>
> The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
> is no longer publicly accessible
> {code}
> java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
> landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended 
> Request ID: 
> O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
> {code}
> * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large 
> file for some reading tests
> * changing the default value disables s3 select tests on older releases
> * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
> will be skipped
> Proposed
> * we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
> bucket. All releases with HADOOP-18168 can use this
> * update 3.4.1 source to use this; document it
> * do something similar for 3.3.9 + maybe even cut s3 select there too.
> * document how to use it on older releases with requester-pays support
> * document how to completely 

[jira] [Commented] (HADOOP-19068) ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon

2024-02-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815260#comment-17815260
 ] 

Steve Loughran commented on HADOOP-19068:
-

do a parallel run to see if that makes it "go away"; with more jvms there may 
be less chance of the problem surfacing.

we always run that way just because it is so much faster... this problem may 
have been hidden for that reason.
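(A parallel run of the hadoop-aws integration tests looks roughly like this; 
the thread count is an arbitrary example:)

{code}
mvn clean verify -pl :hadoop-aws -Dparallel-tests -DtestsThreadCount=8
{code}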

> ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon
> ---
>
> Key: HADOOP-19068
> URL: https://issues.apache.org/jira/browse/HADOOP-19068
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3, test
>Affects Versions: 3.5.0
>Reporter: Antonio Murgia
>Priority: Minor
>
> Running {{mvn clean verify -pl :hadoop-aws}} on my laptop against my s3 
> bucket in eu-west-1, the test {{ITestS3AClosedFS.testClosedInstrumentation}} 
> fails every time, while running it in isolation passes correctly.
> It does not fail on branch-3.4; the attached stack trace is from commit 
> 9a7eeadaac818258b319cdb0dc19e9bb1e4fa11a.
> h4. aws endpoint:
> s3.eu-west-1.amazonaws.com
> h4. os:
> [INFO] os.detected.name: osx
> [INFO] os.detected.arch: aarch_64
> [INFO] os.detected.bitness: 64
> [INFO] os.detected.version: 14.3
> [INFO] os.detected.version.major: 14
> [INFO] os.detected.version.minor: 3
> [INFO] os.detected.classifier: osx-aarch_64
> h4. java -version:
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (Zulu 8.54.0.21-CA-macos-aarch64) (build 
> 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (Zulu 8.54.0.21-CA-macos-aarch64) (build 25.292-b10, 
> mixed mode)
> h4. stack trace:
> {{[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 3.706 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS}}
> {{[ERROR] 
> testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS) Time 
> elapsed: 0.447 s <<< FAILURE!}}
> {{org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
> expected:<[fals]e> but was:<[tru]e>}}
> {{at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)}}
> {{at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)}}
> {{at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)}}
> {{at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)}}
> {{at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)}}
> {{at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)}}
> {{at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)}}
> {{at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)}}
> {{at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)}}
> {{at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)}}
> {{at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{at java.lang.Thread.run(Thread.java:748)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19044) AWS SDK V2 - Update S3A region logic

2024-02-07 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815233#comment-17815233
 ] 

Steve Loughran commented on HADOOP-19044:
-

as well as the fips stuff, we've hit some fun with regions which came out after 
2020: 
https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#VirtualHostingBackwardsCompatibility

bq. For S3 buckets in Regions launched after March 20, 2019, the DNS server 
doesn't route your request directly to the AWS Region where your bucket 
resides. It returns an HTTP 400 Bad Request error instead.
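
A sketch of the usual workaround for buckets in those newer regions - name the 
region explicitly rather than relying on endpoint parsing (the region value 
here is only an example):

{code}
import org.apache.hadoop.conf.Configuration;

public class RegionPinning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // leave fs.s3a.endpoint unset; pin the bucket's region directly.
    // ap-east-1 launched in 2019, i.e. after the DNS-routing cutoff above.
    conf.set("fs.s3a.endpoint.region", "ap-east-1");
  }
}
{code}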

> AWS SDK V2 - Update S3A region logic 
> -
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set 
> fs.s3a.endpoint to s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
> HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is 
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in 
> this case; the region will be US_EAST_1), cross-region access is not enabled. 
> This will cause 400 errors if the bucket is not in US_EAST_1.
> Proposed: update the logic so that if the endpoint is the global 
> s3.amazonaws.com, cross-region access is enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-02-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-19045:
-

this is broken because core-default.xml sets it to 0
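
For context, a sketch of what tuning this looks like, assuming 
fs.s3a.connection.request.timeout is the key in question (both the key name 
and the unit-suffix support here are assumptions, not confirmed by this issue):

{code}
import org.apache.hadoop.conf.Configuration;

public class RequestTimeoutTuning {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // assumed key; the 0 set in core-default.xml historically meant
    // "no explicit request timeout"
    conf.set("fs.s3a.connection.request.timeout", "60s");
  }
}
{code}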

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are getting down to the http client, but not the 
> sdk timeouts, so you can't have a longer timeout than the default. This 
> surfaces in the inability to tune the timeouts for CreateSession calls, even 
> now that the latest SDK does pick it up.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19067) Allow tag passing to AWS Assume Role Credential Provider

2024-02-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814737#comment-17814737
 ] 

Steve Loughran commented on HADOOP-19067:
-

oh, and you can plug in your own auditor implementation which will get invoked 
with every request to s3

> Allow tag passing to AWS Assume Role Credential Provider
> 
>
> Key: HADOOP-19067
> URL: https://issues.apache.org/jira/browse/HADOOP-19067
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Martin
>Priority: Minor
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AssumedRoleCredentialProvider.java#L131-L133]
>  passes a session name and role arn to AssumeRoleRequest. The AWS AssumeRole 
> API also supports passing a list of tags: 
> [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/sts/model/AssumeRoleRequest.html#tags()]
> These tags could be used by platforms to enhance the data encoded into 
> CloudTrail entries to provide better information about the client. For 
> example, a 'notebook' based platform could encode the notebook / jobname / 
> invoker-id in these tags, enabling more granular access controls and leaving 
> a richer breadcrumb-trail as to what operations are being performed.
> This is particularly useful in larger environments where jobs do not get 
> individual roles to assume, and there is a desire to track what 
> jobs/notebooks are reading a given set of files in S3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19067) Allow tag passing to AWS Assume Role Credential Provider

2024-02-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814735#comment-17814735
 ] 

Steve Loughran commented on HADOOP-19067:
-

the audit log will take any key=value pair in the thread-level 
CommonAuditContext; even an evaluator function to work the value out 
dynamically.

you just need to put a value into the global or thread-local maps and let the 
auditor do the rest. no changes in hadoop code *at all*
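
A minimal sketch of that, assuming the CommonAuditContext API in 
org.apache.hadoop.fs.audit (the static global-map setter and the Supplier 
overload are my reading of the class, so verify against the source):

{code}
import java.util.function.Supplier;
import org.apache.hadoop.fs.audit.CommonAuditContext;

public class AuditContextEntries {
  public static void main(String[] args) {
    // process-wide entry, visible to every thread
    CommonAuditContext.setGlobalContextEntry("job", "nightly-etl");

    // thread-level entry, picked up by the auditor on each s3 request
    CommonAuditContext context = CommonAuditContext.currentAuditContext();
    context.put("stage", "ingest");

    // an evaluator: resolved whenever the audit span needs the value
    Supplier<String> attempt = () -> Long.toString(System.nanoTime());
    context.put("attempt", attempt);
  }
}
{code}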

Assumed role enhancements, that's new and a PR welcome. As with all open 
source: if you need a feature, you are the one who owns the implementation 
work. I and others will do our best to review it though, especially if reminded 
enough.

> Allow tag passing to AWS Assume Role Credential Provider
> 
>
> Key: HADOOP-19067
> URL: https://issues.apache.org/jira/browse/HADOOP-19067
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Martin
>Priority: Minor
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AssumedRoleCredentialProvider.java#L131-L133]
>  passes a session name and role arn to AssumeRoleRequest. The AWS AssumeRole 
> API also supports passing a list of tags: 
> [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/sts/model/AssumeRoleRequest.html#tags()]
> These tags could be used by platforms to enhance the data encoded into 
> CloudTrail entries to provide better information about the client. For 
> example, a 'notebook' based platform could encode the notebook / jobname / 
> invoker-id in these tags, enabling more granular access controls and leaving 
> a richer breadcrumb-trail as to what operations are being performed.
> This is particularly useful in larger environments where jobs do not get 
> individual roles to assume, and there is a desire to track what 
> jobs/notebooks are reading a given set of files in S3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17461) Add thread-level IOStatistics Context

2024-02-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814730#comment-17814730
 ] 

Steve Loughran commented on HADOOP-17461:
-

not something we've played with. contributions welcome.

we'd need to get the metrics binding down in the GET/HEAD/PUT/POST requests 
(via RequestFactory), and collect the stats in the input and output streams to 
then
* push into fs stats in close()
* push into thread stats in unbuffer() and close().

* if you look at the audit logging, there's some stats collected across the 
entire job/task which can be retrieved from s3 server logs
* spark really needs to be collecting IOStats from each task and aggregating 
them. the S3A committers and the abfs/gcs manifest committers can be set to do 
this, but it'd be better if spark took this on itself. IOStatisticsSnapshot is 
a java-serializable object, if that helps.
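
A sketch of what a task could do with the thread-level API from this issue 
(method names are per my reading of IOStatisticsContext; treat them as 
assumptions):

{code}
import org.apache.hadoop.fs.statistics.IOStatisticsContext;
import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;

public class TaskIOStats {
  public static void main(String[] args) {
    // the context aggregating IO performed on this thread
    IOStatisticsContext ctx = IOStatisticsContext.getCurrentIOStatisticsContext();

    // ... task performs its filesystem IO here ...

    // the snapshot is java-serializable, so a task could ship it back
    // to the driver for aggregation
    IOStatisticsSnapshot snapshot = ctx.snapshot();
    System.out.println(IOStatisticsLogging.ioStatisticsToPrettyString(snapshot));
  }
}
{code}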

> Add thread-level IOStatistics Context
> -
>
> Key: HADOOP-17461
> URL: https://issues.apache.org/jira/browse/HADOOP-17461
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> For effective reporting of the iostatistics of individual worker threads, we 
> need a thread-level context which IO components update.
> * this context needs to be passed into background threads performing work on 
> behalf of a task.
> * IO components (streams, iterators, filesystems) need to update this 
> context's statistics as they perform work
> * without double counting anything.
> I imagine a ThreadLocal IOStatisticContext which will be updated in the 
> FileSystem API Calls. This context MUST be passed into the background threads 
> used by a task, so that IO is correctly aggregated.
> I don't want streams, listIterators &c to do the updating as there is more 
> risk of double counting. However, we need to see their statistics if we want 
> to know things like "bytes discarded in backwards seeks". And I don't want to 
> be updating a shared context object on every read() call.
> If all we want is store IO (HEAD, GET, DELETE, list performance etc) then the 
> FS is sufficient. 
> If we do want the stream-specific detail, then I propose
> * caching the context in the constructor
> * updating it only in close() or unbuffer() (as we do from S3AInputStream to 
> S3AInstrumentation)
> * excluding those we know the FS already collects.
> h3. important
> when backporting, please follow with HADOOP-18373



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19068) ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon

2024-02-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19068:

Component/s: test

> ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon
> ---
>
> Key: HADOOP-19068
> URL: https://issues.apache.org/jira/browse/HADOOP-19068
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3, test
>Affects Versions: 3.5.0
>Reporter: Antonio Murgia
>Assignee: Steve Loughran
>Priority: Minor
>
> Running {{mvn clean verify -pl :hadoop-aws}} on my laptop against my s3 
> bucket on eu-west-1, the test {{ITestS3AClosedFS.testClosedInstrumentation}} 
> fails every time, while it passes when run in isolation.
> It does not fail on branch-3.4; the attached stack trace is from commit 
> 9a7eeadaac818258b319cdb0dc19e9bb1e4fa11a.
> h4. aws endpoint:
> s3.eu-west-1.amazonaws.com
> h4. os:
> [INFO] os.detected.name: osx
> [INFO] os.detected.arch: aarch_64
> [INFO] os.detected.bitness: 64
> [INFO] os.detected.version: 14.3
> [INFO] os.detected.version.major: 14
> [INFO] os.detected.version.minor: 3
> [INFO] os.detected.classifier: osx-aarch_64
> h4. java -version:
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (Zulu 8.54.0.21-CA-macos-aarch64) (build 
> 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (Zulu 8.54.0.21-CA-macos-aarch64) (build 25.292-b10, 
> mixed mode)
> h4. stack trace:
> {{[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 3.706 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS}}
> {{[ERROR] 
> testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS) Time 
> elapsed: 0.447 s <<< FAILURE!}}
> {{org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
> expected:<[fals]e> but was:<[tru]e>}}
> {{at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)}}
> {{at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)}}
> {{at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)}}
> {{at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)}}
> {{at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)}}
> {{at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)}}
> {{at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)}}
> {{at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)}}
> {{at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)}}
> {{at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)}}
> {{at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{at java.lang.Thread.run(Thread.java:748)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19068) ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon

2024-02-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19068:
---

Assignee: (was: Steve Loughran)

> ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon
> ---
>
> Key: HADOOP-19068
> URL: https://issues.apache.org/jira/browse/HADOOP-19068
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3, test
>Affects Versions: 3.5.0
>Reporter: Antonio Murgia
>Priority: Minor
>
> Running {{mvn clean verify -pl :hadoop-aws}} on my laptop against my s3 
> bucket on eu-west-1, the test {{ITestS3AClosedFS.testClosedInstrumentation}} 
> fails every time, while it passes when run in isolation.
> It does not fail on branch-3.4; the attached stack trace is from commit 
> 9a7eeadaac818258b319cdb0dc19e9bb1e4fa11a.
> h4. aws endpoint:
> s3.eu-west-1.amazonaws.com
> h4. os:
> [INFO] os.detected.name: osx
> [INFO] os.detected.arch: aarch_64
> [INFO] os.detected.bitness: 64
> [INFO] os.detected.version: 14.3
> [INFO] os.detected.version.major: 14
> [INFO] os.detected.version.minor: 3
> [INFO] os.detected.classifier: osx-aarch_64
> h4. java -version:
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (Zulu 8.54.0.21-CA-macos-aarch64) (build 
> 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (Zulu 8.54.0.21-CA-macos-aarch64) (build 25.292-b10, 
> mixed mode)
> h4. stack trace:
> {{[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 3.706 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS}}
> {{[ERROR] 
> testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS) Time 
> elapsed: 0.447 s <<< FAILURE!}}
> {{org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
> expected:<[fals]e> but was:<[tru]e>}}
> {{at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)}}
> {{at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)}}
> {{at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)}}
> {{at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)}}
> {{at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)}}
> {{at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)}}
> {{at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)}}
> {{at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)}}
> {{at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)}}
> {{at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)}}
> {{at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{at java.lang.Thread.run(Thread.java:748)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19068) ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon

2024-02-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814726#comment-17814726
 ] 

Steve Loughran commented on HADOOP-19068:
-

thanks. that stack implies that the (shared) metrics instance isn't null, so at 
least one other fs instance is still live.

as it works standalone, it's not this test at fault - it'll be something coming 
in from an earlier test in the same process. This is very dependent on 
deployment and ordering.

Were you doing a parallel test run, something like {{-Dparallel-tests 
-DtestsThreadCount=8}}?
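
For reference, a full parallel run assembled from the commands quoted in this 
issue looks like:

{code}
mvn clean verify -pl :hadoop-aws -Dparallel-tests -DtestsThreadCount=8
{code}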

> ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon
> ---
>
> Key: HADOOP-19068
> URL: https://issues.apache.org/jira/browse/HADOOP-19068
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.5.0
>Reporter: Antonio Murgia
>Assignee: Steve Loughran
>Priority: Minor
>
> Running {{mvn clean verify -pl :hadoop-aws}} on my laptop against my s3 
> bucket on eu-west-1, the test {{ITestS3AClosedFS.testClosedInstrumentation}} 
> fails every time, while it passes when run in isolation.
> It does not fail on branch-3.4; the attached stack trace is from commit 
> 9a7eeadaac818258b319cdb0dc19e9bb1e4fa11a.
> h4. aws endpoint:
> s3.eu-west-1.amazonaws.com
> h4. os:
> [INFO] os.detected.name: osx
> [INFO] os.detected.arch: aarch_64
> [INFO] os.detected.bitness: 64
> [INFO] os.detected.version: 14.3
> [INFO] os.detected.version.major: 14
> [INFO] os.detected.version.minor: 3
> [INFO] os.detected.classifier: osx-aarch_64
> h4. java -version:
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (Zulu 8.54.0.21-CA-macos-aarch64) (build 
> 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (Zulu 8.54.0.21-CA-macos-aarch64) (build 25.292-b10, 
> mixed mode)
> h4. stack trace:
> {{[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 3.706 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS}}
> {{[ERROR] 
> testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS) Time 
> elapsed: 0.447 s <<< FAILURE!}}
> {{org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
> expected:<[fals]e> but was:<[tru]e>}}
> {{at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)}}
> {{at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)}}
> {{at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)}}
> {{at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)}}
> {{at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)}}
> {{at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)}}
> {{at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)}}
> {{at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)}}
> {{at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)}}
> {{at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)}}
> {{at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{at java.lang.Thread.run(Thread.java:748)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19068) ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon

2024-02-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19068:

Affects Version/s: 3.5.0

> ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon
> ---
>
> Key: HADOOP-19068
> URL: https://issues.apache.org/jira/browse/HADOOP-19068
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Antonio Murgia
>Assignee: Steve Loughran
>Priority: Minor
>
> Running {{mvn clean verify -pl :hadoop-aws}} on my laptop against my s3 
> bucket on eu-west-1, the test {{ITestS3AClosedFS.testClosedInstrumentation}} 
> fails every time, while it passes when run in isolation.
> It does not fail on branch-3.4; the attached stack trace is from commit 
> 9a7eeadaac818258b319cdb0dc19e9bb1e4fa11a.
> h4. aws endpoint:
> s3.eu-west-1.amazonaws.com
> h4. os:
> [INFO] os.detected.name: osx
> [INFO] os.detected.arch: aarch_64
> [INFO] os.detected.bitness: 64
> [INFO] os.detected.version: 14.3
> [INFO] os.detected.version.major: 14
> [INFO] os.detected.version.minor: 3
> [INFO] os.detected.classifier: osx-aarch_64
> h4. java -version:
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (Zulu 8.54.0.21-CA-macos-aarch64) (build 
> 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (Zulu 8.54.0.21-CA-macos-aarch64) (build 25.292-b10, 
> mixed mode)
> h4. stack trace:
> {{[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 3.706 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS}}
> {{[ERROR] 
> testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS) Time 
> elapsed: 0.447 s <<< FAILURE!}}
> {{org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
> expected:<[fals]e> but was:<[tru]e>}}
> {{at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)}}
> {{at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)}}
> {{at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)}}
> {{at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)}}
> {{at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)}}
> {{at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)}}
> {{at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)}}
> {{at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)}}
> {{at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)}}
> {{at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)}}
> {{at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{at java.lang.Thread.run(Thread.java:748)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19068) ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon

2024-02-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19068:

Component/s: fs/s3

> ITestS3AClosedFS.testClosedInstrumentation fails on trunk on Mac OS Silicon
> ---
>
> Key: HADOOP-19068
> URL: https://issues.apache.org/jira/browse/HADOOP-19068
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.5.0
>Reporter: Antonio Murgia
>Assignee: Steve Loughran
>Priority: Minor
>
> Running {{mvn clean verify -pl :hadoop-aws}} on my laptop against my s3 
> bucket on eu-west-1, the test {{ITestS3AClosedFS.testClosedInstrumentation}} 
> fails every time, while it passes when run in isolation.
> It does not fail on branch-3.4; the attached stack trace is from commit 
> 9a7eeadaac818258b319cdb0dc19e9bb1e4fa11a.
> h4. aws endpoint:
> s3.eu-west-1.amazonaws.com
> h4. os:
> [INFO] os.detected.name: osx
> [INFO] os.detected.arch: aarch_64
> [INFO] os.detected.bitness: 64
> [INFO] os.detected.version: 14.3
> [INFO] os.detected.version.major: 14
> [INFO] os.detected.version.minor: 3
> [INFO] os.detected.classifier: osx-aarch_64
> h4. java -version:
> openjdk version "1.8.0_292"
> OpenJDK Runtime Environment (Zulu 8.54.0.21-CA-macos-aarch64) (build 
> 1.8.0_292-b10)
> OpenJDK 64-Bit Server VM (Zulu 8.54.0.21-CA-macos-aarch64) (build 25.292-b10, 
> mixed mode)
> h4. stack trace:
> {{[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 3.706 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS}}
> {{[ERROR] 
> testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS) Time 
> elapsed: 0.447 s <<< FAILURE!}}
> {{org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
> expected:<[fals]e> but was:<[tru]e>}}
> {{at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)}}
> {{at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)}}
> {{at 
> org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)}}
> {{at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
> {{at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}}
> {{at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}}
> {{at java.lang.reflect.Method.invoke(Method.java:498)}}
> {{at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)}}
> {{at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)}}
> {{at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)}}
> {{at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)}}
> {{at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)}}
> {{at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)}}
> {{at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)}}
> {{at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)}}
> {{at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{at java.lang.Thread.run(Thread.java:748)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19067) Allow tag passing to AWS Assume Role Credential Provider

2024-02-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19067:

Affects Version/s: 3.4.0
   (was: 3.3.6)

> Allow tag passing to AWS Assume Role Credential Provider
> 
>
> Key: HADOOP-19067
> URL: https://issues.apache.org/jira/browse/HADOOP-19067
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Martin
>Priority: Minor
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AssumedRoleCredentialProvider.java#L131-L133]
>  passes a session name and role arn to AssumeRoleRequest. The AWS AssumeRole 
> API also supports passing a list of tags: 
> [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/sts/model/AssumeRoleRequest.html#tags()]
> These tags could be used by platforms to enhance the data encoded into 
> CloudTrail entries to provide better information about the client. For 
> example, a 'notebook' based platform could encode the notebook / jobname / 
> invoker-id in these tags, enabling more granular access controls and leaving 
> a richer breadcrumb-trail as to what operations are being performed.
> This is particularly useful in larger environments where jobs do not get 
> individual roles to assume, and there is a desire to track what 
> jobs/notebooks are reading a given set of files in S3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19067) Allow tag passing to AWS Credential Provider

2024-02-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814511#comment-17814511
 ] 

Steve Loughran commented on HADOOP-19067:
-

you've seen the s3 auditing stuff, right? where you can map HTTP requests to 
kerberos principals, spark job IDs, even fs commands?

the main issue there is that the http referrer header doesn't get to cloudtrail 
- if you could express your need for that to anyone @ AWS you know, that'd be 
great. I want to tie every single GET operation to the job and task which does 
it. mapping assume role to (principal, job, id) helps, but if you have multiple 
jobs with the same role active at the same time, it's insufficient.

as for adding tags
* an option to add that referrer header would be good
* and if you look at the fs.s3a.header design, something similar to that for 
assumed-role tags would be welcome too.

usual test process as documented in testing.md. thanks. Hadoop 3.4+ only BTW; 
3.3.x is feature-frozen for s3a, just critical bug fixes - the move to the v2 
sdk makes backporting too hard.
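
For the tag-passing piece, a minimal sketch with the v2 STS model classes 
linked from the description (role ARN, session name, and tag values are all 
hypothetical placeholders):

{code}
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.Tag;

public class TaggedAssumeRole {
  // builds a tagged request; every value here is a placeholder
  static AssumeRoleRequest buildRequest() {
    return AssumeRoleRequest.builder()
        .roleArn("arn:aws:iam::123456789012:role/example-role")
        .roleSessionName("example-session")
        .tags(Tag.builder().key("notebook").value("forecast-v2").build(),
              Tag.builder().key("invoker-id").value("alice").build())
        .build();
  }
}
{code}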


> Allow tag passing to AWS Credential Provider
> 
>
> Key: HADOOP-19067
> URL: https://issues.apache.org/jira/browse/HADOOP-19067
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Jason Martin
>Priority: Minor
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AssumedRoleCredentialProvider.java#L131-L133]
>  passes a session name and role arn to AssumeRoleRequest. The AWS AssumeRole 
> API also supports passing a list of tags: 
> [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/sts/model/AssumeRoleRequest.html#tags()]
> These tags could be used by platforms to enhance the data encoded into 
> CloudTrail entries to provide better information about the client. For 
> example, a 'notebook' based platform could encode the notebook / jobname / 
> invoker-id in these tags, enabling more granular access controls and leaving 
> a richer breadcrumb-trail as to what operations are being performed.
> This is particularly useful in larger environments where jobs do not get 
> individual roles to assume, and there is a desire to track what 
> jobs/notebooks are reading a given set of files in S3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19067) Allow tag passing to AWS Assume Role Credential Provider

2024-02-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19067:

Summary: Allow tag passing to AWS Assume Role Credential Provider  (was: 
Allow tag passing to AWS Credential Provider)

> Allow tag passing to AWS Assume Role Credential Provider
> 
>
> Key: HADOOP-19067
> URL: https://issues.apache.org/jira/browse/HADOOP-19067
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Jason Martin
>Priority: Minor
>
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/AssumedRoleCredentialProvider.java#L131-L133]
>  passes a session name and role arn to AssumeRoleRequest. The AWS AssumeRole 
> API also supports passing a list of tags: 
> [https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/sts/model/AssumeRoleRequest.html#tags()]
> These tags could be used by platforms to enhance the data encoded into 
> CloudTrail entries to provide better information about the client. For 
> example, a 'notebook' based platform could encode the notebook / jobname / 
> invoker-id in these tags, enabling more granular access controls and leaving 
> a richer breadcrumb-trail as to what operations are being performed.
> This is particularly useful in larger environments where jobs do not get 
> individual roles to assume, and there is a desire to track what 
> jobs/notebooks are reading a given set of files in S3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18993) Allow to not isolate S3AFileSystem classloader when needed

2024-02-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814461#comment-17814461
 ] 

Steve Loughran commented on HADOOP-18993:
-

[~tmnd91]: merged to trunk. Can you create a PR against branch-3.4 and rerun 
the tests? then we can merge there and target 3.4.1 as the release with this.

thanks!

> Allow to not isolate S3AFileSystem classloader when needed
> --
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers from user jars.
> I propose to add a configuration key 
> {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} 
> which, if set to {{false}}, skips setting the classloader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18993) Allow to not isolate S3AFileSystem classloader when needed

2024-02-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18993.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Allow to not isolate S3AFileSystem classloader when needed
> --
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers from user jars.
> I propose to add a configuration key 
> {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} 
> which, if set to {{false}}, skips setting the classloader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-18993) Allow to not isolate S3AFileSystem classloader when needed

2024-02-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-18993:
---

Assignee: Antonio Murgia

> Allow to not isolate S3AFileSystem classloader when needed
> --
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This makes it impossible for 
> Spark applications to load third-party credential providers from user jars.
> I propose to add a configuration key 
> {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} 
> which, if set to {{false}}, skips setting the classloader.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-02-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814372#comment-17814372
 ] 

Steve Loughran commented on HADOOP-19066:
-

ha! what a moving target region support is. fs.s3a.endpoint was so much simpler

> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override the SDK endpoint for the central endpoint since we 
> enable cross-region access (details on HADOOP-19044), but we would still fail 
> fast if the endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19049) Class loader leak caused by StatisticsDataReferenceCleaner thread

2024-02-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19049:
---

Assignee: Jia Fan

> Class loader leak caused by StatisticsDataReferenceCleaner thread
> -
>
> Key: HADOOP-19049
> URL: https://issues.apache.org/jira/browse/HADOOP-19049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.6
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The 
> "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" 
> daemon thread was created by FileSystem. 
> This is fine if the thread's context class loader is the system class loader, 
> but it's bad if the context class loader is a custom class loader. The 
> reference held by this daemon thread means that the class loader can never 
> become eligible for GC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19049) Class loader leak caused by StatisticsDataReferenceCleaner thread

2024-02-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19049.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Class loader leak caused by StatisticsDataReferenceCleaner thread
> -
>
> Key: HADOOP-19049
> URL: https://issues.apache.org/jira/browse/HADOOP-19049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.6
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The 
> "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" 
> daemon thread was created by FileSystem. 
> This is fine if the thread's context class loader is the system class loader, 
> but it's bad if the context class loader is a custom class loader. The 
> reference held by this daemon thread means that the class loader can never 
> become eligible for GC.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19044) AWS SDK V2 - Update S3A region logic

2024-02-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19044:

Fix Version/s: 3.4.1

> AWS SDK V2 - Update S3A region logic 
> -
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set 
> fs.s3a.endpoint to s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
> HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is 
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in 
> this case; the region will be US_EAST_1), cross-region access is not enabled. 
> This will cause 400 errors if the bucket is not in US_EAST_1.
> Proposed: update the logic so that if the endpoint is the global 
> s3.amazonaws.com, cross-region access is enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19044) AWS SDK V2 - Update S3A region logic

2024-02-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19044.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> AWS SDK V2 - Update S3A region logic 
> -
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set 
> fs.s3a.endpoint to s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
> HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is 
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in 
> this case; the region will be US_EAST_1), cross-region access is not enabled. 
> This will cause 400 errors if the bucket is not in US_EAST_1.
> Proposed: update the logic so that if the endpoint is the global 
> s3.amazonaws.com, cross-region access is enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18980) S3A credential provider remapping: make extensible

2024-02-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18980:

Fix Version/s: 3.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

fixed in 3.5; backport to 3.4.x recommended

> S3A credential provider remapping: make extensible
> --
>
> Key: HADOOP-18980
> URL: https://issues.apache.org/jira/browse/HADOOP-18980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> s3afs will now remap the common com.amazonaws credential providers to 
> equivalents in the v2 sdk or in hadoop-aws
> We could do the same for third party credential providers by taking a 
> key=value list in a configuration property and adding to the map. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18987) Corrections to Hadoop FileSystem API Definition

2024-02-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18987.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Corrections to Hadoop FileSystem API Definition
> ---
>
> Key: HADOOP-18987
> URL: https://issues.apache.org/jira/browse/HADOOP-18987
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.6
>Reporter: Dieter De Paepe
>Assignee: Dieter De Paepe
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> I noticed a lot of inconsistencies, typos and informal statements in the 
> "formal" FileSystem API definition 
> ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/index.html)]
> Creating this ticket to link my PR against.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A

2024-01-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19059:
---

Assignee: Jason Han

> update AWS SDK to support S3 Access Grants in S3A
> -
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A

2024-01-31 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812765#comment-17812765
 ] 

Steve Loughran commented on HADOOP-19059:
-

moved to the hadoop module; the jira is now HADOOP-19059. please use this in 
the pr and commits. all cloud storage work should normally go in this jira 
project.

> update AWS SDK to support S3 Access Grants in S3A
> -
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A

2024-01-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19059:

Component/s: build

> update AWS SDK to support S3 Access Grants in S3A
> -
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Moved] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A

2024-01-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HDFS-17350 to HADOOP-19059:


  Component/s: fs/s3
   (was: fs/s3)
Fix Version/s: (was: 3.3.6)
  Key: HADOOP-19059  (was: HDFS-17350)
 Target Version/s:   (was: 3.3.6)
Affects Version/s: 3.4.0
   (was: 3.3.6)
  Project: Hadoop Common  (was: Hadoop HDFS)

> update AWS SDK to support S3 Access Grants in S3A
> -
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19050) Add S3 Access Grants Support in S3A

2024-01-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19050:
---

Assignee: Jason Han

> Add S3 Access Grants Support in S3A
> ---
>
> Key: HADOOP-19050
> URL: https://issues.apache.org/jira/browse/HADOOP-19050
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
>
> Add support for S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement

2024-01-30 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812452#comment-17812452
 ] 

Steve Loughran commented on HADOOP-19057:
-

HADOOP-14661 added requester-pays support, so 3.3.5+ can move to a new source.
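
Repointing a test run at a requester-pays source would then look roughly like this 
(a sketch, assuming the switch added by HADOOP-14661 is 
fs.s3a.requester.pays.enabled; the object path under the usgs-landsat bucket is a 
placeholder, not a decided location):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RequesterPaysTestSetupSketch {
  public static Configuration configure(Configuration conf) {
    // enable requester-pays access (HADOOP-14661, available in 3.3.5+)
    conf.setBoolean("fs.s3a.requester.pays.enabled", true);
    // placeholder: a large file under the requester-pays usgs-landsat bucket
    conf.set("fs.s3a.scale.test.csvfile", "s3a://usgs-landsat/path/to/large-file");
    return conf;
  }
}
{code}

As the issue notes, setting fs.s3a.scale.test.csvfile to a single space " " skips 
the tests which need it instead.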

> S3 public test bucket landsat-pds unreadable -needs replacement
> ---
>
> Key: HADOOP-19057
> URL: https://issues.apache.org/jira/browse/HADOOP-19057
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
>Reporter: Steve Loughran
>Priority: Critical
>
> The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
> is no longer publicly accessible
> {code}
> java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
> landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended 
> Request ID: 
> O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
> {code}
> * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large 
> file for some reading tests
> * changing the default value disables s3 select tests on older releases
> * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
> will be skipped
> Proposed
> * we locate a new large file under the (requester-pays) s3a://usgs-landsat/ 
> bucket. All releases with HADOOP-18168 can use this
> * update 3.4.1 source to use this; document it
> * do something similar for 3.3.9 + maybe even cut s3 select there too.
> * document how to use it on older releases with requester-pays support
> * document how to completely disable it on older releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17784) hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17784.
-
Resolution: Duplicate

HADOOP-19057 will address this now that the bucket is completely gone

> hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021
> 
>
> Key: HADOOP-17784
> URL: https://issues.apache.org/jira/browse/HADOOP-17784
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs/s3, test
>Reporter: Leona Yoda
>Priority: Major
> Attachments: org.apache.hadoop.fs.s3a.select.ITestS3SelectMRJob.txt
>
>
> I found an announcement that the landsat-pds bucket will be deleted on July 1, 2021
> (https://registry.opendata.aws/landsat-8/)
> and I think this bucket is used in the tests of the hadoop-aws module; see
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java#L93]
>  
> At this time I can still access the bucket, but we might have to change the 
> test bucket someday.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement

2024-01-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19057:
---

 Summary: S3 public test bucket landsat-pds unreadable -needs 
replacement
 Key: HADOOP-19057
 URL: https://issues.apache.org/jira/browse/HADOOP-19057
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.3.6, 3.2.4, 3.4.0, 3.3.9, 3.5.0
Reporter: Steve Loughran


The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
is no longer publicly accessible

{code}
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
(Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request 
ID: 
O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null

{code}

* Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large file 
for some reading tests
* changing the default value disables s3 select tests on older releases
* if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
will be skipped

Proposed
* we locate a new large file under the (requester-pays) s3a://usgs-landsat/ 
bucket. All releases with HADOOP-18168 can use this
* update 3.4.1 source to use this; document it
* do something similar for 3.3.9 + maybe even cut s3 select there too.
* document how to use it on older releases with requester-pays support
* document how to completely disable it on older releases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19022) S3A : ITestS3AConfiguration#testRequestTimeout failure

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19022.
-
Fix Version/s: 3.5.0
   3.4.1
 Assignee: Steve Loughran
   Resolution: Duplicate

> S3A : ITestS3AConfiguration#testRequestTimeout failure
> --
>
> Key: HADOOP-19022
> URL: https://issues.apache.org/jira/browse/HADOOP-19022
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.5.0, 3.4.1
>
>
> "fs.s3a.connection.request.timeout" should be specified in milliseconds as per
> {code:java}
> Duration apiCallTimeout = getDuration(conf, REQUEST_TIMEOUT,
> DEFAULT_REQUEST_TIMEOUT_DURATION, TimeUnit.MILLISECONDS, Duration.ZERO); 
> {code}
> The test fails consistently because it sets 120 ms timeout which is less than 
> 15s (min network operation duration), and hence gets reset to 15000 ms based 
> on the enforcement.
>  
> {code:java}
> [ERROR] testRequestTimeout(org.apache.hadoop.fs.s3a.ITestS3AConfiguration)  
> Time elapsed: 0.016 s  <<< FAILURE!
> java.lang.AssertionError: Configured fs.s3a.connection.request.timeout is 
> different than what AWS sdk configuration uses internally expected:<120> 
> but was:<15000>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.fs.s3a.ITestS3AConfiguration.testRequestTimeout(ITestS3AConfiguration.java:444)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19045.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are getting down to the http client, but not the 
> sdk timeouts, so you can't have a longer timeout than the default. This 
> surfaces as the inability to tune the timeouts for CreateSession calls, even 
> though the latest SDK now picks them up



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18830) S3A: Cut S3 Select

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18830:

Fix Version/s: 3.5.0

> S3A: Cut S3 Select
> --
>
> Key: HADOOP-18830
> URL: https://issues.apache.org/jira/browse/HADOOP-18830
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> getting s3 select to work with the v2 sdk is tricky; we need to add extra 
> libraries to the classpath beyond just bundle.jar. we can do this, but
> * AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic 
> completely
> * CSV is a bad format
> * one-line JSON is more structured but also way less efficient
> ORC/Parquet benefit from vectored IO and from work spanning the cluster.
> accordingly, I'm wondering what to do about s3 select
> # cut?
> # downgrade to optional and document the extra classes on the classpath
> Option #2 is straightforward and effectively the default. we can also declare 
> the feature deprecated.
> {code}
> [ERROR] 
> testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat)
>   Time elapsed: 147.958 s  <<< ERROR!
> java.io.IOException: java.lang.NoClassDefFoundError: 
> software/amazon/eventstream/MessageDecoder
> at 
> org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
> at 
> org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
> at 
> org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18830) S3A: Cut S3 Select

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18830.
-
Fix Version/s: 3.4.1
 Hadoop Flags: Incompatible change
 Release Note: S3 Select is no longer supported through the S3A connector
   Resolution: Fixed

> S3A: Cut S3 Select
> --
>
> Key: HADOOP-18830
> URL: https://issues.apache.org/jira/browse/HADOOP-18830
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> getting s3 select to work with the v2 sdk is tricky; we need to add extra 
> libraries to the classpath beyond just bundle.jar. we can do this, but
> * AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic 
> completely
> * CSV is a bad format
> * one-line JSON is more structured but also way less efficient
> ORC/Parquet benefit from vectored IO and from work spanning the cluster.
> accordingly, I'm wondering what to do about s3 select
> # cut?
> # downgrade to optional and document the extra classes on the classpath
> Option #2 is straightforward and effectively the default. we can also declare 
> the feature deprecated.
> {code}
> [ERROR] 
> testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat)
>   Time elapsed: 147.958 s  <<< ERROR!
> java.io.IOException: java.lang.NoClassDefFoundError: 
> software/amazon/eventstream/MessageDecoder
> at 
> org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
> at 
> org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
> at 
> org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2024-01-30 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812309#comment-17812309
 ] 

Steve Loughran commented on HADOOP-18883:
-

ok. think I've merged it everywhere and updated fix versions to match

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> This is inline to JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects 
> the “Expect: 100-continue” then a ‘java.net.ProtocolException’ will be 
> thrown from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same 
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally 
> call getOutputStream(), which invokes writeRequests(), which makes the actual 
> server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() 
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to 
> the expect-100 error, we consume the exception and let the code go ahead. So 
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered 
> after getOutputStream() has failed, and these invocations will lead to 
> server calls.
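
A minimal illustration of the guard this fix implies, against plain 
HttpURLConnection (a sketch only; the URL and payload are placeholders, and this 
is not the ABFS code):

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

public class Expect100Sketch {
  public static void main(String[] args) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL("https://example.invalid/upload").openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Expect", "100-continue");
    boolean expect100Failed = false;
    try (OutputStream os = conn.getOutputStream()) {
      os.write(new byte[]{1, 2, 3});
    } catch (ProtocolException e) {
      // the server rejected "Expect: 100-continue"
      expect100Failed = true;
    }
    if (expect100Failed) {
      // Reading any header now would internally call getOutputStream() again
      // and issue a second server call; this is the behaviour the fix prevents.
      return;
    }
    System.out.println(conn.getHeaderField("ETag"));
  }
}
{code}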



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18883:

Fix Version/s: 3.4.1
   (was: 3.4.0)

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> This is inline to JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects 
> the “Expect: 100-continue” then a ‘java.net.ProtocolException’ will be 
> thrown from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same 
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally 
> call getOutputStream(), which invokes writeRequests(), which makes the actual 
> server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() 
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to 
> the expect-100 error, we consume the exception and let the code go ahead. So 
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered 
> after getOutputStream() has failed, and these invocations will lead to 
> server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18883:

Fix Version/s: 3.3.9

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9, 3.5.0
>
>
> This is inline to JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects 
> the “Expect: 100-continue” then a ‘java.net.ProtocolException’ will be 
> thrown from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same 
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally 
> call getOutputStream(), which invokes writeRequests(), which makes the actual 
> server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() 
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to 
> the expect-100 error, we consume the exception and let the code go ahead. So 
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered 
> after getOutputStream() has failed, and these invocations will lead to 
> server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18883:

Fix Version/s: 3.4.0

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.5.0
>
>
> This is inline to JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects 
> the “Expect: 100-continue” then a ‘java.net.ProtocolException’ will be 
> thrown from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same 
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally 
> call getOutputStream(), which invokes writeRequests(), which makes the actual 
> server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() 
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to 
> the expect-100 error, we consume the exception and let the code go ahead. So 
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered 
> after getOutputStream() has failed, and these invocations will lead to 
> server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18925) S3A: add option "fs.s3a.optimized.copy.from.local.enabled" to enable/disable CopyFromLocalOperation

2024-01-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18925:

Summary: S3A: add option "fs.s3a.optimized.copy.from.local.enabled" to 
enable/disable CopyFromLocalOperation  (was: S3A: add option 
"fs.s3a.copy.from.local.enabled" to enable/disable CopyFromLocalOperation)

> S3A: add option "fs.s3a.optimized.copy.from.local.enabled" to enable/disable 
> CopyFromLocalOperation
> ---
>
> Key: HADOOP-18925
> URL: https://issues.apache.org/jira/browse/HADOOP-18925
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> A failure of CopyFromLocalOperation.getFinalPath() was reported during job 
> submission with s3a declared as the cluster fs.
> Add an emergency option to disable this optimised uploader and revert to the 
> superclass implementation.
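
Once added, the escape hatch would be used roughly like this (a sketch; the 
option name is taken from the updated summary):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DisableOptimizedCopySketch {
  public static Configuration disable(Configuration conf) {
    // fall back to the generic FileSystem copy-from-local implementation
    conf.setBoolean("fs.s3a.optimized.copy.from.local.enabled", false);
    return conf;
  }
}
{code}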



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19022) S3A : ITestS3AConfiguration#testRequestTimeout failure

2024-01-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811963#comment-17811963
 ] 

Steve Loughran commented on HADOOP-19022:
-

going to fix this, but also wondering "why didn't I see this?"

answer: the skip-if-cross-region logic looks at the default settings, not the 
bucket-specific values. with no default endpoint/region set, my test runs skipped 
it entirely. need to fix that too

> S3A : ITestS3AConfiguration#testRequestTimeout failure
> --
>
> Key: HADOOP-19022
> URL: https://issues.apache.org/jira/browse/HADOOP-19022
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Priority: Minor
>
> "fs.s3a.connection.request.timeout" should be specified in milliseconds as per
> {code:java}
> Duration apiCallTimeout = getDuration(conf, REQUEST_TIMEOUT,
> DEFAULT_REQUEST_TIMEOUT_DURATION, TimeUnit.MILLISECONDS, Duration.ZERO); 
> {code}
> The test fails consistently because it sets 120 ms timeout which is less than 
> 15s (min network operation duration), and hence gets reset to 15000 ms based 
> on the enforcement.
>  
> {code:java}
> [ERROR] testRequestTimeout(org.apache.hadoop.fs.s3a.ITestS3AConfiguration)  
> Time elapsed: 0.016 s  <<< FAILURE!
> java.lang.AssertionError: Configured fs.s3a.connection.request.timeout is 
> different than what AWS sdk configuration uses internally expected:<120> 
> but was:<15000>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.fs.s3a.ITestS3AConfiguration.testRequestTimeout(ITestS3AConfiguration.java:444)
>  {code}
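
What the enforcement means in practice, as a sketch (the 15s floor is taken from 
the description above; the constant and method names are illustrative, not the 
actual S3A code):

{code:java}
import java.time.Duration;

public class RequestTimeoutSketch {
  /** Minimum network operation duration described in the issue. */
  static final Duration MINIMUM = Duration.ofSeconds(15);

  /** Values below the floor are silently raised: 120 ms becomes 15000 ms. */
  static Duration enforce(Duration configured) {
    return configured.compareTo(MINIMUM) < 0 ? MINIMUM : configured;
  }
}
{code}

That is why the assertion sees 15000 rather than the configured 120: the test has 
to use a value at or above the floor for it to round-trip unchanged.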



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19046) S3A: update AWS sdk versions to 2.23.5 and 1.12.599

2024-01-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19046:

Fix Version/s: 3.4.1

> S3A: update AWS sdk versions to 2.23.5 and 1.12.599
> ---
>
> Key: HADOOP-19046
> URL: https://issues.apache.org/jira/browse/HADOOP-19046
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Move up to the most recent versions of the v2 sdk, with a v1 update just to 
> keep some CVE checking happy.
> {code}
> 1.12.599
> 2.23.5
> {code}
> The v1 SDK is only used for testing...it is not bundled or declared as a 
> dependency of the hadoop-aws module



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19046) S3A: update AWS sdk versions to 2.23.5 and 1.12.599

2024-01-26 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811246#comment-17811246
 ] 

Steve Loughran commented on HADOOP-19046:
-

cherrypicking to 3.4; testing

> S3A: update AWS sdk versions to 2.23.5 and 1.12.599
> ---
>
> Key: HADOOP-19046
> URL: https://issues.apache.org/jira/browse/HADOOP-19046
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Move up to the most recent versions of the v2 sdk, with a v1 update just to 
> keep some CVE checking happy.
> {code}
> 1.12.599
> 2.23.5
> {code}
> The v1 SDK is only used for testing...it is not bundled or declared as a 
> dependency of the hadoop-aws module



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19046) S3A: update AWS sdk versions to 2.23.5 and 1.12.599

2024-01-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19046:

Description: 
Move up to the most recent versions of the v2 sdk, with a v1 update just to 
keep some CVE checking happy.


{code}
1.12.599
2.23.5

{code}

The v1 SDK is only used for testing...it is not bundled or declared as a 
dependency of the hadoop-aws module


  was:
Move up to the most recent versions of the v2 sdk, with a v1 update just to 
keep some CVE checking happy.


{code}
1.12.599
2.23.5

{code}



> S3A: update AWS sdk versions to 2.23.5 and 1.12.599
> ---
>
> Key: HADOOP-19046
> URL: https://issues.apache.org/jira/browse/HADOOP-19046
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Move up to the most recent versions of the v2 sdk, with a v1 update just to 
> keep some CVE checking happy.
> {code}
> 1.12.599
> 2.23.5
> {code}
> The v1 SDK is only used for testing...it is not bundled or declared as a 
> dependency of the hadoop-aws module



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19046) S3A: update AWS sdk versions to 2.23.5 and 1.12.599

2024-01-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19046:

Summary: S3A: update AWS sdk versions to 2.23.5 and 1.12.599  (was: S3A: 
update sdk versions)

> S3A: update AWS sdk versions to 2.23.5 and 1.12.599
> ---
>
> Key: HADOOP-19046
> URL: https://issues.apache.org/jira/browse/HADOOP-19046
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Move up to the most recent versions of the v2 sdk, with a v1 update just to 
> keep some CVE checking happy.
> {code}
> 1.12.599
> 2.23.5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18706) Improve S3ABlockOutputStream recovery

2024-01-25 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810808#comment-17810808
 ] 

Steve Loughran commented on HADOOP-18706:
-

catching up on this; I'd forgotten we'd had to revert it. will look at it ASAP

> Improve S3ABlockOutputStream recovery
> -
>
> Key: HADOOP-18706
> URL: https://issues.apache.org/jira/browse/HADOOP-18706
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Chris Bevard
>Assignee: Chris Bevard
>Priority: Minor
>  Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload, it is 
> possible to complete the upload if fast.upload.buffer is set to disk, by 
> uploading the s3ablock file with putObject as the final part of the multipart 
> upload. If the application has multiple uploads running in parallel, though, 
> and they are on the same part number when the application fails, then there is 
> no way to determine which file belongs to which object, and recovery of 
> either upload is impossible.
> If the temporary file name for disk buffering included the s3 key, then every 
> partial upload would be recoverable.
> h3. Important disclaimer
> This change does not directly add the Syncable semantics required by 
> applications which expect {{Syncable.hsync()}} to only return after all 
> pending data has been durably written to the destination path. S3 is not a 
> filesystem and this change does not make it so.
> What it does do is assist anyone trying to implement a post-crash recovery 
> process which (a rough sketch follows below)
> # interrogates s3 to identify pending uploads to a specific path and get a 
> list of uploaded blocks yet to be committed
> # scans the local fs.s3a.buffer dir directories to identify in-progress-write 
> blocks for the same target destination, that is, those which were being 
> uploaded, those queued for upload, and the single "new data being written to" 
> block for an output stream
> # uploads all those pending blocks
> # generates a new POST to complete a multipart upload with all the blocks in 
> the correct order
> All this patch does is ensure that the buffered block filenames include the 
> final path and block ID, to aid in identifying which blocks need to be 
> uploaded and in what order.
> h2. warning
> causes HADOOP-18744; always include the relevant fix when backporting
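
A rough sketch of the interrogation (step 1) and completion (step 4) parts of 
such a recovery process, using the v2 SDK directly (bucket, prefix and key are 
placeholders; this is not the committer's actual code path):

{code:java}
import java.util.List;
import java.util.stream.Collectors;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.ListMultipartUploadsRequest;
import software.amazon.awssdk.services.s3.model.ListPartsRequest;

public class MpuRecoverySketch {
  /** Step 1: find pending uploads under a destination prefix. */
  static void listPending(S3Client s3, String bucket, String prefix) {
    s3.listMultipartUploads(ListMultipartUploadsRequest.builder()
            .bucket(bucket).prefix(prefix).build())
        .uploads()
        .forEach(u -> System.out.println(u.key() + " -> " + u.uploadId()));
  }

  /** Step 4: complete an upload once every block has been uploaded. */
  static void complete(S3Client s3, String bucket, String key, String uploadId) {
    List<CompletedPart> parts = s3.listParts(ListPartsRequest.builder()
            .bucket(bucket).key(key).uploadId(uploadId).build())
        .parts().stream()
        .map(p -> CompletedPart.builder()
            .partNumber(p.partNumber()).eTag(p.eTag()).build())
        .collect(Collectors.toList());
    s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
        .bucket(bucket).key(key).uploadId(uploadId)
        .multipartUpload(CompletedMultipartUpload.builder().parts(parts).build())
        .build());
  }
}
{code}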



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19050) Add S3 Access Grants Support in S3A

2024-01-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810496#comment-17810496
 ] 

Steve Loughran commented on HADOOP-19050:
-

this is going to keep you busy. Can you
* talk to [~ahmar] about the best way to get this in - ideally with some 
commitment of review time there.
* do a quick design doc and attach it to this, including how it can be tested, as 
this just adds another auth type to the test matrix.
* try a simple patch to the hadoop code first, just to learn a bit more 
about the process of getting stuff in

Assuming this is a series of patches, we might want to create a feature branch 
for it.
Good: rebasing is allowed, and there is no risk of causing problems partway through.
Bad: you have to keep the branch in sync, and the end-of-work merge can be trickier.

it really comes down to how well you can isolate the work and how traumatic that 
merge will be.

meanwhile, I'm upgrading this to a major feature and assigning to you. Can you 
create other jiras underneath?

> Add S3 Access Grants Support in S3A
> ---
>
> Key: HADOOP-19050
> URL: https://issues.apache.org/jira/browse/HADOOP-19050
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Priority: Minor
>
> Add support for S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19050) Add S3 Access Grants Support in S3A

2024-01-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19050:

Affects Version/s: 3.4.0

> Add S3 Access Grants Support in S3A
> ---
>
> Key: HADOOP-19050
> URL: https://issues.apache.org/jira/browse/HADOOP-19050
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Priority: Minor
>
> Add support for S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19046) S3A: update sdk versions

2024-01-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19046.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> S3A: update sdk versions
> 
>
> Key: HADOOP-19046
> URL: https://issues.apache.org/jira/browse/HADOOP-19046
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Move up to the most recent versions of the v2 sdk, with a v1 update just to 
> keep some CVE checking happy.
> {code}
> 1.12.599
> 2.23.5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2024-01-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810084#comment-17810084
 ] 

Steve Loughran commented on HADOOP-19013:
-

(update, yes, I see that some internal QE tests are failing on this, so yes, it 
is observable)

> fs.getXattrs(path) for S3FS doesn't have 
> x-amz-server-side-encryption-aws-kms-key-id header.
> 
>
> Key: HADOOP-19013
> URL: https://issues.apache.org/jira/browse/HADOOP-19013
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>
> Once a path has been encrypted with SSE-KMS with a key id while uploading, 
> reading the attributes of the same file later doesn't return the key id 
> information as an attribute. Should we add it?
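
How the gap surfaces through the FileSystem API, as a sketch (the path is a 
placeholder, and the "header." xattr prefix for surfaced object headers is an 
assumption about S3A's mapping):

{code:java}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KmsKeyXAttrSketch {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://bucket/encrypted-object");  // placeholder
    FileSystem fs = path.getFileSystem(new Configuration());
    Map<String, byte[]> attrs = fs.getXAttrs(path);
    // expected for an SSE-KMS object, but currently absent
    byte[] keyId = attrs.get("header.x-amz-server-side-encryption-aws-kms-key-id");
    System.out.println(keyId == null ? "key id header absent" : new String(keyId));
  }
}
{code}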



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Moved] (HADOOP-19050) Add S3 Access Grants Support in S3A

2024-01-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HDFS-17351 to HADOOP-19050:


Component/s: fs/s3
 (was: fs/s3)
Key: HADOOP-19050  (was: HDFS-17351)
Project: Hadoop Common  (was: Hadoop HDFS)

> Add S3 Access Grants Support in S3A
> ---
>
> Key: HADOOP-19050
> URL: https://issues.apache.org/jira/browse/HADOOP-19050
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Reporter: Jason Han
>Priority: Minor
>
> Add support for S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18975) AWS SDK v2: extend support for FIPS endpoints

2024-01-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18975.
-
Resolution: Fixed

> AWS SDK v2:  extend support for FIPS endpoints
> --
>
> Key: HADOOP-18975
> URL: https://issues.apache.org/jira/browse/HADOOP-18975
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> v1 SDK supported FIPS just by changing the endpoint.
> Now we have a new builder setting to use.
> * add new  fs.s3a.endpoint.fips option
> * pass it down
> * test
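
Roughly what "pass it down" would mean at the client-builder level (a sketch; the 
option name comes from this issue, while the region and builder wiring are 
illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;

public class FipsEndpointSketch {
  static S3Client build(Configuration conf) {
    return S3Client.builder()
        // new option proposed in this issue
        .fipsEnabled(conf.getBoolean("fs.s3a.endpoint.fips", false))
        .region(Region.US_EAST_1)  // placeholder region
        .build();
  }
}
{code}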



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18975) AWS SDK v2: extend support for FIPS endpoints

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18975:

Fix Version/s: 3.5.0
   3.4.1

> AWS SDK v2:  extend support for FIPS endpoints
> --
>
> Key: HADOOP-18975
> URL: https://issues.apache.org/jira/browse/HADOOP-18975
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> v1 SDK supported FIPS just by changing the endpoint.
> Now we have a new builder setting to use.
> * add new  fs.s3a.endpoint.fips option
> * pass it down
> * test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19033) S3A: disable checksum validation

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19033:

Fix Version/s: 3.4.1

> S3A: disable checksum validation
> 
>
> Key: HADOOP-19033
> URL: https://issues.apache.org/jira/browse/HADOOP-19033
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The AWS v2 sdk turns on client-side checksum validation; this kills 
> performance.
> Given we are using TLS to download from AWS s3, there's implicit channel 
> checksumming going on, along with the IPv4 TCP checksumming.
> We don't need it; all it does is slow us down.
> proposed: disable in DefaultS3ClientFactory
> I don't want to add an option to enable it as it only complicates life (yet 
> another config option), but I am open to persuasion
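
The proposal would amount to something like this inside the client factory (a 
sketch of the v2 SDK service configuration, not the final patch):

{code:java}
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Configuration;

public class DisableChecksumValidationSketch {
  static S3Client build() {
    return S3Client.builder()
        .serviceConfiguration(S3Configuration.builder()
            // skip client-side checksum validation; TLS and TCP checksums
            // already protect the channel, as argued above
            .checksumValidationEnabled(false)
            .build())
        .build();
  }
}
{code}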



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19015.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting 
> for connection from pool
> --
>
> Key: HADOOP-19015
> URL: https://issues.apache.org/jira/browse/HADOOP-19015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Getting errors in jobs which can be fixed by increasing this 
> 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
> s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
>  software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible
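
For anyone hitting the pool timeout before picking up the new default, the 
workaround is just the setting from the title (a sketch):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConnectionPoolSketch {
  static Configuration widenPool(Configuration conf) {
    // raise the HTTP connection pool ceiling to the proposed new default
    conf.setInt("fs.s3a.connection.maximum", 500);
    return conf;
  }
}
{code}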



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19004) S3A: Support Authentication through HttpSigner API

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19004:

Fix Version/s: 3.4.1

> S3A: Support Authentication through HttpSigner API 
> ---
>
> Key: HADOOP-19004
> URL: https://issues.apache.org/jira/browse/HADOOP-19004
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The latest AWS SDK changes how signing works, and for signing S3Express 
> signatures the new {{software.amazon.awssdk.http.auth}} auth mechanism is 
> needed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19027) S3A: S3AInputStream doesn't recover from HTTP/channel exceptions

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19027:

Fix Version/s: 3.4.1

> S3A: S3AInputStream doesn't recover from HTTP/channel exceptions
> 
>
> Key: HADOOP-19027
> URL: https://issues.apache.org/jira/browse/HADOOP-19027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> S3AInputStream doesn't seem to recover from Http exceptions raised through 
> HttpClient or through OpenSSL.
> * review the recovery code to make sure it is retrying enough, it looks 
> suspiciously like it doesn't
> * detect the relevant openssl, shaded httpclient and unshaded httpclient 
> exceptions, map to a standard one and treat as comms error in our retry policy
> This is not the same as the load balancer/proxy returning 443/444 which we 
> map to AWSNoResponseException. We can't reuse that as it expects to be 
> created from an 
> {{software.amazon.awssdk.awscore.exception.AwsServiceException}} exception 
> with the relevant fields...changing it could potentially be incompatible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18975) AWS SDK v2: extend support for FIPS endpoints

2024-01-22 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809384#comment-17809384
 ] 

Steve Loughran commented on HADOOP-18975:
-

* I want to cut all of s3 select; the PR for that is 
https://github.com/apache/hadoop/pull/6144 - ready for review
* I stuck the fips endpoint into landsat as it is hosted in a fips region and 
so guarantees coverage. you must have set a global endpoint, rather than one 
for your test bucket - correct?

anyway, yes, we can cut

> AWS SDK v2:  extend support for FIPS endpoints
> --
>
> Key: HADOOP-18975
> URL: https://issues.apache.org/jira/browse/HADOOP-18975
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> v1 SDK supported FIPS just by changing the endpoint.
> Now we have a new builder setting to use.
> * add new  fs.s3a.endpoint.fips option
> * pass it down
> * test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2024-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18883.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> This is inline to JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects 
> the “Expect: 100-continue” then a ‘java.net.ProtocolException’ will be 
> thrown from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same 
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally 
> call getOutputStream(), which invokes writeRequests(), which makes the actual 
> server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() 
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to 
> the expect-100 error, we consume the exception and let the code go ahead. So 
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered 
> after getOutputStream() has failed, and these invocations will lead to 
> server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits

2024-01-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808798#comment-17808798
 ] 

Steve Loughran commented on HADOOP-19047:
-

I will; it should make task commit way faster for jobs with few files per 
task, as the scan phase becomes redundant

> Support InMemory Tracking Of S3A Magic Commits
> --
>
> Key: HADOOP-19047
> URL: https://issues.apache.org/jira/browse/HADOOP-19047
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
>
> The following are the operations which happen within a Task when it uses the 
> S3A Magic Committer.
> *During closing of stream*
> 1. A 0-byte file with the same name as the original file is uploaded to S3 
> using a PUT operation. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152]
>  for more information. This is done so that a downstream application like 
> Spark can get the size of the file being written.
> 2. MultiPartUpload (MPU) metadata is uploaded to S3. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176]
>  for more information.
> *During TaskCommit*
> 1. All the MPU metadata which the task wrote to S3 (there will be 'x' 
> metadata files in S3 if a single task writes to 'x' files) is read and 
> rewritten to S3 as a single metadata file. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201]
>  for more information.
> Since these operations happen within the Task JVM, we could optimize as well 
> as save cost by storing this information in memory when task memory usage is 
> not a constraint. Hence the proposal here is to introduce a new magic commit 
> tracker called "InMemoryMagicCommitTracker" which will
> 1. store the metadata of the MPU in memory until the Task is committed
> 2. store the size of the file, which can be used by the downstream application 
> to get the file size before it is committed/visible at the output path.
> This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call 
> given a Task writes only 1 file.
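
A toy sketch of the idea (all names here are illustrative, not the proposed 
InMemoryMagicCommitTracker):

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import software.amazon.awssdk.services.s3.model.CompletedPart;

/** Keeps per-file upload metadata in the task JVM instead of writing it to S3. */
public class InMemoryTrackerSketch {
  /** destination path -> parts of its pending multipart upload. */
  private static final Map<String, List<CompletedPart>> UPLOADS =
      new ConcurrentHashMap<>();
  /** destination path -> bytes written, replacing the 0-byte marker PUT. */
  private static final Map<String, Long> LENGTHS = new ConcurrentHashMap<>();

  static void track(String path, List<CompletedPart> parts, long length) {
    UPLOADS.put(path, parts);
    LENGTHS.put(path, length);
  }

  /** Lets downstream code read the file size before the file is visible. */
  static long length(String path) {
    return LENGTHS.getOrDefault(path, 0L);
  }

  /** Task commit reads straight from memory: no extra PUT, LIST or GET calls. */
  static Map<String, List<CompletedPart>> pendingUploads() {
    return UPLOADS;
  }
}
{code}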



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19048) S3A: ITestCustomSigner failing against S3Express Buckets

2024-01-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808678#comment-17808678
 ] 

Steve Loughran commented on HADOOP-19048:
-


{code}

[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 16.504 s <<< FAILURE! - in org.apache.hadoop.fs.s3a.auth.ITestCustomSigner
[ERROR] testCustomSignerAndInitializer[bulk delete](org.apache.hadoop.fs.s3a.auth.ITestCustomSigner)  Time elapsed: 10.298 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSBadRequestException: PUT 0-byte object  on fork-0006/test/testCustomSignerAndInitializer[bulk delete]/customsignerpath1: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d219615ea0509dc88f208b53e, Extended Request ID: ALDHV):InvalidRequest: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d219615ea0509dc88f208b53e, Extended Request ID: ALDHV)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:259)
    at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124)
    at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372)
    at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.createEmptyObject(S3AFileSystem.java:4776)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.createFakeDirectory(S3AFileSystem.java:4752)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.access$3000(S3AFileSystem.java:285)
    at org.apache.hadoop.fs.s3a.S3AFileSystem$MkdirOperationCallbacksImpl.createFakeDirectory(S3AFileSystem.java:3812)
    at org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:159)
    at org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:57)
    at org.apache.hadoop.fs.s3a.impl.ExecutingStoreOperation.apply(ExecutingStoreOperation.java:76)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2719)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2738)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:3778)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2494)
    at org.apache.hadoop.fs.s3a.auth.ITestCustomSigner.lambda$runStoreOperationsAndVerify$0(ITestCustomSigner.java:160)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
    at org.apache.hadoop.fs.s3a.auth.ITestCustomSigner.runStoreOperationsAndVerify(ITestCustomSigner.java:155)
    at org.apache.hadoop.fs.s3a.auth.ITestCustomSigner.testCustomSignerAndInitializer(ITestCustomSigner.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750)
Caused by: software.amazon.aws

[jira] [Updated] (HADOOP-19048) S3A: ITestCustomSigner failing against S3Express Buckets

2024-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19048:

Summary: S3A: ITestCustomSigner failing against S3Express Buckets  (was: 
ItestCustomSigner failing against S3Express Buckets)

> S3A: ITestCustomSigner failing against S3Express Buckets
> 
>
> Key: HADOOP-19048
> URL: https://issues.apache.org/jira/browse/HADOOP-19048
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.5.0
>Reporter: Steve Loughran
>Priority: Major
>
> Getting test failures against S3 Express buckets with {{ITestCustomSigner}}; 
> not seen with classic S3 stores.
> {code}
> [ERROR] testCustomSignerAndInitializer[simple-delete](org.apache.hadoop.fs.s3a.auth.ITestCustomSigner)  Time elapsed: 6.12 s  <<< ERROR!
> org.apache.hadoop.fs.s3a.AWSBadRequestException: PUT 0-byte object  on fork-0006/test/testCustomSignerAndInitializer[simple-delete]/customsignerpath1: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf):InvalidRequest: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:259)
> ...
> Caused by: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
> {code}






[jira] [Created] (HADOOP-19048) ItestCustomSigner failing against S3Express Buckets

2024-01-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19048:
---

 Summary: ItestCustomSigner failing against S3Express Buckets
 Key: HADOOP-19048
 URL: https://issues.apache.org/jira/browse/HADOOP-19048
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.5.0
Reporter: Steve Loughran


Getting test failures against S3 Express buckets with {{ITestCustomSigner}}; 
not seen with classic S3 stores.


{code}
[ERROR] testCustomSignerAndInitializer[simple-delete](org.apache.hadoop.fs.s3a.auth.ITestCustomSigner)  Time elapsed: 6.12 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSBadRequestException: PUT 0-byte object  on fork-0006/test/testCustomSignerAndInitializer[simple-delete]/customsignerpath1: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf):InvalidRequest: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:259)
...
Caused by: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)


{code}







[jira] [Commented] (HADOOP-19033) S3A: disable checksum validation

2024-01-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808676#comment-17808676
 ] 

Steve Loughran commented on HADOOP-19033:
-

If the next 3.4 RC is off branch-3.4 I can cherrypick and retest; HADOOP-19027 
should go in *first* to avoid merge pain and to correct other issues.
Ideally I'd cherrypick the current sequence of -aws changes, including the WiP 
SDK update (HADOOP-19046), in the same order as they went in.

Happy to do the work.

> S3A: disable checksum validation
> 
>
> Key: HADOOP-19033
> URL: https://issues.apache.org/jira/browse/HADOOP-19033
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> AWS v2 SDK turns on client-side checksum validation; this kills performance.
> Given we are using TLS to download from AWS S3, there's already implicit 
> channel checksumming going on, along with the TCP checksumming.
> We don't need it; all it does is slow us down.
> Proposed: disable it in DefaultS3ClientFactory (see the sketch below).
> I don't want to add an option to enable it, as it only complicates life (yet 
> another config option), but I am open to persuasion.
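
For reference, a minimal sketch of disabling the validation when the client is built, using the v2 SDK's S3Configuration; the region value is illustrative and the actual DefaultS3ClientFactory wiring is more involved:

{code}
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Configuration;

public class ChecksumOffSketch {
  public static S3Client buildClient() {
    // service-level configuration: skip client-side checksum validation,
    // since TLS already protects the download channel
    S3Configuration serviceConfig = S3Configuration.builder()
        .checksumValidationEnabled(false)
        .build();
    return S3Client.builder()
        .region(Region.US_EAST_1)   // illustrative
        .serviceConfiguration(serviceConfig)
        .build();
  }
}
{code}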






[jira] [Resolved] (HADOOP-19033) S3A: disable checksum validation

2024-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19033.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> S3A: disable checksum validation
> 
>
> Key: HADOOP-19033
> URL: https://issues.apache.org/jira/browse/HADOOP-19033
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> AWS v2 SDK turns on client-side checksum validation; this kills performance.
> Given we are using TLS to download from AWS S3, there's already implicit 
> channel checksumming going on, along with the TCP checksumming.
> We don't need it; all it does is slow us down.
> Proposed: disable it in DefaultS3ClientFactory.
> I don't want to add an option to enable it, as it only complicates life (yet 
> another config option), but I am open to persuasion.






[jira] [Commented] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-01-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808672#comment-17808672
 ] 

Steve Loughran commented on HADOOP-19045:
-

Actually this works for everything but CreateSession; my test was failing 
because I'd forgotten about that minimum interval value.

Updating the javadocs and setting a default timeout explicitly, with a new test 
for CreateSession.

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Major
>
> S3A client timeout settings are getting down to the HTTP client, but not to 
> the SDK timeouts, so you can't have a longer timeout than the default. This 
> surfaces as the inability to tune the timeouts for CreateSession calls, even 
> now that the latest SDK does pick it up.
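
As a sketch of the mechanism under discussion: the v2 SDK takes per-attempt and whole-call timeouts through ClientOverrideConfiguration, which is where the s3a settings would need to land. The Duration values below are illustrative, not the s3a defaults:

{code}
import java.time.Duration;
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.services.s3.S3Client;

public class TimeoutSketch {
  public static S3Client buildClient() {
    ClientOverrideConfiguration overrides = ClientOverrideConfiguration.builder()
        // limit for a single HTTP attempt, retries excluded
        .apiCallAttemptTimeout(Duration.ofSeconds(60))
        // limit for the whole call, retries included
        .apiCallTimeout(Duration.ofMinutes(5))
        .build();
    return S3Client.builder()
        .overrideConfiguration(overrides)
        .build();
  }
}
{code}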






[jira] [Assigned] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19045:
---

Assignee: Steve Loughran

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> S3A client timeout settings are getting down to the HTTP client, but not to 
> the SDK timeouts, so you can't have a longer timeout than the default. This 
> surfaces as the inability to tune the timeouts for CreateSession calls, even 
> now that the latest SDK does pick it up.






[jira] [Resolved] (HADOOP-17824) ITestCustomSigner fails with NPE against private endpoint

2024-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17824.
-
Resolution: Cannot Reproduce

Stack trace no longer valid for the v2 SDK; closing as cannot reproduce.

> ITestCustomSigner fails with NPE against private endpoint
> -
>
> Key: HADOOP-17824
> URL: https://issues.apache.org/jira/browse/HADOOP-17824
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Priority: Minor
>
> ITestCustomSigner fails when the tester is pointed at a private endpoint






[jira] [Created] (HADOOP-19046) S3A: update sdk versions

2024-01-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19046:
---

 Summary: S3A: update sdk versions
 Key: HADOOP-19046
 URL: https://issues.apache.org/jira/browse/HADOOP-19046
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Move up to the most recent versions of the v2 SDK, with a v1 update just to 
keep some CVE checking happy.

{code}
1.12.599   (v1 SDK)
2.23.5     (v2 SDK)
{code}







[jira] [Created] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-01-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19045:
---

 Summary: S3A: pass request timeouts down to sdk clients
 Key: HADOOP-19045
 URL: https://issues.apache.org/jira/browse/HADOOP-19045
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


S3A client timeout settings are getting down to the HTTP client, but not to the 
SDK timeouts, so you can't have a longer timeout than the default. This 
surfaces as the inability to tune the timeouts for CreateSession calls, even 
now that the latest SDK does pick it up.






[jira] [Assigned] (HADOOP-14837) Handle S3A "glacier" data

2024-01-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-14837:
---

Assignee: Bhavay Pahuja

> Handle S3A "glacier" data
> -
>
> Key: HADOOP-14837
> URL: https://issues.apache.org/jira/browse/HADOOP-14837
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Bhavay Pahuja
>Priority: Minor
>  Labels: pull-request-available
>
> SPARK-21797 covers how, if you have AWS S3 set to copy some files to 
> Glacier, they appear in the listing but GETs fail, and so does everything else.
> We should think about how best to handle this:
> # report better
> # if listings can identify files which are glaciated, maybe we could have an 
> option to filter them out (see the sketch below)
> # test & see what happens
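
A rough sketch of the filtering option, assuming the v2 SDK is used directly: listings expose each object's storage class, so glaciated entries can be dropped before anything tries to GET them. Bucket and prefix are caller-supplied, and this ignores restored objects, which a real implementation would have to handle:

{code}
import java.util.List;
import java.util.stream.Collectors;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ObjectStorageClass;
import software.amazon.awssdk.services.s3.model.S3Object;

public class GlacierFilterSketch {
  public static List<S3Object> listReadable(S3Client s3, String bucket, String prefix) {
    return s3.listObjectsV2Paginator(
            ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
        .contents().stream()
        // drop objects whose payload is archived and would fail on GET
        .filter(o -> o.storageClass() != ObjectStorageClass.GLACIER
            && o.storageClass() != ObjectStorageClass.DEEP_ARCHIVE)
        .collect(Collectors.toList());
  }
}
{code}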






[jira] [Commented] (HADOOP-19044) AWS SDK V2 - Update S3A region logic

2024-01-18 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808226#comment-17808226
 ] 

Steve Loughran commented on HADOOP-19044:
-

That Spark stuff went in after we shipped a 3.3 release with broken region 
logic... surplus now, hopefully.

> AWS SDK V2 - Update S3A region logic 
> -
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Priority: Major
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set 
> fs.s3a.endpoint to s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
> HADOOP-18908 updated the region logic so that if fs.s3a.endpoint.region is 
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in 
> this case; the region will be US_EAST_1), cross-region access is not enabled. 
> This will cause 400 errors if the bucket is not in US_EAST_1.
> Proposed: update the logic so that if the endpoint is the global 
> s3.amazonaws.com, cross-region access is enabled (see the sketch below).
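
A sketch of the proposed check, with hypothetical names; the real logic lives in the S3A client factory and handles more cases than this:

{code}
public final class RegionLogicSketch {
  private static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

  // Proposed behaviour: an explicitly configured region stays region-locked;
  // the global endpoint carries no region information, so let the SDK find
  // the bucket's region instead of pinning to US_EAST_1.
  static boolean enableCrossRegionAccess(String endpoint, String configuredRegion) {
    if (configuredRegion != null && !configuredRegion.isEmpty()) {
      return false;
    }
    return CENTRAL_ENDPOINT.equals(endpoint);
  }
}
{code}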





