[jira] [Resolved] (HADOOP-18962) Upgrade kafka to 3.4.0
[ https://issues.apache.org/jira/browse/HADOOP-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18962. - Fix Version/s: 3.5.0 Resolution: Fixed > Upgrade kafka to 3.4.0 > -- > > Key: HADOOP-18962 > URL: https://issues.apache.org/jira/browse/HADOOP-18962 > Project: Hadoop Common > Issue Type: Bug >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Upgrade kafka-clients to 3.4.0 to fix > https://nvd.nist.gov/vuln/detail/CVE-2023-25194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19168) Upgrade Kafka Clients due to CVEs
[ https://issues.apache.org/jira/browse/HADOOP-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19168. - Resolution: Duplicate rohit, dupe of HADOOP-18962. let's focus on that > Upgrade Kafka Clients due to CVEs > - > > Key: HADOOP-19168 > URL: https://issues.apache.org/jira/browse/HADOOP-19168 > Project: Hadoop Common > Issue Type: Task >Reporter: Rohit Kumar >Priority: Major > Labels: pull-request-available > > Upgrade Kafka Clients due to CVEs > CVE-2023-25194: Affected versions of this package are vulnerable to > Deserialization of Untrusted Data when there are gadgets in the > {{classpath}}. The server will connect to the attacker's LDAP server and > deserialize the LDAP response, which the attacker can use to execute Java > deserialization gadget chains on the Kafka Connect server. > CVSS Score: 8.8 (High) > [https://nvd.nist.gov/vuln/detail/CVE-2023-25194] > CVE-2021-38153 > CVE-2018-17196 > Insufficient Entropy > [https://security.snyk.io/package/maven/org.apache.kafka:kafka-clients] > Upgrade Kafka-Clients to 3.4.0 or higher.
[jira] [Resolved] (HADOOP-19182) Upgrade kafka to 3.4.0
[ https://issues.apache.org/jira/browse/HADOOP-19182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19182. - Resolution: Duplicate > Upgrade kafka to 3.4.0 > -- > > Key: HADOOP-19182 > URL: https://issues.apache.org/jira/browse/HADOOP-19182 > Project: Hadoop Common > Issue Type: Bug > Components: build >Reporter: fuchaohong >Priority: Major > Labels: pull-request-available > > Upgrade kafka to 3.4.0 to resolve CVE-2023-25194
[jira] [Created] (HADOOP-19185) Improve ABFS metric integration with IOStatistics
Steve Loughran created HADOOP-19185: --- Summary: Improve ABFS metric integration with IOStatistics Key: HADOOP-19185 URL: https://issues.apache.org/jira/browse/HADOOP-19185 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Reporter: Steve Loughran Followup to HADOOP-18325 covering the outstanding comments of https://github.com/apache/hadoop/pull/6314/files
[jira] [Resolved] (HADOOP-18325) ABFS: Add correlated metric support for ABFS operations
[ https://issues.apache.org/jira/browse/HADOOP-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18325. - Fix Version/s: 3.5.0 Resolution: Fixed > ABFS: Add correlated metric support for ABFS operations > --- > > Key: HADOOP-18325 > URL: https://issues.apache.org/jira/browse/HADOOP-18325 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.3 >Reporter: Anmol Asrani >Assignee: Anmol Asrani >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Add metrics related to a particular job, specific to number of total > requests, retried requests, retry count and others
[jira] [Resolved] (HADOOP-19163) Upgrade protobuf version to 3.25.3
[ https://issues.apache.org/jira/browse/HADOOP-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19163. - Resolution: Fixed done. not sure what version to tag with. Proposed: we cut a new release of this > Upgrade protobuf version to 3.25.3 > -- > > Key: HADOOP-19163 > URL: https://issues.apache.org/jira/browse/HADOOP-19163 > Project: Hadoop Common > Issue Type: Bug > Components: hadoop-thirdparty >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Created] (HADOOP-19181) IAMCredentialsProvider throttle failures
Steve Loughran created HADOOP-19181: --- Summary: IAMCredentialsProvider throttle failures Key: HADOOP-19181 URL: https://issues.apache.org/jira/browse/HADOOP-19181 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Tests report throttling errors in IAM being remapped to noauth and failure. Again, impala tests, but with multiple processes on the same host. This means that HADOOP-18945 isn't sufficient: even if it ensures a singleton instance for a process * it doesn't if there are many test buckets (fixable) * it doesn't work across processes (not fixable) We may be able to * use a singleton across all filesystem instances * once we know how throttling is reported, handle it through retries + error/stats collection {code} 2024-02-17T18:02:10,175 WARN [TThreadPoolServer WorkerProcess-22] fs.FileSystem: Failed to initialize fileystem s3a://impala-test-uswest2-1/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels: java.nio.file.AccessDeniedException: impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId). java.nio.file.AccessDeniedException: impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId). at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] 
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at
[jira] [Resolved] (HADOOP-19172) Upgrade aws-java-sdk to 1.12.720
[ https://issues.apache.org/jira/browse/HADOOP-19172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19172. - Fix Version/s: 3.3.9 3.5.0 3.4.1 Resolution: Fixed > Upgrade aws-java-sdk to 1.12.720 > > > Key: HADOOP-19172 > URL: https://issues.apache.org/jira/browse/HADOOP-19172 > Project: Hadoop Common > Issue Type: Improvement > Components: build, fs/s3 >Affects Versions: 3.4.0, 3.3.6 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > Update to the latest AWS SDK, to stop anyone worrying about the ion library > CVE https://nvd.nist.gov/vuln/detail/CVE-2024-21634 > This isn't exposed in the s3a client, but may be used downstream. > on v2 sdk releases, the v1 sdk is only used during builds; 3.3.x it is shipped
[jira] [Resolved] (HADOOP-19073) WASB: Fix connection leak in FolderRenamePending
[ https://issues.apache.org/jira/browse/HADOOP-19073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19073. - Resolution: Fixed > WASB: Fix connection leak in FolderRenamePending > > > Key: HADOOP-19073 > URL: https://issues.apache.org/jira/browse/HADOOP-19073 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.3.6 >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Fix connection leak in FolderRenamePending in getting bytes
[jira] [Created] (HADOOP-19176) S3A Xattr headers need hdfs-compatible prefix
Steve Loughran created HADOOP-19176: --- Summary: S3A Xattr headers need hdfs-compatible prefix Key: HADOOP-19176 URL: https://issues.apache.org/jira/browse/HADOOP-19176 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.3.6, 3.4.0 Reporter: Steve Loughran s3a xattr list needs a prefix compatible with hdfs, or existing code which tries to copy attributes between stores can break. We need a prefix of {user/trusted/security/system/raw}. Now, the problem: currently xattrs are used by the magic committer to propagate file size progress; renaming the prefix will break existing code. But as it's read only, we could modify spark to look for both old and new values. {code} org.apache.hadoop.HadoopIllegalArgumentException: An XAttr name must be prefixed with user/trusted/security/system/raw, followed by a '.' at org.apache.hadoop.hdfs.XAttrHelper.buildXAttr(XAttrHelper.java:77) at org.apache.hadoop.hdfs.DFSClient.setXAttr(DFSClient.java:2835) at org.apache.hadoop.hdfs.DistributedFileSystem$59.doCall(DistributedFileSystem.java:3106) at org.apache.hadoop.hdfs.DistributedFileSystem$59.doCall(DistributedFileSystem.java:3102) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.setXAttr(DistributedFileSystem.java:3115) at org.apache.hadoop.fs.FileSystem.setXAttr(FileSystem.java:3097) {code}
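To make the constraint concrete: HDFS only accepts xattr names that start with one of five namespaces, each followed by a '.'. A minimal sketch of that check follows; the class, method, and the "header." example name are illustrative stand-ins, not the actual XAttrHelper or S3A code.

```java
import java.util.Arrays;
import java.util.List;

public class XAttrPrefixCheck {

    // The five namespaces HDFS accepts, each followed by a '.'
    private static final List<String> NAMESPACES =
        Arrays.asList("user.", "trusted.", "security.", "system.", "raw.");

    /** True when the xattr name carries an HDFS-compatible namespace prefix. */
    public static boolean hasHdfsCompatiblePrefix(String name) {
        return name != null && NAMESPACES.stream().anyMatch(name::startsWith);
    }

    public static void main(String[] args) {
        // A bare store-specific name fails; prefixing it with "user." passes.
        System.out.println(hasHdfsCompatiblePrefix("header.magic-marker"));
        System.out.println(hasHdfsCompatiblePrefix("user.header.magic-marker"));
    }
}
```

Code copying attributes between stores would apply such a check (or remap the name) before calling setXAttr() on HDFS.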
[jira] [Resolved] (HADOOP-18958) Improve UserGroupInformation debug log
[ https://issues.apache.org/jira/browse/HADOOP-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18958. - Fix Version/s: 3.5.0 Resolution: Fixed > Improve UserGroupInformation debug log > --- > > Key: HADOOP-18958 > URL: https://issues.apache.org/jira/browse/HADOOP-18958 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 3.3.0, 3.3.5 >Reporter: wangzhihui >Assignee: wangzhihui >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0 > > Attachments: 20231029-122825-1.jpeg, 20231029-122825.jpeg, > 20231030-143525.jpeg, image-2023-10-29-09-47-56-489.png, > image-2023-10-30-14-35-11-161.png > > Original Estimate: 1h > Remaining Estimate: 1h > > Using “new Exception( )” to print the call stack of "doAs Method " in > the UserGroupInformation class. Using this way will print meaningless > Exception information and too many call stacks, This is not conducive to > troubleshooting > *example:* > !20231029-122825.jpeg|width=991,height=548! > > *improved result* : > > !image-2023-10-29-09-47-56-489.png|width=1099,height=156! > !20231030-143525.jpeg|width=572,height=674!
[jira] [Reopened] (HADOOP-18958) UserGroupInformation debug log improve
[ https://issues.apache.org/jira/browse/HADOOP-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened HADOOP-18958: - > UserGroupInformation debug log improve > -- > > Key: HADOOP-18958 > URL: https://issues.apache.org/jira/browse/HADOOP-18958 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 3.3.0, 3.3.5 >Reporter: wangzhihui >Priority: Minor > Labels: pull-request-available > Attachments: 20231029-122825-1.jpeg, 20231029-122825.jpeg, > 20231030-143525.jpeg, image-2023-10-29-09-47-56-489.png, > image-2023-10-30-14-35-11-161.png > > Original Estimate: 1h > Remaining Estimate: 1h > > Using “new Exception( )” to print the call stack of "doAs Method " in > the UserGroupInformation class. Using this way will print meaningless > Exception information and too many call stacks, This is not conducive to > troubleshooting > *example:* > !20231029-122825.jpeg|width=991,height=548! > > *improved result* : > > !image-2023-10-29-09-47-56-489.png|width=1099,height=156! > !20231030-143525.jpeg|width=572,height=674!
[jira] [Created] (HADOOP-19175) update s3a committer docs
Steve Loughran created HADOOP-19175: --- Summary: update s3a committer docs Key: HADOOP-19175 URL: https://issues.apache.org/jira/browse/HADOOP-19175 Project: Hadoop Common Issue Type: Improvement Components: documentation, fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Update s3a committer docs * declare that magic committer is stable and make it the recommended one * show how to use new command "mapred successfile" to print the success file.
[jira] [Created] (HADOOP-19172) Upgrade aws-java-sdk to 1.12.720
Steve Loughran created HADOOP-19172: --- Summary: Upgrade aws-java-sdk to 1.12.720 Key: HADOOP-19172 URL: https://issues.apache.org/jira/browse/HADOOP-19172 Project: Hadoop Common Issue Type: Improvement Components: build, fs/s3 Affects Versions: 3.3.6, 3.4.0 Reporter: Steve Loughran Update to the latest AWS SDK, to stop anyone worrying about the ion library CVE https://nvd.nist.gov/vuln/detail/CVE-2024-21634 This isn't exposed in the s3a client, but may be used downstream. on v2 sdk releases, the v1 sdk is only used during builds; 3.3.x it is shipped
[jira] [Created] (HADOOP-19171) AWS v2: handle alternative forms of connection failure
Steve Loughran created HADOOP-19171: --- Summary: AWS v2: handle alternative forms of connection failure Key: HADOOP-19171 URL: https://issues.apache.org/jira/browse/HADOOP-19171 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.3.6, 3.4.0 Reporter: Steve Loughran We've had reports of network connection failures surfacing deeper in the stack, where we don't convert to AWSApiCallTimeoutException, so they aren't retried properly (retire connection and repeat) {code} Unable to execute HTTP request: Broken pipe (Write failed) {code} {code} Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout {code}
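One possible shape for the handling, sketched rather than taken from any actual patch: recognize these alternative failure messages as retriable connection failures before exception translation. The matched fragments are the ones quoted in the report above; the class name is hypothetical.

```java
public class ConnectionFailureClassifier {

    /**
     * Heuristic check for network failures which surface as generic SDK
     * exceptions rather than as API call timeouts. The message fragments
     * matched here come from the error reports quoted in the issue.
     */
    public static boolean isRetriableConnectionFailure(String message) {
        if (message == null) {
            return false;
        }
        return message.contains("Broken pipe")
            || message.contains("not read from or written to within the timeout period");
    }

    public static void main(String[] args) {
        System.out.println(isRetriableConnectionFailure(
            "Unable to execute HTTP request: Broken pipe (Write failed)"));
    }
}
```

A translation layer could map any exception matching this predicate onto the same retry policy used for AWSApiCallTimeoutException, retiring the connection and repeating the request.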
[jira] [Created] (HADOOP-19161) S3A: support a comma separated list of performance flags
Steve Loughran created HADOOP-19161: --- Summary: S3A: support a comma separated list of performance flags Key: HADOOP-19161 URL: https://issues.apache.org/jira/browse/HADOOP-19161 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 3.4.1 Reporter: Steve Loughran Assignee: Steve Loughran HADOOP-19072 shows we want to add more optimisations than that of HADOOP-18930. * Extending the new optimisations to the existing option is brittle * Adding explicit options for each feature gets complex fast. Proposed * A new class S3APerformanceFlags keeps all the flags * it builds this from a string[] of values, which can be extracted from getConf(), * and it can also support a "*" option to mean "everything" * this class can also be handed off to hasPathCapability() and do the right thing. Proposed optimisations * create file (we will hook up HADOOP-18930) * mkdir (HADOOP-19072) * delete (probe for parent path) * rename (probe for source path) We could think of more, with different names, later. The goal is to make it possible to strip out every HTTP request we do for safety/posix compliance, so applications have the option of turning off what they don't need.
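A sketch of how such a flag parser could look; the class name and flag names are guesses at the proposal, not the committed S3APerformanceFlags code.

```java
import java.util.EnumSet;
import java.util.Locale;

public class PerformanceFlagParser {

    /** Hypothetical optimisation flags matching the list in the proposal. */
    public enum Flag { CREATE, MKDIR, DELETE, RENAME }

    /**
     * Parse a comma-separated flag list; "*" means "everything".
     * Unknown names fail fast with IllegalArgumentException.
     */
    public static EnumSet<Flag> parse(String value) {
        EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
        if (value == null) {
            return flags;
        }
        for (String entry : value.split(",")) {
            String name = entry.trim();
            if (name.isEmpty()) {
                continue;
            }
            if (name.equals("*")) {
                return EnumSet.allOf(Flag.class);
            }
            flags.add(Flag.valueOf(name.toUpperCase(Locale.ROOT)));
        }
        return flags;
    }
}
```

A hasPathCapability() probe would then reduce to a set-membership test on the parsed EnumSet.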
[jira] [Resolved] (HADOOP-19146) noaa-cors-pds bucket access with global endpoint fails
[ https://issues.apache.org/jira/browse/HADOOP-19146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19146. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > noaa-cors-pds bucket access with global endpoint fails > -- > > Key: HADOOP-19146 > URL: https://issues.apache.org/jira/browse/HADOOP-19146 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > All tests accessing noaa-cors-pds use us-east-1 region, as configured at > bucket level. If global endpoint is configured (e.g. us-west-2), they fail to > access to bucket. > > Sample error: > {code:java} > org.apache.hadoop.fs.s3a.AWSRedirectException: Received permanent redirect > response to region [us-east-1]. This likely indicates that the S3 region > configured in fs.s3a.endpoint.region does not match the AWS region containing > the bucket.: null (Service: S3, Status Code: 301, Request ID: > PMRWMQC9S91CNEJR, Extended Request ID: > 6Xrg9thLiZXffBM9rbSCRgBqwTxdLAzm6OzWk9qYJz1kGex3TVfdiMtqJ+G4vaYCyjkqL8cteKI/NuPBQu5A0Q==) > at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:253) > at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:155) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4041) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3947) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$26(S3AFileSystem.java:3924) > at > org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) > at > org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) > at > org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) > at > 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2716) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2735) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3922) > at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:115) > at org.apache.hadoop.fs.Globber.doGlob(Globber.java:349) > at org.apache.hadoop.fs.Globber.glob(Globber.java:202) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$globStatus$35(S3AFileSystem.java:4956) > at > org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) > at > org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) > at > org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2716) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2735) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:4949) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:313) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:281) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:445) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:311) > at > org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:328) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:201) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1677) > at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1674) > {code} > {code:java} > Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null > (Service: S3, Status Code: 301, Request ID: PMRWMQC9S91CNEJR, 
Extended > Request ID: > 6Xrg9thLiZXffBM9rbSCRgBqwTxdLAzm6OzWk9qYJz1kGex3TVfdiMtqJ+G4vaYCyjkqL8cteKI/NuPBQu5A0Q==) > at > software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) > at > software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) > at > software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85) > at
[jira] [Resolved] (HADOOP-19159) Fix hadoop-aws document for fs.s3a.committer.abort.pending.uploads
[ https://issues.apache.org/jira/browse/HADOOP-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19159. - Fix Version/s: 3.3.9 3.5.0 3.4.1 Resolution: Fixed > Fix hadoop-aws document for fs.s3a.committer.abort.pending.uploads > -- > > Key: HADOOP-19159 > URL: https://issues.apache.org/jira/browse/HADOOP-19159 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Reporter: Xi Chen >Assignee: Xi Chen >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > The description about `fs.s3a.committer.abort.pending.uploads` in the > _Concurrent Jobs writing to the same destination_ section is not entirely correct.
[jira] [Created] (HADOOP-19158) Support delegating ByteBufferPositionedReadable to vector reads
Steve Loughran created HADOOP-19158: --- Summary: Support delegating ByteBufferPositionedReadable to vector reads Key: HADOOP-19158 URL: https://issues.apache.org/jira/browse/HADOOP-19158 Project: Hadoop Common Issue Type: Sub-task Components: fs, fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Make it easy for any stream with vector IO to support ByteBufferPositionedReadable. Specifically, ByteBufferPositionedReadable.readFully() is exactly a single range read, so it is easy to implement. The simpler read() call, which can return fewer bytes, isn't part of the vector API. Proposed: invoke readFully() but convert an EOFException to -1
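The proposed mapping can be sketched as follows; the FullReader interface here is a stand-in for the real stream type, not Hadoop's API.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

public class PositionedReadAdapter {

    /** Stand-in for a stream offering readFully(position, buffer) via vector IO. */
    public interface FullReader {
        void readFully(long position, ByteBuffer buf) throws IOException;
    }

    /**
     * Positioned read built on readFully(): attempt to fill the buffer,
     * converting a read past the end of the file into the conventional -1.
     */
    public static int read(FullReader in, long position, ByteBuffer buf)
            throws IOException {
        int wanted = buf.remaining();
        try {
            in.readFully(position, buf);
            return wanted;
        } catch (EOFException e) {
            return -1;
        }
    }
}
```

This gives read()-style semantics only at EOF; a partial read within the file still fills the whole buffer, which is exactly the readFully() contract.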
[jira] [Created] (HADOOP-19157) [ABFS] Filesystem contract tests to use methodPath for robust parallel test runs
Steve Loughran created HADOOP-19157: --- Summary: [ABFS] Filesystem contract tests to use methodPath for robust parallel test runs Key: HADOOP-19157 URL: https://issues.apache.org/jira/browse/HADOOP-19157 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure, test Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran hadoop-azure supports parallel test runs, but unlike hadoop-aws, the azure ones are parallelised across methods in the same test suites. This can fail badly where contract tests have hard-coded filenames and assume that they can use these across all test cases. It shows up when you are testing on a store with reduced IO capacity, triggering retries and making some test cases slower. Fix: hadoop-common contract tests to use methodPath() names
[jira] [Resolved] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
[ https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19102. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > [ABFS]: FooterReadBufferSize should not be greater than readBufferSize > -- > > Key: HADOOP-19102 > URL: https://issues.apache.org/jira/browse/HADOOP-19102 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > The method `optimisedRead` creates a buffer array of size `readBufferSize`. > If footerReadBufferSize is greater than readBufferSize, abfs will attempt to > read more data than the buffer array can hold, which causes an exception. > Change: To avoid this, we will keep footerBufferSize = > min(readBufferSizeConfig, footerBufferSizeConfig)
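The change described above reduces to a one-line clamp; this sketch uses hypothetical names, not the actual ABFS configuration fields.

```java
public class FooterBufferSize {

    /**
     * Keep the footer read buffer no larger than the main read buffer,
     * so an optimised footer read can never overflow the allocated array.
     */
    public static int effectiveFooterBufferSize(int readBufferSize,
            int footerReadBufferSize) {
        return Math.min(readBufferSize, footerReadBufferSize);
    }

    public static void main(String[] args) {
        // A 16 MB footer buffer request is clamped to the 4 MB read buffer.
        System.out.println(effectiveFooterBufferSize(4 * 1024 * 1024, 16 * 1024 * 1024));
    }
}
```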
[jira] [Created] (HADOOP-19153) hadoop-common still exports logback as a transitive dependency
Steve Loughran created HADOOP-19153: --- Summary: hadoop-common still exports logback as a transitive dependency Key: HADOOP-19153 URL: https://issues.apache.org/jira/browse/HADOOP-19153 Project: Hadoop Common Issue Type: Bug Components: build, common Affects Versions: 3.4.0 Reporter: Steve Loughran Even though HADOOP-19084 set out to stop it, somehow ZK's declaration of a logback dependency is still contaminating the hadoop-common dependency graph, so causing problems downstream.
[jira] [Resolved] (HADOOP-19079) HttpExceptionUtils to check that loaded class is really an exception before instantiation
[ https://issues.apache.org/jira/browse/HADOOP-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19079. - Fix Version/s: 3.3.9 3.5.0 3.4.1 Resolution: Fixed > HttpExceptionUtils to check that loaded class is really an exception before > instantiation > - > > Key: HADOOP-19079 > URL: https://issues.apache.org/jira/browse/HADOOP-19079 > Project: Hadoop Common > Issue Type: Task > Components: common, security >Reporter: PJ Fanning >Assignee: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > It can be dangerous taking class names as inputs from HTTP messages even if > we control the source. Issue is in HttpExceptionUtils in hadoop-common > (validateResponse method). > I can provide a PR that will highlight the issue.
[jira] [Resolved] (HADOOP-19096) [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic
[ https://issues.apache.org/jira/browse/HADOOP-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19096. - Fix Version/s: 3.5.0 Resolution: Fixed > [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic > -- > > Key: HADOOP-19096 > URL: https://issues.apache.org/jira/browse/HADOOP-19096 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.1 >Reporter: Anuj Modi >Assignee: Anuj Modi >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > ABFS has a client-side throttling mechanism which works on the metrics > collected from past requests. If requests are failing due to > throttling at the server, we update our metrics, and client-side backoff is > calculated based on those metrics. > This PR enhances the logic to decide which requests should be considered when > computing the client-side backoff interval, as follows: > For each request made by the ABFS driver, we will determine if it should > contribute to Client-Side Throttling based on the status code and result: > # Status code in 2xx range: Successful Operations should contribute. > # Status code in 3xx range: Redirection Operations should not contribute. > # Status code in 4xx range: User Errors should not contribute. > # Status code is 503: Throttling Errors should contribute only if they are > due to a client limits breach, as follows: > ## 503, Ingress Over Account Limit: Should Contribute > ## 503, Egress Over Account Limit: Should Contribute > ## 503, TPS Over Account Limit: Should Contribute > ## 503, Other Server Throttling: Should not Contribute. > # Status code in 5xx range other than 503: Should not Contribute. > # IOException and UnknownHostExceptions: Should not Contribute.
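The six rules above condense into a single predicate. A sketch, with the 503 sub-codes written as plain strings rather than whatever constants the ABFS driver actually uses:

```java
public class ThrottlingMetricsFilter {

    /**
     * Should a completed request feed the client-side throttling metrics?
     * 2xx contributes; 503 contributes only for client-limit breaches;
     * 3xx, 4xx, other 5xx, and transport errors do not.
     */
    public static boolean contributes(int statusCode, String errorCode) {
        if (statusCode >= 200 && statusCode < 300) {
            return true;
        }
        if (statusCode == 503) {
            return "Ingress Over Account Limit".equals(errorCode)
                || "Egress Over Account Limit".equals(errorCode)
                || "TPS Over Account Limit".equals(errorCode);
        }
        return false;
    }
}
```

IOException/UnknownHostException cases never reach this predicate with a status code, so they fall outside it, matching rule 6.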
[jira] [Resolved] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges
[ https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19098. - Resolution: Fixed > Vector IO: consistent specified rejection of overlapping ranges > --- > > Key: HADOOP-19098 > URL: https://issues.apache.org/jira/browse/HADOOP-19098 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/s3 >Affects Versions: 3.3.6 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > Related to PARQUET-2171 q: "how do you deal with overlapping ranges?" > I believe s3a rejects this, but the other impls may not. > Proposed > FS spec to say > * "overlap triggers IllegalArgumentException". > * special case: 0 byte ranges may be short circuited to return empty buffer > even without checking file length etc. > Contract tests to validate this > (+ common helper code to do this). > I'll copy the validation stuff into the parquet PR for consistency with older > releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
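The proposed spec behaviour — overlap triggers IllegalArgumentException, zero-byte ranges exempt — can be sketched as a validation helper. This is a hypothetical stand-in for the "common helper code" mentioned above, not the actual hadoop-common implementation, and `Range` here is a simplified stand-in for `FileRange`:

```java
// Sketch of the HADOOP-19098 contract check: sort ranges by offset and
// reject any overlapping pair with IllegalArgumentException. Names are
// illustrative only.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RangeOverlapSketch {

    /** Minimal stand-in for org.apache.hadoop.fs.FileRange. */
    public record Range(long offset, int length) { }

    /** Throws IllegalArgumentException if any two ranges overlap. */
    public static void validateDisjoint(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(Range::offset));
        for (int i = 1; i < sorted.size(); i++) {
            Range prev = sorted.get(i - 1);
            Range cur = sorted.get(i);
            // zero-byte ranges never overlap anything (the special case above)
            if (prev.length() > 0 && cur.offset() < prev.offset() + prev.length()) {
                throw new IllegalArgumentException(
                    "Overlapping ranges: " + prev + " and " + cur);
            }
        }
    }

    public static void main(String[] args) {
        // adjacent ranges are fine
        validateDisjoint(List.of(new Range(0, 100), new Range(100, 50)));
        try {
            validateDisjoint(List.of(new Range(0, 100), new Range(50, 10)));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```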
[jira] [Resolved] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation
[ https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19101. - Fix Version/s: 3.3.9 3.4.1 Resolution: Fixed > Vectored Read into off-heap buffer broken in fallback implementation > > > Key: HADOOP-19101 > URL: https://issues.apache.org/jira/browse/HADOOP-19101 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/azure >Affects Versions: 3.4.0, 3.3.6 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at > position zero even when the range is at a different offset. As a result, you > can get incorrect data back. > The fix for this is straightforward: we pass in a FileRange and use its offset > as the starting position. > However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely > read vector IO into direct buffers through HDFS, ABFS or GCS. Note that we > have never seen this in production because the parquet and ORC libraries both > read into on-heap storage. > Those libraries need to be audited to make sure that they never attempt to > read into off-heap DirectBuffers. This is a bit trickier than you would think > because an allocator is passed in. For PARQUET-2171 we will > * only invoke the API on streams which explicitly declare their support for > the API (so fallback in parquet itself) > * not invoke it when direct buffer allocation is in use. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
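The bug and its fix can be illustrated with a simplified fallback read loop. The interfaces below are stand-ins, not the real `VectoredReadUtils` API; the point is that the running position must start at the range's offset, not zero:

```java
// Sketch of the HADOOP-19101 fix: the fallback vectored-read path must seek
// to the range's own offset before filling a direct buffer. Simplified
// stand-in types, not the actual hadoop-common code.
import java.io.IOException;
import java.nio.ByteBuffer;

public class DirectBufferReadSketch {

    /** Minimal positioned-read abstraction (stand-in for PositionedReadable). */
    public interface PositionedReadable {
        int read(long position, byte[] buf, int off, int len) throws IOException;
    }

    /**
     * Reads range [offset, offset + length) into a direct buffer.
     * The bug was a running position that started at 0; the fix is to
     * start at the range's offset.
     */
    public static void readInDirectBuffer(PositionedReadable in, long offset,
            int length, ByteBuffer buffer) throws IOException {
        byte[] tmp = new byte[Math.min(length, 64 * 1024)];
        long position = offset;            // the fix: start at the range offset
        int remaining = length;
        while (remaining > 0) {
            int toRead = Math.min(remaining, tmp.length);
            int read = in.read(position, tmp, 0, toRead);
            if (read < 0) {
                throw new IOException("EOF before range was fully read");
            }
            buffer.put(tmp, 0, read);
            position += read;
            remaining -= read;
        }
        buffer.flip();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes();
        PositionedReadable src = (pos, buf, off, len) -> {
            int n = Math.min(len, data.length - (int) pos);
            if (n <= 0) return -1;
            System.arraycopy(data, (int) pos, buf, off, n);
            return n;
        };
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        readInDirectBuffer(src, 3, 4, direct);   // range at offset 3, length 4
        byte[] out = new byte[4];
        direct.get(out);
        System.out.println(new String(out));     // 3456, not 0123
    }
}
```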
[jira] [Created] (HADOOP-19144) S3A prefetching to support Vector IO
Steve Loughran created HADOOP-19144: --- Summary: S3A prefetching to support Vector IO Key: HADOOP-19144 URL: https://issues.apache.org/jira/browse/HADOOP-19144 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Add explicit support for vector IO in s3a prefetching stream. * if a range is in 1+ cached block, it SHALL be read from cache and returned * if a range is not in cache : TBD * If a range is partially in cache: TBD these are the same decisions that abfs has to make: should the client fetch/cache block or just do one or more GET requests A big issue is: does caching of data fetched in a range request make any sense at all? Or more specifically: does fetching the blocks in which range requests are found make sense Simply going to the store is a lot simpler -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19140) [ABFS, S3A] Add IORateLimiter api to hadoop common
Steve Loughran created HADOOP-19140: --- Summary: [ABFS, S3A] Add IORateLimiter api to hadoop common Key: HADOOP-19140 URL: https://issues.apache.org/jira/browse/HADOOP-19140 Project: Hadoop Common Issue Type: Sub-task Components: fs, fs/azure, fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Create a rate limiter API in hadoop common through which code (initially the manifest committer and bulk delete) can request IO capacity for a specific operation. This can be exported by filesystems to support shared rate limiting across all threads. Pulled from the HADOOP-19093 PR. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
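The shared-rate-limiting idea can be sketched as a tiny interface plus a fixed-rate implementation. Everything below is hypothetical — it is not the HADOOP-19140 API, just an illustration of callers requesting IO capacity and being delayed when the budget is exhausted:

```java
// Hypothetical IORateLimiter sketch: callers acquire capacity per operation
// and may be made to wait. A single shared instance gives rate limiting
// across all threads of a filesystem instance.
public class IORateLimiterSketch {

    public interface IORateLimiter {
        /** Acquire capacity for an operation; returns the wait imposed in ms. */
        long acquireIOCapacity(String operation, int requestedCapacity);
    }

    /** Trivial fixed-rate implementation: capacity units per second. */
    public static IORateLimiter create(double capacityPerSecond) {
        return new IORateLimiter() {
            private long nextFreeNanos = System.nanoTime();

            @Override
            public synchronized long acquireIOCapacity(String op, int capacity) {
                long now = System.nanoTime();
                long start = Math.max(now, nextFreeNanos);
                long costNanos = (long) (capacity / capacityPerSecond * 1e9);
                nextFreeNanos = start + costNanos;   // push back the next caller
                long waitMs = Math.max(0, (start - now) / 1_000_000);
                try {
                    if (waitMs > 0) Thread.sleep(waitMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return waitMs;
            }
        };
    }

    public static void main(String[] args) {
        IORateLimiter limiter = create(100); // 100 capacity units/second
        long firstWait = limiter.acquireIOCapacity("rename", 50);
        long secondWait = limiter.acquireIOCapacity("rename", 50);
        // the second call is delayed because the first consumed half the budget
        System.out.println("first wait: " + firstWait + "ms, second: " + secondWait + "ms");
    }
}
```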
[jira] [Resolved] (HADOOP-19115) upgrade to nimbus-jose-jwt 9.37.2 due to CVE
[ https://issues.apache.org/jira/browse/HADOOP-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19115. - Fix Version/s: 3.3.9 3.5.0 3.4.1 Assignee: PJ Fanning Resolution: Fixed > upgrade to nimbus-jose-jwt 9.37.2 due to CVE > > > Key: HADOOP-19115 > URL: https://issues.apache.org/jira/browse/HADOOP-19115 > Project: Hadoop Common > Issue Type: Bug > Components: build, CVE >Affects Versions: 3.4.0, 3.5.0 >Reporter: PJ Fanning >Assignee: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > https://github.com/advisories/GHSA-gvpg-vgmx-xg6w -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19131) Assist reflection IO with WrappedOperations class
Steve Loughran created HADOOP-19131: --- Summary: Assist reflection IO with WrappedOperations class Key: HADOOP-19131 URL: https://issues.apache.org/jira/browse/HADOOP-19131 Project: Hadoop Common Issue Type: Sub-task Components: fs, fs/azure, fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran parquet, avro etc are still stuck building with older hadoop releases. This makes using new APIs hard (PARQUET-2117) and means that APIs which are 5 years old (!) such as HADOOP-15229 just aren't picked up. This lack of openFile() adoption hurts working with files in cloud storage as * extra HEAD requests are made * read policies can't be explicitly set * split start/end can't be passed down Proposed # create class org.apache.hadoop.io.WrappedOperations # add methods to wrap the APIs # test in contract tests via reflection loading - verifies we have done it properly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
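The reflection-loading pattern the proposal relies on looks roughly like this. The wrapper class and its method are illustrative stand-ins — the final HADOOP-19131 class and signatures may differ:

```java
// Sketch of reflection-friendly loading: a library built against an old
// Hadoop can look up a static wrapper method by name at runtime instead of
// linking directly against newer APIs. Names here are hypothetical.
import java.lang.reflect.Method;

public class ReflectionLoadingSketch {

    /** Stand-in for a static wrapper such as WrappedOperations.openFile(). */
    public static String openFile(String path, String readPolicy) {
        return "opened " + path + " with policy " + readPolicy;
    }

    public static void main(String[] args) throws Exception {
        // what parquet/avro would do at runtime when the wrapper is present;
        // in real use this would be Class.forName("org.apache.hadoop.io.WrappedOperations")
        Class<?> wrapper = ReflectionLoadingSketch.class;
        Method m = wrapper.getMethod("openFile", String.class, String.class);
        Object result = m.invoke(null, "s3a://bucket/data.parquet", "random");
        System.out.println(result);
    }
}
```

Contract tests invoking the wrapper this way verify that the method names and signatures stay reflection-stable.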
[jira] [Resolved] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits
[ https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19047. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > Support InMemory Tracking Of S3A Magic Commits > -- > > Key: HADOOP-19047 > URL: https://issues.apache.org/jira/browse/HADOOP-19047 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > The following are the operations which happen within a Task when it uses the S3A > Magic Committer. > *During closing of stream* > 1. A 0-byte file with the same name as the original file is uploaded to S3 > using a PUT operation. Refer > [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152] > for more information. This is done so that a downstream application like > Spark can get the size of the file which is being written. > 2. MultiPartUpload(MPU) metadata is uploaded to S3. Refer > [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176] > for more information. > *During TaskCommit* > 1. All the MPU metadata which the task wrote to S3 (there will be 'x' > metadata files in S3 if a single task writes to 'x' files) are read and > rewritten to S3 as a single metadata file. Refer > [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201] > for more information > Since these operations happen within the Task JVM, we could optimize as well > as save cost by storing this information in memory when Task memory usage is > not a constraint. 
Hence the proposal here is to introduce a new MagicCommitTracker > called "InMemoryMagicCommitTracker" which will: > 1. Store the metadata of the MPU in memory till the Task is committed > 2. Store the size of the file, which can be used by the downstream application > to get the file size before it is committed/visible in the output path. > This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call > given a Task writes only 1 file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
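The in-memory tracking idea boils down to a per-task map of pending uploads. The sketch below is illustrative only — class, record, and method names are hypothetical; see `MagicCommitTracker` in hadoop-aws for the real code:

```java
// Sketch of the InMemoryMagicCommitTracker idea: keep each task's multipart
// upload (MPU) metadata in a map inside the task JVM instead of writing
// per-file metadata objects to S3, then hand it all to task commit at once.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryTrackerSketch {

    /** Stand-in for the MPU commit data normally persisted to S3. */
    public record PendingUpload(String path, String uploadId, long fileSize) { }

    // taskAttemptId -> pending uploads recorded as each stream closes
    private final Map<String, List<PendingUpload>> uploadsByTask =
        new ConcurrentHashMap<>();

    /** Called on stream close: record MPU metadata in memory (saves a PUT). */
    public void trackUpload(String taskAttemptId, PendingUpload upload) {
        uploadsByTask.computeIfAbsent(taskAttemptId, k -> new ArrayList<>())
            .add(upload);
    }

    /** Called at task commit: return everything without any S3 LIST/GET. */
    public List<PendingUpload> commitTask(String taskAttemptId) {
        return uploadsByTask.getOrDefault(taskAttemptId, List.of());
    }

    public static void main(String[] args) {
        InMemoryTrackerSketch tracker = new InMemoryTrackerSketch();
        tracker.trackUpload("attempt_1", new PendingUpload("out/part-0", "mpu-123", 1024));
        tracker.trackUpload("attempt_1", new PendingUpload("out/part-1", "mpu-456", 2048));
        System.out.println(tracker.commitTask("attempt_1").size()); // 2
    }
}
```

The trade-off named in the issue is visible here: the map lives in task memory, so this only works when task memory usage is not a constraint.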
[jira] [Resolved] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE-2024-23944
[ https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19116. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > update to zookeeper client 3.8.4 due to CVE-2024-23944 > --- > > Key: HADOOP-19116 > URL: https://issues.apache.org/jira/browse/HADOOP-19116 > Project: Hadoop Common > Issue Type: Bug > Components: CVE >Affects Versions: 3.4.0, 3.3.6 >Reporter: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > https://github.com/advisories/GHSA-r978-9m6m-6gm6 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19089) [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path
[ https://issues.apache.org/jira/browse/HADOOP-19089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19089. - Fix Version/s: 3.5.0 Resolution: Fixed > [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path > --- > > Key: HADOOP-19089 > URL: https://issues.apache.org/jira/browse/HADOOP-19089 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0, 3.4.1 >Reporter: Anuj Modi >Assignee: Anuj Modi >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > A while back changes were made to support HDFS.setXAttr() and HDFS.getXAttr() > on the root path for the ABFS Driver. > For these, filesystem-level APIs were introduced and used to set/get metadata > of the container. > Refer to Jira: [HADOOP-18869] ABFS: Fixing Behavior of a File System APIs on > root path - ASF JIRA (apache.org) > Ideally, the same set of APIs should be used, and root should be treated as a > path like any other path. > This change is to avoid calling container APIs for these HDFS calls. > As a result, these APIs will fail on the root path (as earlier) because the > service does not support get/set of user properties on the root path. > This change will also update the documentation to reflect that these > operations are not supported on the root path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19122) testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store
Steve Loughran created HADOOP-19122: --- Summary: testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store Key: HADOOP-19122 URL: https://issues.apache.org/jira/browse/HADOOP-19122 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Affects Versions: 3.4.0 Reporter: Steve Loughran On an Azure store which may be experiencing throttling, the listPath call can return fewer entries than the 5K limit; the assertion needs to be changed to allow for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19050) Add S3 Access Grants Support in S3A
[ https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19050. - Fix Version/s: 3.5.0 Resolution: Fixed Fixed in trunk; backport to 3.4 should go in later. > Add S3 Access Grants Support in S3A > --- > > Key: HADOOP-19050 > URL: https://issues.apache.org/jira/browse/HADOOP-19050 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Jason Han >Assignee: Jason Han >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0 > > > Add support for S3 Access Grants > (https://aws.amazon.com/s3/features/access-grants/) in S3A. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
[ https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19119. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > spotbugs complaining about possible NPE in > org.apache.hadoop.crypto.key.kms.ValueQueue.getSize() > > > Key: HADOOP-19119 > URL: https://issues.apache.org/jira/browse/HADOOP-19119 > Project: Hadoop Common > Issue Type: Sub-task > Components: crypto >Affects Versions: 3.5.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > PRs against hadoop-common are reporting spotbugs problems > {code} > Dodgy code Warnings > Code Warning > NP Possible null pointer dereference in > org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return > value of called method > Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details) > In class org.apache.hadoop.crypto.key.kms.ValueQueue > In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) > Local variable stored in JVM register ? > Dereferenced at ValueQueue.java:[line 332] > Known null at ValueQueue.java:[line 332] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
Steve Loughran created HADOOP-19119: --- Summary: spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize() Key: HADOOP-19119 URL: https://issues.apache.org/jira/browse/HADOOP-19119 Project: Hadoop Common Issue Type: Sub-task Components: crypto Affects Versions: 3.5.0 Reporter: Steve Loughran Assignee: Steve Loughran PRs against hadoop-common are reporting spotbugs problems {code} Dodgy code Warnings Code Warning NP Possible null pointer dereference in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value of called method Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details) In class org.apache.hadoop.crypto.key.kms.ValueQueue In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) Local variable stored in JVM register ? Dereferenced at ValueQueue.java:[line 332] Known null at ValueQueue.java:[line 332] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
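NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE has a classic shape: dereferencing the return value of a lookup that can be null. The sketch below mirrors that pattern and its usual fix; it is a standalone illustration, not the actual KMS `ValueQueue` code:

```java
// Standalone illustration of the spotbugs complaint and its fix: check the
// return value of Map.get() before dereferencing it.
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

public class NullCheckSketch {

    final Map<String, Queue<String>> keyQueues = new ConcurrentHashMap<>();

    // Buggy version (what spotbugs flags): get() may return null.
    // public int getSize(String keyName) {
    //     return keyQueues.get(keyName).size();   // possible NPE
    // }

    /** Fixed version: treat an absent queue as size 0. */
    public int getSize(String keyName) {
        Queue<String> queue = keyQueues.get(keyName);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        NullCheckSketch vq = new NullCheckSketch();
        System.out.println(vq.getSize("no-such-key")); // 0, no NPE
        vq.keyQueues.computeIfAbsent("k", k -> new ConcurrentLinkedQueue<>()).add("v");
        System.out.println(vq.getSize("k")); // 1
    }
}
```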
[jira] [Resolved] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
[ https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19066. - Fix Version/s: 3.4.1 Resolution: Fixed > AWS SDK V2 - Enabling FIPS should be allowed with central endpoint > -- > > Key: HADOOP-19066 > URL: https://issues.apache.org/jira/browse/HADOOP-19066 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.5.0, 3.4.1 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK > considers overriding endpoint and enabling fips as mutually exclusive, we > fail fast if fs.s3a.endpoint is set with fips support (details on > HADOOP-18975). > Now, we no longer override SDK endpoint for central endpoint since we enable > cross region access (details on HADOOP-19044) but we would still fail fast if > endpoint is central and fips is enabled. > Changes proposed: > * S3A to fail fast only if FIPS is enabled and non-central endpoint is > configured. > * Tests to ensure S3 bucket is accessible with default region us-east-2 with > cross region access (expected with central endpoint). > * Document FIPS support with central endpoint on connecting.html. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Reopened] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
[ https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened HADOOP-19066: - > AWS SDK V2 - Enabling FIPS should be allowed with central endpoint > -- > > Key: HADOOP-19066 > URL: https://issues.apache.org/jira/browse/HADOOP-19066 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.5.0, 3.4.1 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK > considers overriding endpoint and enabling fips as mutually exclusive, we > fail fast if fs.s3a.endpoint is set with fips support (details on > HADOOP-18975). > Now, we no longer override SDK endpoint for central endpoint since we enable > cross region access (details on HADOOP-19044) but we would still fail fast if > endpoint is central and fips is enabled. > Changes proposed: > * S3A to fail fast only if FIPS is enabled and non-central endpoint is > configured. > * Tests to ensure S3 bucket is accessible with default region us-east-2 with > cross region access (expected with central endpoint). > * Document FIPS support with central endpoint on connecting.html. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19108) S3 Express: document use
Steve Loughran created HADOOP-19108: --- Summary: S3 Express: document use Key: HADOOP-19108 URL: https://issues.apache.org/jira/browse/HADOOP-19108 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran The 3.4.0 release doesn't explicitly cover S3 Express. Its support is automatic: * the library handles it * hadoop shell commands know that there may be "missing" dirs in treewalks due to in-flight uploads * s3afs automatically switches to deleting pending uploads in the delete(dir) call. We just need to provide a summary of features, how to probe, etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19105) S3A: Recover from Vector IO read failures
Steve Loughran created HADOOP-19105: --- Summary: S3A: Recover from Vector IO read failures Key: HADOOP-19105 URL: https://issues.apache.org/jira/browse/HADOOP-19105 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.3.6, 3.4.0 Environment: s3a vector IO doesn't try to recover from read failures the way read() does. Need to: * abort the HTTP stream if considered necessary * retry the active read which failed * but not those which had already succeeded Reporter: Steve Loughran -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19043) S3A: Regression: ITestS3AOpenCost fails on prefetch test runs
[ https://issues.apache.org/jira/browse/HADOOP-19043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19043. - Fix Version/s: 3.5.0 Resolution: Fixed > S3A: Regression: ITestS3AOpenCost fails on prefetch test runs > - > > Key: HADOOP-19043 > URL: https://issues.apache.org/jira/browse/HADOOP-19043 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0 > > > Getting test failures in the new ITestS3AOpenCost tests when run with > {{-Dprefetch}} > Thought I'd tested this, but clearly not > * class cast failures on asserts (fix: skip) > * bytes read different in one test: (fix: identify and address) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19104) S3A HeaderProcessing to process all metadata entries of HEAD response
Steve Loughran created HADOOP-19104: --- Summary: S3A HeaderProcessing to process all metadata entries of HEAD response Key: HADOOP-19104 URL: https://issues.apache.org/jira/browse/HADOOP-19104 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran S3A HeaderProcessing builds up an incomplete list of headers, as its mapping of metadata to header entries omits headers including x-amz-server-side-encryption-aws-kms-key-id proposed * review all headers which are stripped from "raw" responses and mapped into headers * make sure the result of headers matches v1; looks like etags are different * make sure x-amz-server-side-encryption-aws-kms-key-id is returned * plus new checksum values v1 sdk: {code} # file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz header.Content-Length="524671" header.Content-Type="binary/octet-stream" header.ETag="3e39531220fbd3747d32cf93a79a7a0c" header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024" header.x-amz-server-side-encryption="AES256" {code} v2 SDK; note how the etag is now double-quoted: {code} # file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz header.Content-Length="524671" header.Content-Type="binary/octet-stream" header.ETag=""3e39531220fbd3747d32cf93a79a7a0c"" header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024" header.x-amz-server-side-encryption="AES256" {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
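The v1/v2 difference shown above boils down to the v2 SDK returning the ETag with its surrounding quotes intact. A sketch of normalizing it when mapping response metadata to header entries (the helper name is hypothetical, not the actual HeaderProcessing method):

```java
// Sketch: strip one layer of surrounding double quotes from an ETag so the
// v2-SDK value matches the bare v1 form. Illustrative helper only.
public class ETagSketch {

    /** Strip surrounding double quotes, if present. */
    public static String normalizeETag(String rawETag) {
        if (rawETag != null && rawETag.length() >= 2
                && rawETag.startsWith("\"") && rawETag.endsWith("\"")) {
            return rawETag.substring(1, rawETag.length() - 1);
        }
        return rawETag;
    }

    public static void main(String[] args) {
        // v2 SDK style, quoted
        System.out.println(normalizeETag("\"3e39531220fbd3747d32cf93a79a7a0c\""));
        // v1 SDK style, already bare: returned unchanged
        System.out.println(normalizeETag("3e39531220fbd3747d32cf93a79a7a0c"));
    }
}
```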
[jira] [Resolved] (HADOOP-19097) core-default fs.s3a.connection.establish.timeout value too low -warning always printed
[ https://issues.apache.org/jira/browse/HADOOP-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19097. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > core-default fs.s3a.connection.establish.timeout value too low -warning > always printed > -- > > Key: HADOOP-19097 > URL: https://issues.apache.org/jira/browse/HADOOP-19097 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > caused by HADOOP-18915. > in core-default we set the value of fs.s3a.connection.establish.timeout to 5s > {code} > > fs.s3a.connection.establish.timeout > 5s > > {code} > but there is a minimum of 15s, so this prints a warning > {code} > 2024-02-29 10:39:27,369 WARN impl.ConfigurationHelper: Option > fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 > ms instead > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19082) S3A: Update AWS SDK V2 to 2.24.6
[ https://issues.apache.org/jira/browse/HADOOP-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19082. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > S3A: Update AWS SDK V2 to 2.24.6 > > > Key: HADOOP-19082 > URL: https://issues.apache.org/jira/browse/HADOOP-19082 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Harshit Gupta >Assignee: Harshit Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > Update the AWS SDK to 2.24.6 from 2.23.5 for latest updates in packaging > w.r.t. imds module. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19101) Vectored Read into off-heap buffer broken
Steve Loughran created HADOOP-19101: --- Summary: Vectored Read into off-heap buffer broken Key: HADOOP-19101 URL: https://issues.apache.org/jira/browse/HADOOP-19101 Project: Hadoop Common Issue Type: Sub-task Components: fs, fs/azure Affects Versions: 3.3.6, 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at position zero even when the range is at a different offset. As a result, you can get incorrect data back. The fix for this is straightforward: we pass in a FileRange and use its offset as the starting position. However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely read vector IO into direct buffers through HDFS, ABFS or GCS. Note that we have never seen this in production because the parquet and ORC libraries both read into on-heap storage. Those libraries need to be audited to make sure that they never attempt to read into off-heap DirectBuffers. This is a bit trickier than you would think because an allocator is passed in. For PARQUET-2171 we will * only invoke the API on streams which explicitly declare their support for the API (so fallback in parquet itself) * not invoke it when direct buffer allocation is in use. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges
Steve Loughran created HADOOP-19098: --- Summary: Vector IO: consistent specified rejection of overlapping ranges Key: HADOOP-19098 URL: https://issues.apache.org/jira/browse/HADOOP-19098 Project: Hadoop Common Issue Type: Improvement Components: fs, fs/s3 Affects Versions: 3.3.6 Reporter: Steve Loughran Assignee: Steve Loughran Related to PARQUET-2171 q: "how do you deal with overlapping ranges?" I believe s3a rejects this, but the other impls may not. Proposed FS spec to say * "overlap triggers IllegalArgumentException". * special case: 0 byte ranges may be short circuited to return empty buffer even without checking file length etc. Contract tests to validate this (+ common helper code to do this). I'll copy the validation stuff into the parquet PR for consistency with older releases -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19097) core-default fs.s3a.connection.establish.timeout value too low -warning always printed
Steve Loughran created HADOOP-19097: --- Summary: core-default fs.s3a.connection.establish.timeout value too low -warning always printed Key: HADOOP-19097 URL: https://issues.apache.org/jira/browse/HADOOP-19097 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran caused by HADOOP-18915. in core-default we set the value of fs.s3a.connection.establish.timeout to 5s {code} fs.s3a.connection.establish.timeout 5s {code} but there is a minimum of 15s, so this prints a warning {code} 2024-02-29 10:39:27,369 WARN impl.ConfigurationHelper: Option fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms instead {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
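The warning above comes from minimum-duration enforcement: a configured value below the floor is raised to it, with a log message. The method below is a simplified, hypothetical stand-in for what `impl.ConfigurationHelper` does, showing why setting 5s against a 15s minimum always logs:

```java
// Sketch of minimum-duration enforcement: a configured duration below the
// enforced minimum is raised to that minimum, with a warning. Illustrative
// stand-in, not the actual ConfigurationHelper code.
import java.time.Duration;

public class MinimumDurationSketch {

    /** Returns the configured value, raised to the minimum if it is too low. */
    public static Duration enforceMinimum(String option, Duration configured,
            Duration minimum) {
        if (configured.compareTo(minimum) < 0) {
            System.out.printf(
                "Option %s is too low (%,d ms). Setting to %,d ms instead%n",
                option, configured.toMillis(), minimum.toMillis());
            return minimum;
        }
        return configured;
    }

    public static void main(String[] args) {
        // core-default sets fs.s3a.connection.establish.timeout to 5s, but
        // the minimum is 15s, hence the warning on every start-up
        Duration effective = enforceMinimum("fs.s3a.connection.establish.timeout",
            Duration.ofSeconds(5), Duration.ofSeconds(15));
        System.out.println(effective.toMillis()); // 15000
    }
}
```

The eventual fix is simply to raise the core-default value to at least the minimum so the enforcement path is never taken.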
[jira] [Created] (HADOOP-19095) hadoop-aws: downgrade openssl export to test
Steve Loughran created HADOOP-19095: --- Summary: hadoop-aws: downgrade openssl export to test Key: HADOOP-19095 URL: https://issues.apache.org/jira/browse/HADOOP-19095 Project: Hadoop Common Issue Type: Improvement Components: build, fs/s3 Affects Versions: 3.3.4, 3.3.3, 3.3.5, 3.3.2, 3.3.1, 3.3.0, 3.4.0 Reporter: Steve Loughran As seen in dependency scans and mentioned in HADOOP-16346; wildfly/openssl jar is exported as runtime; it is only needed at test. proposed: downgrade -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19093) add load tests for abfs rename resilience
Steve Loughran created HADOOP-19093: --- Summary: add load tests for abfs rename resilience Key: HADOOP-19093 URL: https://issues.apache.org/jira/browse/HADOOP-19093 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure, test Affects Versions: 3.3.6 Reporter: Steve Loughran Assignee: Steve Loughran I need a load test to verify that the rename resilience of the manifest committer actually works as intended * test suite with name ILoadTest* prefix (as with s3) * parallel test running with many threads doing many renames * verify that rename recovery should be detected * and that all renames MUST NOT fail. maybe also: metrics for this in fs and doc update. Possibly; LogExactlyOnce to warn of load issues -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19092) ABFS phase 4: post Hadoop 3.4.0 features
Steve Loughran created HADOOP-19092: --- Summary: ABFS phase 4: post Hadoop 3.4.0 features Key: HADOOP-19092 URL: https://issues.apache.org/jira/browse/HADOOP-19092 Project: Hadoop Common Issue Type: Improvement Components: fs/azure Affects Versions: 3.4.0 Reporter: Steve Loughran Uber-JIRA for ABFS work so we can close HADOOP-18072 as done for 3.4.0. Assuming 3.4.1 is a rapid roll of packaging, dependencies and critical fixes, this should target 3.4.2 and beyond. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19087) Release Hadoop 3.4.1
Steve Loughran created HADOOP-19087: --- Summary: Release Hadoop 3.4.1 Key: HADOOP-19087 URL: https://issues.apache.org/jira/browse/HADOOP-19087 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 3.4.0 Reporter: Steve Loughran

Release a minor update to hadoop 3.4.0 with:
* packaging enhancements
* updated dependencies (where viable)
* fixes for critical issues found after 3.4.0 released
* low-risk feature enhancements (those which don't impact schedule...)
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19065) Update Protocol Buffers installation to 3.21.12
[ https://issues.apache.org/jira/browse/HADOOP-19065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19065. - Fix Version/s: 3.5.0 Resolution: Fixed > Update Protocol Buffers installation to 3.21.12 > > > Key: HADOOP-19065 > URL: https://issues.apache.org/jira/browse/HADOOP-19065 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Affects Versions: 3.4.0 >Reporter: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Update docs and docker script to cover downloading the 3.21.12 protobuf > compiler -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19086) move commons-logging to 1.2
Steve Loughran created HADOOP-19086: --- Summary: move commons-logging to 1.2 Key: HADOOP-19086 URL: https://issues.apache.org/jira/browse/HADOOP-19086 Project: Hadoop Common Issue Type: Improvement Components: build Affects Versions: 3.4.0 Reporter: Steve Loughran

although hadoop doesn't use the APIs itself, it bundles commons-logging as things it depends on (Apache HttpComponents) do. The version hadoop declares (1.1.3) is out of date compared to its dependencies.
* update pom and LICENSE-binary
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
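A sketch of what the bump could look like in the parent POM; the version-property name is an assumption, not taken from the patch:

{code}
<!-- hadoop-project/pom.xml (sketch; property name assumed) -->
<commons-logging.version>1.2</commons-logging.version>
...
<dependency>
  <groupId>commons-logging</groupId>
  <artifactId>commons-logging</artifactId>
  <version>${commons-logging.version}</version>
</dependency>
{code}

The LICENSE-binary entry for commons-logging would be updated to match the same version.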
[jira] [Reopened] (HADOOP-18487) Make protobuf 2.5 an optional runtime dependency.
[ https://issues.apache.org/jira/browse/HADOOP-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened HADOOP-18487: - still there under yarn-api; will do followup > Make protobuf 2.5 an optional runtime dependency. > - > > Key: HADOOP-18487 > URL: https://issues.apache.org/jira/browse/HADOOP-18487 > Project: Hadoop Common > Issue Type: Improvement > Components: build, ipc >Affects Versions: 3.3.4 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > uses of protobuf 2.5 and RpcEngine have been deprecated since 3.3.0 in > HADOOP-17046 > while still keeping those files around (for a long time...), how about we > make the protobuf 2.5.0 export of hadoop-common and hadoop-hdfs *provided*, > rather than *compile* > that way, if apps want it for their own apis, they have to explicitly ask for > it, but at least our own scans don't break. > i have no idea what will happen to the rest of the stack at this point, it > will be "interesting" to see -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19084) hadoop-common exports logback as a transitive dependency
Steve Loughran created HADOOP-19084: --- Summary: hadoop-common exports logback as a transitive dependency Key: HADOOP-19084 URL: https://issues.apache.org/jira/browse/HADOOP-19084 Project: Hadoop Common Issue Type: Bug Components: build Affects Versions: 3.4.0, 3.5.0 Reporter: Steve Loughran

this is probably caused by HADOOP-18613: ZK is pulling in some extra transitive stuff which surfaces in applications which import hadoop-common into their poms. It doesn't seem to show up in our distro, but downstream you get warnings about duplicate logging stuff
{code}
| +- org.apache.zookeeper:zookeeper:jar:3.8.3:compile
| | +- org.apache.zookeeper:zookeeper-jute:jar:3.8.3:compile
| | | \- (org.apache.yetus:audience-annotations:jar:0.12.0:compile - omitted for duplicate)
| | +- org.apache.yetus:audience-annotations:jar:0.12.0:compile
| | +- (io.netty:netty-handler:jar:4.1.94.Final:compile - omitted for conflict with 4.1.100.Final)
| | +- (io.netty:netty-transport-native-epoll:jar:4.1.94.Final:compile - omitted for conflict with 4.1.100.Final)
| | +- (org.slf4j:slf4j-api:jar:1.7.30:compile - omitted for duplicate)
| | +- ch.qos.logback:logback-core:jar:1.2.10:compile
| | +- ch.qos.logback:logback-classic:jar:1.2.10:compile
| | | +- (ch.qos.logback:logback-core:jar:1.2.10:compile - omitted for duplicate)
| | | \- (org.slf4j:slf4j-api:jar:1.7.32:compile - omitted for conflict with 1.7.30)
| | \- (commons-io:commons-io:jar:2.11.0:compile - omitted for conflict with 2.14.0)
{code}
proposed: exclude the zk dependencies we either override ourselves or don't need.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
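Based on the dependency tree above, the proposed exclusion could look like this on hadoop-common's zookeeper dependency (a sketch, not the committed patch):

{code}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <exclusions>
    <!-- hadoop uses its own logging backend; stop exporting logback downstream -->
    <exclusion>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>ch.qos.logback</groupId>
      <artifactId>logback-classic</artifactId>
    </exclusion>
  </exclusions>
</dependency>
{code}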
[jira] [Created] (HADOOP-19083) hadoop binary tarball to exclude aws v2 sdk
Steve Loughran created HADOOP-19083: --- Summary: hadoop binary tarball to exclude aws v2 sdk Key: HADOOP-19083 URL: https://issues.apache.org/jira/browse/HADOOP-19083 Project: Hadoop Common Issue Type: Sub-task Components: build, fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran

Have the default hadoop binary .tar.gz exclude the aws v2 sdk by default. This SDK brings the total size of the distribution to about 1 GB.

Proposed:
* add a profile to include the aws sdk in the dist module
* disable it by default

Instead we document which version is needed. The hadoop-aws and hadoop-cloud-storage maven artifacts will declare their dependencies, so apps building with those get to do the download.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19080) S3A createFakeDirectory/put fails on object lock bucket if the path exists
Steve Loughran created HADOOP-19080: --- Summary: S3A createFakeDirectory/put fails on object lock bucket if the path exists Key: HADOOP-19080 URL: https://issues.apache.org/jira/browse/HADOOP-19080 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 3.3.6 Reporter: Steve Loughran

An s3 bucket with object lock enabled fails in createFakeDirectory (reported on Stack Overflow). The error implies that we need to calculate and include the MD5 checksum on the PUT, which gets complex once you include CSE in the mix: the checksum of the encrypted data is what would be required.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement
[ https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19057. - Fix Version/s: 3.3.9 Resolution: Fixed > S3 public test bucket landsat-pds unreadable -needs replacement > --- > > Key: HADOOP-19057 > URL: https://issues.apache.org/jira/browse/HADOOP-19057 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > The s3 test bucket used in hadoop-aws tests of S3 select and large file reads > is no longer publicly accessible > {code} > java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on > landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null > (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended > Request ID: > O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null > {code} > * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large > file for some reading tests > * changing the default value disables s3 select tests on older releases > * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it > will be skipped > Proposed > * we locate a new large file under the (requester pays) s3a://usgs-landsat/ > bucket . All releases with HADOOP-18168 can use this > * update 3.4.1 source to use this; document it > * do something similar for 3.3.9 + maybe even cut s3 select there too. > * document how to use it on older releases with requester-pays support > * document how to completely disable it on older releases. > h2. How to fix (most) landsat test failures on older releases > add this to your auth-keys.xml file. 
Expect some failures in a few tests with hard-coded references to the bucket (assumed role delegation tokens).
> {code}
> <property>
>   <name>fs.s3a.scale.test.csvfile</name>
>   <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
>   <description>file used in scale tests</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
>   <value>us-east-1</value>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
>   <value>false</value>
>   <description>Don't try to purge uploads in the read-only bucket, as
>   it will only create log noise.</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
>   <value>0</value>
>   <description>Let's postpone existence checks to the first IO operation</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
>   <value>false</value>
>   <description>Do not add the referrer header</description>
> </property>
> <property>
>   <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
>   <value>128k</value>
>   <description>Use a small prefetch size so tests fetch multiple
>   blocks</description>
> </property>
> <property>
>   <name>fs.s3a.select.enabled</name>
>   <value>false</value>
> </property>
> {code}
> Some delegation token tests will still fail; these have hard-coded references
> to the old bucket. *Do not worry about these*
> {code}
> [ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[0] » AccessDenied s3a://la...
> [ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[1] » AccessDenied s3a://la...
> [ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[2] » AccessDenied s3a://la...
> [ERROR] ITestRoleDelegationInFilesystem>ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->ITestSessionDelegationInFilesystem.readLandsatMetadata:614 » AccessDenied
> [ERROR] ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->readLandsatMetadata:614 » AccessDenied
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18930) S3A: make fs.s3a.create.performance an option you can set for the entire bucket
[ https://issues.apache.org/jira/browse/HADOOP-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18930. - Fix Version/s: 3.4.0 (was: 3.3.7-aws) Resolution: Fixed > S3A: make fs.s3a.create.performance an option you can set for the entire > bucket > --- > > Key: HADOOP-18930 > URL: https://issues.apache.org/jira/browse/HADOOP-18930 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.3.9 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > make the fs.s3a.create.performance option something you can set everywhere, > rather than just in an openFile() option or under a magic path. > this improves performance on apps like iceberg where filenames are generated > with UUIDs in them, so we know there are no overwrites -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
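With this change, the option follows S3A's standard per-bucket override pattern; a sketch (the bucket name is illustrative, not from the issue):

{code}
<property>
  <!-- enable create-performance for one bucket only; "iceberg-output" is hypothetical -->
  <name>fs.s3a.bucket.iceberg-output.create.performance</name>
  <value>true</value>
</property>
{code}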
[jira] [Resolved] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A
[ https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19059. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > update AWS SDK to support S3 Access Grants in S3A > - > > Key: HADOOP-19059 > URL: https://issues.apache.org/jira/browse/HADOOP-19059 > Project: Hadoop Common > Issue Type: Improvement > Components: build, fs/s3 >Affects Versions: 3.4.0 >Reporter: Jason Han >Assignee: Jason Han >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > Original Estimate: 168h > Remaining Estimate: 168h > > In order to support S3 Access > Grants(https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to > update AWS SDK in hadooop package. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19072) S3A: expand optimisations on stores with "fs.s3a.create.performance"
Steve Loughran created HADOOP-19072: --- Summary: S3A: expand optimisations on stores with "fs.s3a.create.performance" Key: HADOOP-19072 URL: https://issues.apache.org/jira/browse/HADOOP-19072 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran on an s3a store with fs.s3a.create.performance set, speed up other operations * mkdir to skip parent directory check: just do a HEAD to see if there's a file at the target location -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19045) S3A: pass request timeouts down to sdk clients
[ https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19045. - Resolution: Fixed > S3A: pass request timeouts down to sdk clients > -- > > Key: HADOOP-19045 > URL: https://issues.apache.org/jira/browse/HADOOP-19045 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > s3a client timeout settings are getting down to http client, but not sdk > timeouts, so you can't have a longer timeout than the default. This surfaces > in the inability to tune the timeouts for CreateSession calls even now the > latest SDK does pick it up -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Reopened] (HADOOP-19045) S3A: pass request timeouts down to sdk clients
[ https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened HADOOP-19045: - this is broken because core-default.xml sets it to 0 > S3A: pass request timeouts down to sdk clients > -- > > Key: HADOOP-19045 > URL: https://issues.apache.org/jira/browse/HADOOP-19045 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > s3a client timeout settings are getting down to http client, but not sdk > timeouts, so you can't have a longer timeout than the default. This surfaces > in the inability to tune the timeouts for CreateSession calls even now the > latest SDK does pick it up -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18993) Allow to not isolate S3AFileSystem classloader when needed
[ https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18993. - Fix Version/s: 3.5.0 Resolution: Fixed > Allow to not isolate S3AFileSystem classloader when needed > -- > > Key: HADOOP-18993 > URL: https://issues.apache.org/jira/browse/HADOOP-18993 > Project: Hadoop Common > Issue Type: Improvement > Components: hadoop-thirdparty >Affects Versions: 3.3.6 >Reporter: Antonio Murgia >Assignee: Antonio Murgia >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0 > > > In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be > the same as the one that loaded S3AFileSystem. This leads to the > impossibility in Spark applications to load third party credentials providers > as user jars. > I propose to add a configuration key > {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} > that if set to {{false}} will not perform the classloader set. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
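Using the option named in the report, opting out of classloader isolation (so Spark user jars can supply credential providers) looks like:

{code}
<property>
  <name>fs.s3a.extensions.isolated.classloader</name>
  <value>false</value>
</property>
{code}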
[jira] [Resolved] (HADOOP-19049) Class loader leak caused by StatisticsDataReferenceCleaner thread
[ https://issues.apache.org/jira/browse/HADOOP-19049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19049. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > Class loader leak caused by StatisticsDataReferenceCleaner thread > - > > Key: HADOOP-19049 > URL: https://issues.apache.org/jira/browse/HADOOP-19049 > Project: Hadoop Common > Issue Type: Bug > Components: common >Affects Versions: 3.3.6 >Reporter: Jia Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > The > "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" > daemon thread was created by FileSystem. > This is fine if the thread's context class loader is the system class loader, > but it's bad if the context class loader is a custom class loader. The > reference held by this daemon thread means that the class loader can never > become eligible for GC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19044) AWS SDK V2 - Update S3A region logic
[ https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19044. - Fix Version/s: 3.5.0 Resolution: Fixed > AWS SDK V2 - Update S3A region logic > - > > Key: HADOOP-19044 > URL: https://issues.apache.org/jira/browse/HADOOP-19044 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set > fs.s3a.endpoint to > s3.amazonaws.com here: > [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540] > > > HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is > set, or if a region can be parsed from fs.s3a.endpoint (which will happen in > this case, region will be US_EAST_1), cross region access is not enabled. > This will cause 400 errors if the bucket is not in US_EAST_1. > > Proposed: update the logic so that if the endpoint is the global > s3.amazonaws.com , cross region access is enabled. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
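On deployments hitting the 400 errors described above, setting the region explicitly sidesteps endpoint parsing entirely; a sketch, with an illustrative region value:

{code}
<property>
  <name>fs.s3a.endpoint.region</name>
  <!-- "eu-west-1" is an example; use the bucket's actual region -->
  <value>eu-west-1</value>
</property>
{code}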
[jira] [Resolved] (HADOOP-18987) Corrections to Hadoop FileSystem API Definition
[ https://issues.apache.org/jira/browse/HADOOP-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18987. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > Corrections to Hadoop FileSystem API Definition > --- > > Key: HADOOP-18987 > URL: https://issues.apache.org/jira/browse/HADOOP-18987 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.6 >Reporter: Dieter De Paepe >Assignee: Dieter De Paepe >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > I noticed a lot of inconsistencies, typos and informal statements in the > "formal" FileSystem API definition > ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/index.html)] > Creating this ticket to link my PR against. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-17784) hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021
[ https://issues.apache.org/jira/browse/HADOOP-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-17784. - Resolution: Duplicate HADOOP-19057 will address this now the bucket is completely gone > hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021 > > > Key: HADOOP-17784 > URL: https://issues.apache.org/jira/browse/HADOOP-17784 > Project: Hadoop Common > Issue Type: Test > Components: fs/s3, test >Reporter: Leona Yoda >Priority: Major > Attachments: org.apache.hadoop.fs.s3a.select.ITestS3SelectMRJob.txt > > > I found an announcement that the landsat-pds bucket will be deleted on July 1, 2021 > (https://registry.opendata.aws/landsat-8/) > and I think this bucket is used in the tests of the hadoop-aws module: > [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java#L93] > > At this time I can access the bucket but we might have to change the test > bucket someday. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement
Steve Loughran created HADOOP-19057: --- Summary: S3 public test bucket landsat-pds unreadable -needs replacement Key: HADOOP-19057 URL: https://issues.apache.org/jira/browse/HADOOP-19057 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3, test Affects Versions: 3.3.6, 3.2.4, 3.4.0, 3.3.9, 3.5.0 Reporter: Steve Loughran

The s3 test bucket used in hadoop-aws tests of S3 select and large file reads is no longer publicly accessible
{code}
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request ID: O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
{code}
* Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large file for some reading tests
* changing the default value disables s3 select tests on older releases
* if fs.s3a.scale.test.csvfile is set to " " then other tests which need it will be skipped

Proposed:
* we locate a new large file under the (requester pays) s3a://usgs-landsat/ bucket. All releases with HADOOP-18168 can use this
* update 3.4.1 source to use this; document it
* do something similar for 3.3.9 + maybe even cut s3 select there too
* document how to use it on older releases with requester-pays support
* document how to completely disable it on older releases
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19022) S3A : ITestS3AConfiguration#testRequestTimeout failure
[ https://issues.apache.org/jira/browse/HADOOP-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19022. - Fix Version/s: 3.5.0 3.4.1 Assignee: Steve Loughran Resolution: Duplicate > S3A : ITestS3AConfiguration#testRequestTimeout failure > -- > > Key: HADOOP-19022 > URL: https://issues.apache.org/jira/browse/HADOOP-19022 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Steve Loughran >Priority: Minor > Fix For: 3.5.0, 3.4.1 > > > "fs.s3a.connection.request.timeout" should be specified in milliseconds as per > {code:java} > Duration apiCallTimeout = getDuration(conf, REQUEST_TIMEOUT, > DEFAULT_REQUEST_TIMEOUT_DURATION, TimeUnit.MILLISECONDS, Duration.ZERO); > {code} > The test fails consistently because it sets 120 ms timeout which is less than > 15s (min network operation duration), and hence gets reset to 15000 ms based > on the enforcement. > > {code:java} > [ERROR] testRequestTimeout(org.apache.hadoop.fs.s3a.ITestS3AConfiguration) > Time elapsed: 0.016 s <<< FAILURE! > java.lang.AssertionError: Configured fs.s3a.connection.request.timeout is > different than what AWS sdk configuration uses internally expected:<12> > but was:<15000> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.fs.s3a.ITestS3AConfiguration.testRequestTimeout(ITestS3AConfiguration.java:444) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
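Since the option is parsed via getDuration, an explicit unit suffix avoids the milliseconds/seconds confusion seen in the test; a sketch with an illustrative value:

{code}
<property>
  <name>fs.s3a.connection.request.timeout</name>
  <!-- "60s" is an example; a bare number would be read as milliseconds -->
  <value>60s</value>
</property>
{code}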
[jira] [Resolved] (HADOOP-19045) S3A: pass request timeouts down to sdk clients
[ https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19045. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > S3A: pass request timeouts down to sdk clients > -- > > Key: HADOOP-19045 > URL: https://issues.apache.org/jira/browse/HADOOP-19045 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > s3a client timeout settings are getting down to http client, but not sdk > timeouts, so you can't have a longer timeout than the default. This surfaces > in the inability to tune the timeouts for CreateSession calls even now the > latest SDK does pick it up -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18830) S3A: Cut S3 Select
[ https://issues.apache.org/jira/browse/HADOOP-18830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18830. - Fix Version/s: 3.4.1 Hadoop Flags: Incompatible change Release Note: S3 Select is no longer supported through the S3A connector Resolution: Fixed > S3A: Cut S3 Select > -- > > Key: HADOOP-18830 > URL: https://issues.apache.org/jira/browse/HADOOP-18830 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > getting s3 select to work with the v2 sdk is tricky, we need to add extra > libraries to the classpath beyond just bundle.jar. we can do this but
> * AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic completely
> * CSV is a bad format
> * one-line JSON more structured but also way less efficient
> ORC/Parquet benefit from vectored IO and work spanning the cluster.
> accordingly, I'm wondering what to do about s3 select
> # cut?
> # downgrade to optional and document the extra classes on the classpath
> Option #2 is straightforward and effectively the default. we can also declare the feature deprecated.
> {code}
> [ERROR] testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat) Time elapsed: 147.958 s <<< ERROR!
> java.io.IOException: java.lang.NoClassDefFoundError: software/amazon/eventstream/MessageDecoder
> at org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
> at org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
> at org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19046) S3A: update sdk versions
[ https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19046. - Fix Version/s: 3.5.0 Resolution: Fixed > S3A: update sdk versions > > > Key: HADOOP-19046 > URL: https://issues.apache.org/jira/browse/HADOOP-19046 > Project: Hadoop Common > Issue Type: Sub-task > Components: build, fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Move up to the most recent versions of the v2 sdk, with a v1 update just to > keep some CVE checking happy. > {code} > 1.12.599 > 2.23.5 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18975) AWS SDK v2: extend support for FIPS endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18975. - Resolution: Fixed > AWS SDK v2: extend support for FIPS endpoints > -- > > Key: HADOOP-18975 > URL: https://issues.apache.org/jira/browse/HADOOP-18975 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > v1 SDK supported FIPS just by changing the endpoint. > Now we have a new builder setting to use. > * add new fs.s3a.endpoint.fips option > * pass it down > * test -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19015. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting > for connection from pool > -- > > Key: HADOOP-19015 > URL: https://issues.apache.org/jira/browse/HADOOP-19015 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > Getting errors in jobs which can be fixed by increasing this > 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on > s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0: > software.amazon.awssdk.core.exception.SdkClientException: Unable to execute > HTTP request: Timeout waiting for connection from pool at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
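The fix above is a plain configuration change; a core-site.xml sketch raising the pool size to the value this issue proposes as the new default:

```xml
<!-- core-site.xml sketch: enlarge the S3A http connection pool to reduce
     "Timeout waiting for connection from pool" failures under load,
     as proposed in this issue. -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>500</value>
</property>
```

Clusters already on a release with the larger default need no change; the setting only matters where an older default is in effect or workloads hold many streams open.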
[jira] [Resolved] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls
[ https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18883. - Fix Version/s: 3.5.0 Resolution: Fixed > Expect-100 JDK bug resolution: prevent multiple server calls > > > Key: HADOOP-18883 > URL: https://issues.apache.org/jira/browse/HADOOP-18883 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > This is in line with JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978]. > > With the current implementation of HttpURLConnection, if the server rejects the > “Expect 100-continue” then a ‘java.net.ProtocolException’ will be > thrown from the 'expect100Continue()' method. > After the exception is thrown, if we call any other method on the same instance > (e.g. getHeaderField() or getHeaderFields()), they will internally call > getOutputStream() which invokes writeRequests(), which makes the actual server > call. > In the AbfsHttpOperation, after sendRequest() we call the processResponse() > method from AbfsRestOperation. Even if conn.getOutputStream() fails due > to an expect-100 error, we consume the exception and let the code go ahead. So, > we can have getHeaderField() / getHeaderFields() / getHeaderFieldLong() which > will be triggered after getOutputStream has failed. These invocations will lead > to server calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19048) ITestCustomSigner failing against S3Express Buckets
Steve Loughran created HADOOP-19048: --- Summary: ITestCustomSigner failing against S3Express Buckets Key: HADOOP-19048 URL: https://issues.apache.org/jira/browse/HADOOP-19048 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3, test Affects Versions: 3.5.0 Reporter: Steve Loughran getting test failures against S3 Express buckets with {{ITestCustomSigner}}; not seen with classic s3 stores. {code} [ERROR] testCustomSignerAndInitializer[simple-delete](org.apache.hadoop.fs.s3a.auth.ITestCustomSigner) Time elapsed: 6.12 s <<< ERROR! org.apache.hadoop.fs.s3a.AWSBadRequestException: PUT 0-byte object on fork-0006/test/testCustomSignerAndInitializer[simple-delete]/customsignerpath1: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf):InvalidRequest: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf) at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:259) ... Caused by: software.amazon.awssdk.services.s3.model.S3Exception: x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. 
(Service: S3, Status Code: 400, Request ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: kZJZG05LGCBu7lsNKNf) at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19033) S3A: disable checksum validation
[ https://issues.apache.org/jira/browse/HADOOP-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19033. - Fix Version/s: 3.5.0 Resolution: Fixed > S3A: disable checksum validation > > > Key: HADOOP-19033 > URL: https://issues.apache.org/jira/browse/HADOOP-19033 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > The AWS v2 sdk turns on client-side checksum validation; this kills performance. > Given we are using TLS to download from AWS s3, there's implicit channel > checksumming going on, along with the IPv4 TCP checksumming. > We don't need it; all it does is slow us down. > proposed: disable in DefaultS3ClientFactory > I don't want to add an option to enable it as it only complicates life (yet > another config option), but I am open to persuasion -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-17824) ITestCustomSigner fails with NPE against private endpoint
[ https://issues.apache.org/jira/browse/HADOOP-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-17824. - Resolution: Cannot Reproduce stack trace no longer valid for v2 sdk; closing as cannot reproduce > ITestCustomSigner fails with NPE against private endpoint > - > > Key: HADOOP-17824 > URL: https://issues.apache.org/jira/browse/HADOOP-17824 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.3.1 >Reporter: Steve Loughran >Priority: Minor > > ITestCustomSigner fails when the tester is pointed at a private endpoint -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19046) S3A: update sdk versions
Steve Loughran created HADOOP-19046: --- Summary: S3A: update sdk versions Key: HADOOP-19046 URL: https://issues.apache.org/jira/browse/HADOOP-19046 Project: Hadoop Common Issue Type: Sub-task Components: build, fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Move up to the most recent versions of the v2 sdk, with a v1 update just to keep some CVE checking happy. {code} 1.12.599 2.23.5 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19045) S3A: pass request timeouts down to sdk clients
Steve Loughran created HADOOP-19045: --- Summary: S3A: pass request timeouts down to sdk clients Key: HADOOP-19045 URL: https://issues.apache.org/jira/browse/HADOOP-19045 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran The s3a client timeout settings are passed down to the http client, but not to the sdk timeouts, so you can't have a longer timeout than the default. This surfaces as an inability to tune the timeouts for CreateSession calls, even now that the latest SDK does pick them up -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
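Once the timeouts are wired through, tuning them would be a configuration change; a sketch assuming the existing fs.s3a.connection.request.timeout option is the one propagated (option name and units are assumptions, not confirmed by this issue):

```xml
<!-- core-site.xml sketch: lengthen the per-request timeout so slow
     CreateSession calls are not cut off at the default. The option name
     fs.s3a.connection.request.timeout and millisecond units are assumed. -->
<property>
  <name>fs.s3a.connection.request.timeout</name>
  <value>60000</value>
</property>
```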
[jira] [Created] (HADOOP-19043) S3A: Regression: ITestS3AOpenCost fails on prefetch test runs
Steve Loughran created HADOOP-19043: --- Summary: S3A: Regression: ITestS3AOpenCost fails on prefetch test runs Key: HADOOP-19043 URL: https://issues.apache.org/jira/browse/HADOOP-19043 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3, test Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran Getting test failures in the new ITestS3AOpenCost tests when run with {{-Dprefetch}} Thought I'd tested this, but clearly not * class cast failures on asserts (fix: skip) * bytes read different in one test: (fix: identify and address) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19042) S3A: detect and recover from SSL ConnectionReset exceptions
Steve Loughran created HADOOP-19042: --- Summary: S3A: detect and recover from SSL ConnectionReset exceptions Key: HADOOP-19042 URL: https://issues.apache.org/jira/browse/HADOOP-19042 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.3.6, 3.4.0 Reporter: Steve Loughran The s3a input stream doesn't recover from SSL exceptions, specifically ConnectionReset. This is a variant of HADOOP-19027, except it's surfaced on an older release... # need to make sure the specific exception is handled by aborting the stream and retrying, so map to the new HttpChannelEOFException # all of this needs to be backported -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19027) S3A: S3AInputStream doesn't recover from HTTP/channel exceptions
[ https://issues.apache.org/jira/browse/HADOOP-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19027. - Fix Version/s: 3.5.0 Resolution: Fixed in 3.5 though I hope to backport to 3.4.1 > S3A: S3AInputStream doesn't recover from HTTP/channel exceptions > > > Key: HADOOP-19027 > URL: https://issues.apache.org/jira/browse/HADOOP-19027 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > S3AInputStream doesn't seem to recover from Http exceptions raised through > HttpClient or through OpenSSL. > * review the recovery code to make sure it is retrying enough, it looks > suspiciously like it doesn't > * detect the relevant openssl, shaded httpclient and unshaded httpclient > exceptions, map to a standard one and treat as comms error in our retry policy > This is not the same as the load balancer/proxy returning 443/444 which we > map to AWSNoResponseException. We can't reuse that as it expects to be > created from an > {{software.amazon.awssdk.awscore.exception.AwsServiceException}} exception > with the relevant fields...changing it could potentially be incompatible. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
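The recovery approach described above (map the assorted channel-level exceptions to one retryable type, then abort and retry in the retry policy) can be sketched in plain JDK terms. HttpChannelEOFException here is a stand-in class for illustration; this is not the actual S3AInputStream code:

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.concurrent.Callable;

/**
 * Sketch of the recovery pattern: translate low-level channel/SSL failures
 * into a single retryable exception type, then abort and retry the read once.
 */
public class ChannelRetrySketch {

  /** Stand-in for the exception class introduced by the fix. */
  static class HttpChannelEOFException extends IOException {
    HttpChannelEOFException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Map raw channel failures to the single retryable type; pass others through. */
  static IOException map(IOException e) {
    if (e instanceof SocketException || e instanceof javax.net.ssl.SSLException) {
      return new HttpChannelEOFException("channel failure", e);
    }
    return e;
  }

  /** Invoke the read; on a retryable channel error, "abort" and retry once. */
  static <T> T onceWithRetry(Callable<T> read) throws Exception {
    try {
      return read.call();
    } catch (IOException e) {
      if (map(e) instanceof HttpChannelEOFException) {
        // in the real stream this would abort the HTTP connection and reopen
        return read.call();
      }
      throw e;
    }
  }
}
```

A read that fails once with a connection reset then succeeds on the reopened stream would return normally through onceWithRetry, while non-channel IOExceptions still propagate to the caller.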
[jira] [Created] (HADOOP-19037) S3A: ITestS3AConfiguration failing with region problems
Steve Loughran created HADOOP-19037: --- Summary: S3A: ITestS3AConfiguration failing with region problems Key: HADOOP-19037 URL: https://issues.apache.org/jira/browse/HADOOP-19037 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3, test Affects Versions: 3.4.0 Reporter: Steve Loughran After commenting out the default region in my ~/.aws/config [default] profile, the test ITestS3AConfiguration.testS3SpecificSignerOverride() fails {code} [ERROR] testS3SpecificSignerOverride(org.apache.hadoop.fs.s3a.ITestS3AConfiguration) Time elapsed: 0.054 s <<< ERROR! software.amazon.awssdk.core.exception.SdkClientException: Unable to load region from any of the providers in the chain software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain@12c626f8: [software.amazon.awssdk.regions.providers.SystemSettingsRegionProvider@ae63559: Unable to load region from system settings. Region must be specified either via environment variable (AWS_REGION) or system property (aws.region)., software.amazon.awssdk.regions.providers.AwsProfileRegionProvider@6e6cfd4c: No region provided in profile: default, software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider@139147de: EC2 Metadata is disabled. Unable to retrieve region information from EC2 Metadata service.] {code} I'm worried the sdk update has brought back the 3.3.x region problems, where well-configured developer setups / ec2 deployments hid problems. Certainly we can see the code is checking these paths -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19004) S3A: Support Authentication through HttpSigner API
[ https://issues.apache.org/jira/browse/HADOOP-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19004. - Fix Version/s: 3.4.0 Resolution: Fixed > S3A: Support Authentication through HttpSigner API > --- > > Key: HADOOP-19004 > URL: https://issues.apache.org/jira/browse/HADOOP-19004 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Harshit Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > The latest AWS SDK changes how signing works, and for signing S3Express > signatures the new {{software.amazon.awssdk.http.auth}} auth mechanism is > needed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18981) Move oncrpc/portmap from hadoop-nfs to hadoop-common
[ https://issues.apache.org/jira/browse/HADOOP-18981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18981. - Fix Version/s: 3.4.0 Resolution: Fixed > Move oncrpc/portmap from hadoop-nfs to hadoop-common > > > Key: HADOOP-18981 > URL: https://issues.apache.org/jira/browse/HADOOP-18981 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: Xing Lin >Assignee: Xing Lin >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > We want to use udpserver/client for other use cases, rather than only for > NFS. One such use case is to export NameNodeHAState for NameNodes via a UDP > server. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19033) S3A: disable checksum validation
Steve Loughran created HADOOP-19033: --- Summary: S3A: disable checksum validation Key: HADOOP-19033 URL: https://issues.apache.org/jira/browse/HADOOP-19033 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Steve Loughran Assignee: Steve Loughran The AWS v2 sdk turns on client-side checksum validation; this kills performance. Given we are using TLS to download from AWS s3, there's implicit channel checksumming going on, along with the IPv4 TCP checksumming. We don't need it; all it does is slow us down. proposed: disable in DefaultS3ClientFactory I don't want to add an option to enable it as it only complicates life (yet another config option), but I am open to persuasion -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19032) MultiObjectDeleteException bulk delete of odd filenames
Steve Loughran created HADOOP-19032: --- Summary: MultiObjectDeleteException bulk delete of odd filenames Key: HADOOP-19032 URL: https://issues.apache.org/jira/browse/HADOOP-19032 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Possibly transient. note bucket is versioned. {code} org.apache.hadoop.fs.s3a.AWSS3IOException: Remove S3 Dir Markers on s3a://stevel-london/Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/test-dir/7/testContextURI/createTest: org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: [S3Error(Key=Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/test-dir/7/testContextURI/createTest/()&^%$#@!~_+}{>
[jira] [Created] (HADOOP-19027) S3A: S3AInputStream doesn't recover from HTTP exceptions
Steve Loughran created HADOOP-19027: --- Summary: S3A: S3AInputStream doesn't recover from HTTP exceptions Key: HADOOP-19027 URL: https://issues.apache.org/jira/browse/HADOOP-19027 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Assignee: Steve Loughran S3AInputStream doesn't seem to recover from Http exceptions raised through HttpClient or through OpenSSL. * review the recovery code to make sure it is retrying enough, it looks suspiciously like it doesn't * detect the relevant openssl, shaded httpclient and unshaded httpclient exceptions, map to a standard one and treat as comms error in our retry policy This is not the same as the load balancer/proxy returning 443/444 which we map to AWSNoResponseException. We can't reuse that as it expects to be created from an {{software.amazon.awssdk.awscore.exception.AwsServiceException}} exception with the relevant fields...changing it could potentially be incompatible. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19026) S3A: TestIAMInstanceCredentialsProvider.testIAMInstanceCredentialsInstantiate failure
Steve Loughran created HADOOP-19026: --- Summary: S3A: TestIAMInstanceCredentialsProvider.testIAMInstanceCredentialsInstantiate failure Key: HADOOP-19026 URL: https://issues.apache.org/jira/browse/HADOOP-19026 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3, test Affects Versions: 3.4.0 Reporter: Steve Loughran test failure in TestIAMInstanceCredentialsProvider; looks like the test is running in an EC2 VM whose IAM service isn't providing credentials -and the test isn't set up to ignore that. {code} Caused by: software.amazon.awssdk.core.exception.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/ at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111) at software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:125) at software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:91) at software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.lambda$getSecurityCredentials$3(InstanceProfileCredentialsProvider.java:256) at software.amazon.awssdk.utils.FunctionalUtils.lambda$safeSupplier$4(FunctionalUtils.java:108) at software.amazon.awssdk.utils.FunctionalUtils.invokeSafely(FunctionalUtils.java:136) at software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.getSecurityCredentials(InstanceProfileCredentialsProvider.java:256) at software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.createEndpointProvider(InstanceProfileCredentialsProvider.java:204) at software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.refreshCredentials(InstanceProfileCredentialsProvider.java:150) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-17912) ABFS: Support for Encryption Context
[ https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-17912. - Fix Version/s: 3.4.0 Resolution: Fixed > ABFS: Support for Encryption Context > > > Key: HADOOP-17912 > URL: https://issues.apache.org/jira/browse/HADOOP-17912 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Sumangala Patki >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Support for customer-provided encryption keys at the file level, superseding > the global (account-level) key use in HADOOP-17536. > The ABFS driver will support an "EncryptionContext" plugin for retrieving > encryption information, the implementation for which should be provided by > the client. The keys/context retrieved will be sent via request headers to > the server, which will store the encryption context. Subsequent REST calls to > the server that access data/user metadata of the file will require fetching the > encryption context through a GetFileProperties call and retrieving the key > from the custom provider, before sending the request. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18540) Upgrade Bouncy Castle to 1.70
[ https://issues.apache.org/jira/browse/HADOOP-18540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18540. - Fix Version/s: 3.4.0 Resolution: Fixed > Upgrade Bouncy Castle to 1.70 > - > > Key: HADOOP-18540 > URL: https://issues.apache.org/jira/browse/HADOOP-18540 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Affects Versions: 3.4.0 >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > Upgrade Bouncycastle to 1.70 to resolve > > |[[sonatype-2021-4916] CWE-327: Use of a Broken or Risky Cryptographic > Algorithm|https://ossindex.sonatype.org/vulnerability/sonatype-2021-4916?component-type=maven=org.bouncycastle/bcprov-jdk15on]| > |[[sonatype-2019-0673] CWE-400: Uncontrolled Resource Consumption ('Resource > Exhaustion')|https://ossindex.sonatype.org/vulnerability/sonatype-2019-0673?component-type=maven=org.bouncycastle/bcprov-jdk15on]| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19008) S3A: Upgrade AWS SDK to 2.21.41
[ https://issues.apache.org/jira/browse/HADOOP-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19008. - Fix Version/s: 3.4.0 Resolution: Fixed > S3A: Upgrade AWS SDK to 2.21.41 > --- > > Key: HADOOP-19008 > URL: https://issues.apache.org/jira/browse/HADOOP-19008 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0, 3.3.7-aws >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > sdk 2.21.41 is out and logging now picks up the log4j.properties options. > move to this ASAP -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19008) S3A: Upgrade AWS SDK to 2.21.41
Steve Loughran created HADOOP-19008: --- Summary: S3A: Upgrade AWS SDK to 2.21.41 Key: HADOOP-19008 URL: https://issues.apache.org/jira/browse/HADOOP-19008 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0, 3.3.7-aws Reporter: Steve Loughran Assignee: Steve Loughran sdk 2.21.41 is out and logging now picks up the log4j.properties options. move to this ASAP -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18999) S3A: debug logging for http traffic to S3 stores
[ https://issues.apache.org/jira/browse/HADOOP-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18999. - Resolution: Not A Problem no longer needed as 2.21.41 logs properly! > S3A: debug logging for http traffic to S3 stores > > > Key: HADOOP-18999 > URL: https://issues.apache.org/jira/browse/HADOOP-18999 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > > AWS SDK bundle.jar logging doesn't set up right. > {code} > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further > details. > {code} > Cloudstore commands have a -debug option to force set this through log4j > APIs; this does work. > Proposed: > * add reflection-based ability to set/query log4j log levels (+tests, > obviously) > * add a new log `org.apache.hadoop.fs.s3a.logging.sdk` > * if set to DEBUG, DefaultS3ClientFactory will enable logging on the aws > internal/shaded classes > this allows log4j.properties to turn on logging; reflection ensures all is > well on other log back-ends and when unshaded aws sdk jars are in use -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
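Since the 2.21.41 SDK now honours log4j.properties, wire-level debugging can be enabled there directly instead of via the proposed reflection mechanism; a sketch, with a logger name that is illustrative rather than confirmed for the shaded bundle:

```properties
# log4j.properties sketch: enable AWS SDK v2 request logging now that the
# bundled SDK picks up log4j settings. Logger name is an assumption.
log4j.logger.software.amazon.awssdk.request=DEBUG
```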
[jira] [Created] (HADOOP-19007) S3A: transfer manager not wired up to s3a executor pool
Steve Loughran created HADOOP-19007: --- Summary: S3A: transfer manager not wired up to s3a executor pool Key: HADOOP-19007 URL: https://issues.apache.org/jira/browse/HADOOP-19007 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran S3ClientFactory.createS3TransferManager() doesn't use the executor declared in S3ClientCreationParameters.transferManagerExecutor * method needs to take S3ClientCreationParameters * and set the transfer manager executor -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18925) S3A: add option "fs.s3a.copy.from.local.enabled" to enable/disable CopyFromLocalOperation
[ https://issues.apache.org/jira/browse/HADOOP-18925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18925. - Fix Version/s: 3.4.0 3.3.9 Resolution: Fixed > S3A: add option "fs.s3a.copy.from.local.enabled" to enable/disable > CopyFromLocalOperation > - > > Key: HADOOP-18925 > URL: https://issues.apache.org/jira/browse/HADOOP-18925 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > reported failure of CopyFromLocalOperation.getFinalPath() during job > submission with s3a declared as cluster fs. > add an emergency option to disable this optimised uploader and revert to the > superclass implementation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
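The emergency switch described above is set in core-site.xml; a sketch reverting to the superclass copy implementation:

```xml
<!-- core-site.xml sketch: disable the optimised CopyFromLocalOperation
     and fall back to the base FileSystem implementation, per this issue. -->
<property>
  <name>fs.s3a.copy.from.local.enabled</name>
  <value>false</value>
</property>
```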
[jira] [Resolved] (HADOOP-18997) S3A: Add option fs.s3a.s3express.create.session to enable/disable CreateSession
[ https://issues.apache.org/jira/browse/HADOOP-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18997. - Fix Version/s: 3.4.0 Resolution: Fixed > S3A: Add option fs.s3a.s3express.create.session to enable/disable > CreateSession > --- > > Key: HADOOP-18997 > URL: https://issues.apache.org/jira/browse/HADOOP-18997 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > add a way to disable the need to use the createsession call, so as to allow > for > * simplifying our role test runs > * benchmarking the performance hit > * troubleshooting IAM permissions > this can also be disabled from the sysprop "aws.disableS3ExpressAuth" -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
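The option above can be set per-bucket or globally; a core-site.xml sketch disabling the CreateSession call, e.g. for troubleshooting IAM permissions as the issue suggests:

```xml
<!-- core-site.xml sketch: skip the S3 Express CreateSession call,
     per this issue. Useful when benchmarking or simplifying role tests. -->
<property>
  <name>fs.s3a.s3express.create.session</name>
  <value>false</value>
</property>
```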