[jira] [Created] (HADOOP-19044) AWS SDK V2 - Update S3A region logic
Ahmar Suhail created HADOOP-19044: - Summary: AWS SDK V2 - Update S3A region logic Key: HADOOP-19044 URL: https://issues.apache.org/jira/browse/HADOOP-19044 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set fs.s3a.endpoint to s3.amazonaws.com here: [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540] HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is set, or if a region can be parsed from fs.s3a.endpoint (which will happen in this case; the region will be US_EAST_1), cross-region access is not enabled. This will cause 400 errors if the bucket is not in US_EAST_1. Proposed: update the logic so that if the endpoint is the global s3.amazonaws.com, cross-region access is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
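The proposed change amounts to a small predicate. A stdlib-only sketch, assuming a hypothetical helper (class and method names are illustrative, not the actual S3A code):

```java
// Sketch of the proposed region logic. The global s3.amazonaws.com endpoint
// (as set by Spark) carries no real region information, so it should trigger
// cross-region access rather than pinning the client to US_EAST_1.
// Hypothetical helper; not the S3A implementation.
class RegionLogicSketch {

    static final String GLOBAL_ENDPOINT = "s3.amazonaws.com";

    /**
     * Decide whether cross-region access should be enabled.
     * @param endpoint value of fs.s3a.endpoint (may be null/empty)
     * @param region   value of fs.s3a.endpoint.region (may be null/empty)
     */
    static boolean enableCrossRegion(String endpoint, String region) {
        if (region != null && !region.isEmpty()) {
            // an explicit region was configured: no cross-region needed
            return false;
        }
        // unset endpoint or the global endpoint: fall back to cross-region
        return endpoint == null || endpoint.isEmpty()
            || endpoint.equals(GLOBAL_ENDPOINT)
            || endpoint.equals("https://" + GLOBAL_ENDPOINT);
    }
}
```

With this check, a region-specific endpoint such as s3.eu-west-1.amazonaws.com would still disable cross-region access, while the Spark-injected global endpoint would not.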
[jira] [Commented] (HADOOP-19007) S3A: transfer manager not wired up to s3a executor pool
[ https://issues.apache.org/jira/browse/HADOOP-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794670#comment-17794670 ] Ahmar Suhail commented on HADOOP-19007: --- So I think what it means to pass in the executor has changed between V1 and V2. With V1, that executor pool would be used to make requests to S3, documented [here|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html#TransferManager-com.amazonaws.services.s3.AmazonS3-java.util.concurrent.ExecutorService-] With V2, if you pass it in, it's only used for certain background tasks before calling the S3AsyncClient, such as visiting the file tree in an S3TransferManager.uploadDirectory(UploadDirectoryRequest) operation; I don't think it's relevant for our use case of copy. Documented [here|https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/transfer/s3/S3TransferManager.Builder.html#executor(java.util.concurrent.Executor)] It'll end up using the executor of the S3AsyncClient. Currently that client creates its own executor pool, but we can also pass in our own if required. That behaviour is documented [here|https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/asynchronous.html] Do you think there is an advantage here in passing the boundedThreadPool to the S3AsyncClient? 
> S3A: transfer manager not wired up to s3a executor pool > --- > > Key: HADOOP-19007 > URL: https://issues.apache.org/jira/browse/HADOOP-19007 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Priority: Major > > S3ClientFactory.createS3TransferManager() doesn't use the executor declared > in S3ClientCreationParameters.transferManagerExecutor > * method needs to take S3ClientCreationParameters > * and set the transfer manager executor -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
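For reference, a bounded executor of the kind the comment above calls boundedThreadPool can be built from the JDK alone. A stdlib-only sketch (sizes are illustrative; the real s3a pool is BlockingThreadPoolExecutorService, which differs in detail):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a bounded thread pool that could be handed to an async S3 client.
// Not the s3a implementation; sizes and policies here are illustrative only.
class BoundedPoolSketch {
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            threads, threads,                       // fixed-size: core == max
            60L, TimeUnit.SECONDS,                  // idle keep-alive
            new ArrayBlockingQueue<>(queueCapacity),
            // when the queue is full the submitting thread runs the task
            // itself: back-pressure instead of rejected work
            new ThreadPoolExecutor.CallerRunsPolicy());
        pool.allowCoreThreadTimeOut(true);          // let idle threads exit
        return pool;
    }
}
```

Whether wiring such a pool into the S3AsyncClient helps is exactly the open question in the comment; the sketch only shows what the pool itself looks like.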
[jira] [Updated] (HADOOP-18888) S3A. createS3AsyncClient() always enables multipart
[ https://issues.apache.org/jira/browse/HADOOP-18888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18888: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A. createS3AsyncClient() always enables multipart > --- > > Key: HADOOP-18888 > URL: https://issues.apache.org/jira/browse/HADOOP-18888 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > DefaultS3ClientFactory.createS3AsyncClient() always creates clients with > multipart enabled; if it is disabled in s3a config it should be disabled here > and in the transfer manager -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18908) Improve s3a region handling, including determining from endpoint
[ https://issues.apache.org/jira/browse/HADOOP-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18908: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > Improve s3a region handling, including determining from endpoint > > > Key: HADOOP-18908 > URL: https://issues.apache.org/jira/browse/HADOOP-18908 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > S3A region logic improved for better inference and > to be compatible with previous releases > 1. If you are using an AWS S3 AccessPoint, its region is determined >from the ARN itself. > 2. If fs.s3a.endpoint.region is set and non-empty, it is used. > 3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, >the region is determined by parsing the URL >Note: vpce endpoints are not handled by this. > 4. If fs.s3a.endpoint.region==null, and none could be determined >from the endpoint, use us-east-2 as default. > 5. If fs.s3a.endpoint.region=="" then it is handed off to >the default AWS SDK resolution process. > Consult the AWS SDK documentation for the details on its resolution > process, knowing that it is complicated and may use environment variables, > entries in ~/.aws/config, IAM instance information within > EC2 deployments and possibly even JSON resources on the classpath. > Put differently: it is somewhat brittle across deployments. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
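The numbered resolution steps above can be sketched as a single function. This is a simplified stand-in for the S3A code, not the real implementation: the AccessPoint branch (step 1) is omitted, null is used to mean "hand off to the SDK's own resolution chain", and the endpoint pattern is a rough approximation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of the region resolution order described above.
// Returns null to mean "defer to the AWS SDK resolution process" (step 5).
class RegionResolutionSketch {

    // rough match for s3.<region>.amazonaws.com style endpoints (step 3);
    // vpce endpoints are deliberately not handled, as in the issue text
    private static final Pattern S3_ENDPOINT =
        Pattern.compile("s3[.-]([a-z0-9-]+)\\.amazonaws\\.com");

    static String resolve(String configuredRegion, String endpoint) {
        // step 2: an explicit, non-empty fs.s3a.endpoint.region wins
        if (configuredRegion != null && !configuredRegion.isEmpty()) {
            return configuredRegion;
        }
        // step 3: try to parse the region out of the endpoint URL
        if (endpoint != null) {
            Matcher m = S3_ENDPOINT.matcher(endpoint);
            if (m.matches()) {
                return m.group(1);
            }
        }
        // step 4: region unset and nothing parseable -> fixed default
        if (configuredRegion == null) {
            return "us-east-2";
        }
        // step 5: region explicitly "" -> SDK resolution chain
        return null;
    }
}
```

Note how "" and null behave differently, matching steps 4 and 5: an unset region falls back to a fixed default, while an explicitly empty one defers to the SDK.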
[jira] [Updated] (HADOOP-18930) S3A: make fs.s3a.create.performance an option you can set for the entire bucket
[ https://issues.apache.org/jira/browse/HADOOP-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18930: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A: make fs.s3a.create.performance an option you can set for the entire > bucket > --- > > Key: HADOOP-18930 > URL: https://issues.apache.org/jira/browse/HADOOP-18930 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.3.9 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > make the fs.s3a.create.performance option something you can set everywhere, > rather than just in an openFile() option or under a magic path. > this improves performance on apps like iceberg where filenames are generated > with UUIDs in them, so we know there are no overwrites -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18915) Tune/extend S3A http connection and thread pool settings
[ https://issues.apache.org/jira/browse/HADOOP-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18915: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > Tune/extend S3A http connection and thread pool settings > > > Key: HADOOP-18915 > URL: https://issues.apache.org/jira/browse/HADOOP-18915 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > Increases existing pool sizes, as with server scale and vector > IO, larger pools are needed > fs.s3a.connection.maximum 200 > fs.s3a.threads.max 96 > Adds new configuration options for v2 sdk internal timeouts, > both with default of 60s: > fs.s3a.connection.acquisition.timeout > fs.s3a.connection.idle.time > All the pool/timeout options are covered in performance.md > Moves all timeout/duration options in the s3a FS to taking > temporal units (h, m, s, ms,...); retaining the previous default > unit (normally millisecond) > Adds a minimum duration for most of these, in order to recover from > deployments where a timeout has been set on the assumption the unit > was seconds, not millis. > Uses java.time.Duration throughout the codebase; > retaining the older numeric constants in > org.apache.hadoop.fs.s3a.Constants for backwards compatibility; > these are now deprecated. > Adds new class AWSApiCallTimeoutException to be raised on > sdk-related methods and also gateway timeouts. This is a subclass > of org.apache.hadoop.net.ConnectTimeoutException to support > existing retry logic. > + reverted default value of fs.s3a.create.performance to false; > inadvertently set to true during testing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
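The move to temporal units means values like 60s or 1m parse to a Duration, with a bare number keeping the old default unit of milliseconds. A stdlib-only sketch of that kind of parsing (a simplified stand-in, not Hadoop's Configuration.getTimeDuration):

```java
import java.time.Duration;

// Sketch of parsing a timeout option with a temporal-unit suffix
// (h, m, s, ms), defaulting to milliseconds when no suffix is given.
// Mirrors the behaviour described above; not the Hadoop implementation.
class DurationParseSketch {
    static Duration parse(String value) {
        String v = value.trim().toLowerCase();
        if (v.endsWith("ms")) {          // check "ms" before the bare "m"/"s"
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        } else if (v.endsWith("h")) {
            return Duration.ofHours(Long.parseLong(v.substring(0, v.length() - 1)));
        } else if (v.endsWith("m")) {
            return Duration.ofMinutes(Long.parseLong(v.substring(0, v.length() - 1)));
        } else if (v.endsWith("s")) {
            return Duration.ofSeconds(Long.parseLong(v.substring(0, v.length() - 1)));
        }
        // bare number: retain the previous default unit (milliseconds)
        return Duration.ofMillis(Long.parseLong(v));
    }
}
```

This also illustrates why the minimum-duration guard matters: a legacy value of 60 now parses as 60 ms, not 60 s, so a floor protects deployments that assumed seconds.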
[jira] [Updated] (HADOOP-18932) Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565
[ https://issues.apache.org/jira/browse/HADOOP-18932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18932: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565 > - > > Key: HADOOP-18932 > URL: https://issues.apache.org/jira/browse/HADOOP-18932 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > Bump up the sdk versions for both...even if we don't ship v1 it helps us > qualify releases with newer versions, and means that an upgrade of that alone > to branch-3.3 will be in sync. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18995) S3A: Upgrade AWS SDK version to 2.21.33 for Amazon S3 Express One Zone support
[ https://issues.apache.org/jira/browse/HADOOP-18995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18995: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A: Upgrade AWS SDK version to 2.21.33 for Amazon S3 Express One Zone support > -- > > Key: HADOOP-18995 > URL: https://issues.apache.org/jira/browse/HADOOP-18995 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > Upgrade SDK version to 2.21.33, which adds S3 Express One Zone support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18939) NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry()
[ https://issues.apache.org/jira/browse/HADOOP-18939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18939: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry() > - > > Key: HADOOP-18939 > URL: https://issues.apache.org/jira/browse/HADOOP-18939 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > NPE in error handling code of RetryOnErrorCodeCondition.shouldRetry(); in > bundle-2.20.128.jar > This is AWS SDK code; fix needs to go there. > {code} > Caused by: java.lang.NullPointerException > at > software.amazon.awssdk.awscore.retry.conditions.RetryOnErrorCodeCondition.shouldRetry(RetryOnErrorCodeCondition.java:45) > ~[bundle-2.20.128.jar:?] > at > software.amazon.awssdk.core.retry.conditions.OrRetryCondition.lambda$shouldRetry$0(OrRetryCondition.java:46) > ~[bundle-2.20.128.jar:?] > at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90) > ~[?:1.8.0_382] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18945) S3A: IAMInstanceCredentialsProvider failing: Failed to load credentials from IMDS
[ https://issues.apache.org/jira/browse/HADOOP-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18945: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A: IAMInstanceCredentialsProvider failing: Failed to load credentials from > IMDS > - > > Key: HADOOP-18945 > URL: https://issues.apache.org/jira/browse/HADOOP-18945 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 7.2.18.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > Failures in impala test VMs using IAM for auth > {code} > Failed to open file as a parquet file: java.net.SocketTimeoutException: > re-open > s3a://impala-test-uswest2-1/test-warehouse/test_pre_gregorian_date_parquet_2e80ae30.db/hive2_pre_gregorian.parquet > at 84 on > s3a://impala-test-uswest2-1/test-warehouse/test_pre_gregorian_date_parquet_2e80ae30.db/hive2_pre_gregorian.parquet: > org.apache.hadoop.fs.s3a.auth.NoAwsCredentialsException: +: Failed to load > credentials from IMDS > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18996) S3A to provide full support for S3 Express One Zone
[ https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18996: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A to provide full support for S3 Express One Zone > --- > > Key: HADOOP-18996 > URL: https://issues.apache.org/jira/browse/HADOOP-18996 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express > One Zone. > Complete support needs to be added to address tests that fail with S3 Express > One Zone, additional tests, documentation etc. > * hadoop-common path capability to indicate that treewalking may encounter > missing dirs > * use this in treewalking code in shell, mapreduce FileInputFormat etc to not > fail during treewalks > * extra path capability for s3express too. > * tests for this > * anything else -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18946) S3A: testMultiObjectExceptionFilledIn() assertion error
[ https://issues.apache.org/jira/browse/HADOOP-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18946: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A: testMultiObjectExceptionFilledIn() assertion error > --- > > Key: HADOOP-18946 > URL: https://issues.apache.org/jira/browse/HADOOP-18946 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > Failure in the new test of HADOOP-18939. > I've been fiddling with the sdk upgrade, and only merged HADOOP-18932 after > submitting the new pr, so maybe, just maybe, the SDK changed some defaults. > anyway, > {code} > [ERROR] > testMultiObjectExceptionFilledIn(org.apache.hadoop.fs.s3a.impl.TestErrorTranslation) > Time elapsed: 0.026 s <<< FAILURE! > java.lang.AssertionError: retry policy of MultiObjectException > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > {code} > easily fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18948) S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete
[ https://issues.apache.org/jira/browse/HADOOP-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18948: -- Fix Version/s: 3.3.7-aws (was: 3.3.6-aws) > S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on > rename/delete > --- > > Key: HADOOP-18948 > URL: https://issues.apache.org/jira/browse/HADOOP-18948 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.7-aws > > > On third-party stores without lifecycle rules it's possible to accrue many GB > of pending multipart uploads, including from > * magic committer jobs where spark driver/MR AM failed before commit/abort > * distcp jobs which timeout and get aborted > * any client code writing datasets which are interrupted before close. > Although there's a purge pending uploads option, that's dangerous because if > any fs is instantiated with it, it can destroy in-flight work > otherwise, the "hadoop s3guard uploads" command does work but needs > scheduling/manual execution > proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}} > which will automatically cancel all pending uploads under a path > * delete: everything under the dir > * rename: all under the source dir > This will be done in parallel to the normal operation, but no attempt to post > abortMultipartUploads in different threads. The assumption here is that this > is rare. And it'll be off by default as in AWS people should have rules for > these things. > + doc (third_party?) > + add new counter/metric for abort operations, count and duration > + test to include cost assertions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19003) S3A Assume role tests failing against S3Express stores
[ https://issues.apache.org/jira/browse/HADOOP-19003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793861#comment-17793861 ] Ahmar Suhail commented on HADOOP-19003: --- Checked: even if we disable createSession, any roles still need to use the S3Express namespace and CreateSession action. I can work on this once I'm back from holiday; need to see if we should create new roles or skip failing tests, as you can only restrict on a bucket level and not by prefix. > S3A Assume role tests failing against S3Express stores > -- > > Key: HADOOP-19003 > URL: https://issues.apache.org/jira/browse/HADOOP-19003 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Priority: Minor > > The test suites which assume roles with restricted permissions down paths > still fail on S3Express, even after disabling createSession. > This is with a role which *should* work. > Either the role setup is wrong, or there's something special about role > configuration for S3Express buckets -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18996) S3A to provide full support for S3 Express One Zone
[ https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18996: -- Fix Version/s: 3.3.6-aws > S3A to provide full support for S3 Express One Zone > --- > > Key: HADOOP-18996 > URL: https://issues.apache.org/jira/browse/HADOOP-18996 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express > One Zone. > Complete support needs to be added to address tests that fail with S3 Express > One Zone, additional tests, documentation etc. > * hadoop-common path capability to indicate that treewalking may encounter > missing dirs > * use this in treewalking code in shell, mapreduce FileInputFormat etc to not > fail during treewalks > * extra path capability for s3express too. > * tests for this > * anything else -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18995) S3A: Upgrade AWS SDK version to 2.21.33 for Amazon S3 Express One Zone support
[ https://issues.apache.org/jira/browse/HADOOP-18995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18995: -- Fix Version/s: 3.3.6-aws > S3A: Upgrade AWS SDK version to 2.21.33 for Amazon S3 Express One Zone support > -- > > Key: HADOOP-18995 > URL: https://issues.apache.org/jira/browse/HADOOP-18995 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > Upgrade SDK version to 2.21.33, which adds S3 Express One Zone support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18915) Tune/extend S3A http connection and thread pool settings
[ https://issues.apache.org/jira/browse/HADOOP-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18915: -- Fix Version/s: 3.3.6-aws > Tune/extend S3A http connection and thread pool settings > > > Key: HADOOP-18915 > URL: https://issues.apache.org/jira/browse/HADOOP-18915 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > Increases existing pool sizes, as with server scale and vector > IO, larger pools are needed > fs.s3a.connection.maximum 200 > fs.s3a.threads.max 96 > Adds new configuration options for v2 sdk internal timeouts, > both with default of 60s: > fs.s3a.connection.acquisition.timeout > fs.s3a.connection.idle.time > All the pool/timeout options are covered in performance.md > Moves all timeout/duration options in the s3a FS to taking > temporal units (h, m, s, ms,...); retaining the previous default > unit (normally millisecond) > Adds a minimum duration for most of these, in order to recover from > deployments where a timeout has been set on the assumption the unit > was seconds, not millis. > Uses java.time.Duration throughout the codebase; > retaining the older numeric constants in > org.apache.hadoop.fs.s3a.Constants for backwards compatibility; > these are now deprecated. > Adds new class AWSApiCallTimeoutException to be raised on > sdk-related methods and also gateway timeouts. This is a subclass > of org.apache.hadoop.net.ConnectTimeoutException to support > existing retry logic. > + reverted default value of fs.s3a.create.performance to false; > inadvertently set to true during testing. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18930) S3A: make fs.s3a.create.performance an option you can set for the entire bucket
[ https://issues.apache.org/jira/browse/HADOOP-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18930: -- Fix Version/s: 3.3.6-aws > S3A: make fs.s3a.create.performance an option you can set for the entire > bucket > --- > > Key: HADOOP-18930 > URL: https://issues.apache.org/jira/browse/HADOOP-18930 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.3.9 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > make the fs.s3a.create.performance option something you can set everywhere, > rather than just in an openFile() option or under a magic path. > this improves performance on apps like iceberg where filenames are generated > with UUIDs in them, so we know there are no overwrites -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18945) S3A: IAMInstanceCredentialsProvider failing: Failed to load credentials from IMDS
[ https://issues.apache.org/jira/browse/HADOOP-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18945: -- Fix Version/s: 3.3.6-aws > S3A: IAMInstanceCredentialsProvider failing: Failed to load credentials from > IMDS > - > > Key: HADOOP-18945 > URL: https://issues.apache.org/jira/browse/HADOOP-18945 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 7.2.18.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Blocker > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > Failures in impala test VMs using IAM for auth > {code} > Failed to open file as a parquet file: java.net.SocketTimeoutException: > re-open > s3a://impala-test-uswest2-1/test-warehouse/test_pre_gregorian_date_parquet_2e80ae30.db/hive2_pre_gregorian.parquet > at 84 on > s3a://impala-test-uswest2-1/test-warehouse/test_pre_gregorian_date_parquet_2e80ae30.db/hive2_pre_gregorian.parquet: > org.apache.hadoop.fs.s3a.auth.NoAwsCredentialsException: +: Failed to load > credentials from IMDS > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18946) S3A: testMultiObjectExceptionFilledIn() assertion error
[ https://issues.apache.org/jira/browse/HADOOP-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18946: -- Fix Version/s: 3.3.6-aws > S3A: testMultiObjectExceptionFilledIn() assertion error > --- > > Key: HADOOP-18946 > URL: https://issues.apache.org/jira/browse/HADOOP-18946 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > Failure in the new test of HADOOP-18939. > I've been fiddling with the sdk upgrade, and only merged HADOOP-18932 after > submitting the new pr, so maybe, just maybe, the SDK changed some defaults. > anyway, > {code} > [ERROR] > testMultiObjectExceptionFilledIn(org.apache.hadoop.fs.s3a.impl.TestErrorTranslation) > Time elapsed: 0.026 s <<< FAILURE! > java.lang.AssertionError: retry policy of MultiObjectException > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at > {code} > easily fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18948) S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete
[ https://issues.apache.org/jira/browse/HADOOP-18948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18948: -- Fix Version/s: 3.3.6-aws > S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on > rename/delete > --- > > Key: HADOOP-18948 > URL: https://issues.apache.org/jira/browse/HADOOP-18948 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > On third-party stores without lifecycle rules it's possible to accrue many GB > of pending multipart uploads, including from > * magic committer jobs where spark driver/MR AM failed before commit/abort > * distcp jobs which timeout and get aborted > * any client code writing datasets which are interrupted before close. > Although there's a purge pending uploads option, that's dangerous because if > any fs is instantiated with it, it can destroy in-flight work > otherwise, the "hadoop s3guard uploads" command does work but needs > scheduling/manual execution > proposed: add a new property {{fs.s3a.directory.operations.purge.uploads}} > which will automatically cancel all pending uploads under a path > * delete: everything under the dir > * rename: all under the source dir > This will be done in parallel to the normal operation, but no attempt to post > abortMultipartUploads in different threads. The assumption here is that this > is rare. And it'll be off by default as in AWS people should have rules for > these things. > + doc (third_party?) > + add new counter/metric for abort operations, count and duration > + test to include cost assertions -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18908) Improve s3a region handling, including determining from endpoint
[ https://issues.apache.org/jira/browse/HADOOP-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18908: -- Fix Version/s: 3.3.6-aws > Improve s3a region handling, including determining from endpoint > > > Key: HADOOP-18908 > URL: https://issues.apache.org/jira/browse/HADOOP-18908 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > S3A region logic improved for better inference and > to be compatible with previous releases > 1. If you are using an AWS S3 AccessPoint, its region is determined >from the ARN itself. > 2. If fs.s3a.endpoint.region is set and non-empty, it is used. > 3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, >the region is determined by parsing the URL >Note: vpce endpoints are not handled by this. > 4. If fs.s3a.endpoint.region==null, and none could be determined >from the endpoint, use us-east-2 as default. > 5. If fs.s3a.endpoint.region=="" then it is handed off to >the default AWS SDK resolution process. > Consult the AWS SDK documentation for the details on its resolution > process, knowing that it is complicated and may use environment variables, > entries in ~/.aws/config, IAM instance information within > EC2 deployments and possibly even JSON resources on the classpath. > Put differently: it is somewhat brittle across deployments. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18939) NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry()
[ https://issues.apache.org/jira/browse/HADOOP-18939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18939: -- Fix Version/s: 3.3.6-aws > NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry() > - > > Key: HADOOP-18939 > URL: https://issues.apache.org/jira/browse/HADOOP-18939 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > NPE in error handling code of RetryOnErrorCodeCondition.shouldRetry(); in > bundle-2.20.128.jar > This is AWS SDK code; fix needs to go there. > {code} > Caused by: java.lang.NullPointerException > at > software.amazon.awssdk.awscore.retry.conditions.RetryOnErrorCodeCondition.shouldRetry(RetryOnErrorCodeCondition.java:45) > ~[bundle-2.20.128.jar:?] > at > software.amazon.awssdk.core.retry.conditions.OrRetryCondition.lambda$shouldRetry$0(OrRetryCondition.java:46) > ~[bundle-2.20.128.jar:?] > at java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90) > ~[?:1.8.0_382] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18932) Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565
[ https://issues.apache.org/jira/browse/HADOOP-18932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18932: -- Fix Version/s: 3.3.6-aws > Upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565 > - > > Key: HADOOP-18932 > URL: https://issues.apache.org/jira/browse/HADOOP-18932 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > Bump up the sdk versions for both...even if we don't ship v1 it helps us > qualify releases with newer versions, and means that an upgrade of that alone > to branch-3.3 will be in sync. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18888) S3A. createS3AsyncClient() always enables multipart
[ https://issues.apache.org/jira/browse/HADOOP-18888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18888: -- Fix Version/s: 3.3.6-aws > S3A. createS3AsyncClient() always enables multipart > --- > > Key: HADOOP-18888 > URL: https://issues.apache.org/jira/browse/HADOOP-18888 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6-aws > > > DefaultS3ClientFactory.createS3AsyncClient() always creates clients with > multipart enabled; if it is disabled in s3a config it should be disabled here > and in the transfer manager -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-18995) Add support for Amazon S3 Express One Zone Storage - SDK version upgrade
[ https://issues.apache.org/jira/browse/HADOOP-18995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail reassigned HADOOP-18995: - Assignee: Ahmar Suhail > Add support for Amazon S3 Express One Zone Storage - SDK version upgrade > > > Key: HADOOP-18995 > URL: https://issues.apache.org/jira/browse/HADOOP-18995 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > Upgrade SDK version to 2.21.33, which adds S3 Express One Zone support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18996) Add necessary software support for S3 Express One Zone
[ https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18996: -- Component/s: fs/s3 > Add necessary software support for S3 Express One Zone > -- > > Key: HADOOP-18996 > URL: https://issues.apache.org/jira/browse/HADOOP-18996 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express > One Zone storage. > Complete support needs to be added to address tests that fail with S3 Express > One Zone, plus additional tests, documentation etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18996) Add necessary software support for S3 Express One Zone
[ https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18996: -- Affects Version/s: 3.4.0 > Add necessary software support for S3 Express One Zone > -- > > Key: HADOOP-18996 > URL: https://issues.apache.org/jira/browse/HADOOP-18996 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express > One Zone storage. > Complete support needs to be added to address tests that fail with S3 Express > One Zone, plus additional tests, documentation etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18996) Add necessary software support for S3 Express One Zone
Ahmar Suhail created HADOOP-18996: - Summary: Add necessary software support for S3 Express One Zone Key: HADOOP-18996 URL: https://issues.apache.org/jira/browse/HADOOP-18996 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express One Zone storage. Complete support needs to be added to address tests that fail with S3 Express One Zone, plus additional tests, documentation etc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18995) Add support for Amazon S3 Express One Zone Storage - SDK version upgrade
[ https://issues.apache.org/jira/browse/HADOOP-18995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18995: -- Summary: Add support for Amazon S3 Express One Zone Storage - SDK version upgrade (was: Add support for Amazon S3 Express One Zone Storage) > Add support for Amazon S3 Express One Zone Storage - SDK version upgrade > > > Key: HADOOP-18995 > URL: https://issues.apache.org/jira/browse/HADOOP-18995 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > Upgrade SDK version to 2.21.33, which adds S3 Express One Zone support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18995) Add support for Amazon S3 Express One Zone Storage
[ https://issues.apache.org/jira/browse/HADOOP-18995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18995: -- Description: Upgrade SDK version to 2.21.33, which adds S3 Express One Zone support. > Add support for Amazon S3 Express One Zone Storage > -- > > Key: HADOOP-18995 > URL: https://issues.apache.org/jira/browse/HADOOP-18995 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > Upgrade SDK version to 2.21.33, which adds S3 Express One Zone support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18995) Add support for Amazon S3 Express One Zone Storage
Ahmar Suhail created HADOOP-18995: - Summary: Add support for Amazon S3 Express One Zone Storage Key: HADOOP-18995 URL: https://issues.apache.org/jira/browse/HADOOP-18995 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18938) Handle non standard endpoints
[ https://issues.apache.org/jira/browse/HADOOP-18938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18938: -- Parent Issue: HADOOP-18886 (was: HADOOP-18073) > Handle non standard endpoints > -- > > Key: HADOOP-18938 > URL: https://issues.apache.org/jira/browse/HADOOP-18938 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > For non-standard endpoints such as VPCE the region parsing added in > HADOOP-18908 doesn't work. This is expected as that logic is only meant to be > used for standard endpoints. > If you are using a non-standard endpoint, check if a region is also provided, > else fail fast. > Also update documentation to explain the region and endpoint behaviour with > SDK V2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18938) Handle non standard endpoints
Ahmar Suhail created HADOOP-18938: - Summary: Handle non standard endpoints Key: HADOOP-18938 URL: https://issues.apache.org/jira/browse/HADOOP-18938 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail For non-standard endpoints such as VPCE the region parsing added in HADOOP-18908 doesn't work. This is expected as that logic is only meant to be used for standard endpoints. If you are using a non-standard endpoint, check if a region is also provided, else fail fast. Also update documentation to explain the region and endpoint behaviour with SDK V2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18915) HTTP timeouts are not set correctly
[ https://issues.apache.org/jira/browse/HADOOP-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18915: -- Parent Issue: HADOOP-18886 (was: HADOOP-18073) > HTTP timeouts are not set correctly > --- > > Key: HADOOP-18915 > URL: https://issues.apache.org/jira/browse/HADOOP-18915 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > In the client config builders, when [setting > timeouts|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java#L120], > it uses Duration.ofSeconds(), configs all use milliseconds so this needs to > be updated to Duration.ofMillis(). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18915) HTTP timeouts are not set correctly
Ahmar Suhail created HADOOP-18915: - Summary: HTTP timeouts are not set correctly Key: HADOOP-18915 URL: https://issues.apache.org/jira/browse/HADOOP-18915 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail In the client config builders, when [setting timeouts|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSClientConfig.java#L120], it uses Duration.ofSeconds(), configs all use milliseconds so this needs to be updated to Duration.ofMillis(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
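The unit mismatch is easy to demonstrate with java.time alone: feeding a millisecond-valued config into Duration.ofSeconds() inflates every timeout by a factor of 1000. A minimal sketch (the variable name and 5000 ms value are illustrative):

```java
import java.time.Duration;

// The S3A configs carry millisecond values; wrapping them in
// Duration.ofSeconds() inflates every timeout by a factor of 1000.
public class TimeoutUnitsDemo {
    public static void main(String[] args) {
        long configured = 5_000; // e.g. a 5 s connection timeout expressed in ms

        Duration wrong = Duration.ofSeconds(configured); // 5000 seconds (~83 min)
        Duration right = Duration.ofMillis(configured);  // 5 seconds

        System.out.println(wrong.toMillis()); // 5000000
        System.out.println(right.toMillis()); // 5000
    }
}
```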
[jira] [Commented] (HADOOP-18889) S3A: V2 SDK client does not work with third-party store
[ https://issues.apache.org/jira/browse/HADOOP-18889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767567#comment-17767567 ] Ahmar Suhail commented on HADOOP-18889: --- [https://github.com/apache/hadoop/pull/6106] (still WIP), removes the region check. > S3A: V2 SDK client does not work with third-party store > --- > > Key: HADOOP-18889 > URL: https://issues.apache.org/jira/browse/HADOOP-18889 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Critical > > testing against an external store without specifying a region now blows up > because the region is queried off eu-west-1. > What are we to do here? Require the region setting, which wasn't needed > before? And what region do we even provide for third-party stores? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-18889) S3A: V2 SDK client does not work with third-party store
[ https://issues.apache.org/jira/browse/HADOOP-18889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail reassigned HADOOP-18889: - Assignee: Ahmar Suhail > S3A: V2 SDK client does not work with third-party store > --- > > Key: HADOOP-18889 > URL: https://issues.apache.org/jira/browse/HADOOP-18889 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Ahmar Suhail >Priority: Critical > > testing against an external store without specifying a region now blows up > because the region is queried off eu-west-1. > What are we to do here? Require the region setting, which wasn't needed > before? And what region do we even provide for third-party stores? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18889) S3A: V2 SDK client does not work with third-party store
[ https://issues.apache.org/jira/browse/HADOOP-18889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764664#comment-17764664 ] Ahmar Suhail commented on HADOOP-18889: --- we can probably get rid of the region query now, as SDK V2 has had cross-region support since v2.20.99; it wasn't there when we started this work. > S3A: V2 SDK client does not work with third-party store > --- > > Key: HADOOP-18889 > URL: https://issues.apache.org/jira/browse/HADOOP-18889 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Priority: Critical > > testing against an external store without specifying a region now blows up > because the region is queried off eu-west-1. > What are we to do here? Require the region setting, which wasn't needed > before? And what region do we even provide for third-party stores? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-18877) AWS SDK V2 - Move to S3 Java async client
[ https://issues.apache.org/jira/browse/HADOOP-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail reassigned HADOOP-18877: - Assignee: Ahmar Suhail > AWS SDK V2 - Move to S3 Java async client > - > > Key: HADOOP-18877 > URL: https://issues.apache.org/jira/browse/HADOOP-18877 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Assignee: Ahmar Suhail >Priority: Major > > With the upgrade, S3A now has two S3 clients the Java async client and the > Java sync client. > Java async is required for the transfer manager. > Java sync is used for everything else. > > * Move all operations to use the Java async client and remove the sync > client. > * Provide option to configure java async client with the CRT HTTP client. > * Create a new interface for S3Client operations, move them out of S3AFS. > interface will take request and span, and return response. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18877) AWS SDK V2 - Move to S3 Java async client
Ahmar Suhail created HADOOP-18877: - Summary: AWS SDK V2 - Move to S3 Java async client Key: HADOOP-18877 URL: https://issues.apache.org/jira/browse/HADOOP-18877 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail With the upgrade, S3A now has two S3 clients the Java async client and the Java sync client. Java async is required for the transfer manager. Java sync is used for everything else. * Move all operations to use the Java async client and remove the sync client. * Provide option to configure java async client with the CRT HTTP client. * Create a new interface for S3Client operations, move them out of S3AFS. interface will take request and span, and return response. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18853) AWS SDK V2 - Upgrade SDK to 2.20.28 and restores multipart copy
[ https://issues.apache.org/jira/browse/HADOOP-18853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18853: -- Summary: AWS SDK V2 - Upgrade SDK to 2.20.28 and restores multipart copy (was: AWS SDK V2 - Integrate new transfer manager) > AWS SDK V2 - Upgrade SDK to 2.20.28 and restores multipart copy > --- > > Key: HADOOP-18853 > URL: https://issues.apache.org/jira/browse/HADOOP-18853 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > With 2.20.121, the TM has MPU functionality. Upgrading to to this version > will also solve the issue with needing to include the CRT dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18853) AWS SDK V2 - Upgrade SDK to 2.20.28 and restores multipart copy
[ https://issues.apache.org/jira/browse/HADOOP-18853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18853: -- Description: With 2.20.121, the TM has MPU functionality. Upgrading to the latest version (2.20.28) will also solve the issue with needing to include the CRT dependency. (was: With 2.20.121, the TM has MPU functionality. Upgrading to to this version will also solve the issue with needing to include the CRT dependency. ) > AWS SDK V2 - Upgrade SDK to 2.20.28 and restores multipart copy > --- > > Key: HADOOP-18853 > URL: https://issues.apache.org/jira/browse/HADOOP-18853 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > With 2.20.121, the TM has MPU functionality. Upgrading to the latest version > (2.20.28) will also solve the issue with needing to include the CRT > dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18853) AWS SDK V2 - Integrate new transfer manager
Ahmar Suhail created HADOOP-18853: - Summary: AWS SDK V2 - Integrate new transfer manager Key: HADOOP-18853 URL: https://issues.apache.org/jira/browse/HADOOP-18853 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail With 2.20.121, the TM has MPU functionality. Upgrading to this version will also solve the issue with needing to include the CRT dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18747) AWS SDK V2 - sigv2 support
[ https://issues.apache.org/jira/browse/HADOOP-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747555#comment-17747555 ] Ahmar Suhail commented on HADOOP-18747: --- * Yeah, the NPE isn't ideal, we could update with something like throw new IllegalArgumentException("unknown signer type, ensure it's included using fs.s3a.custom.signers" ); * The CRT doesn't currently support custom signers, which is why we don't use it yet. We may want to add it in the future (but without custom signer support, it will have to be an optional client and not the default) * The async client supports custom signers, and they are configured in the code, same as the sync client. in AwsClientConfig.createClientConfigBuilder > AWS SDK V2 - sigv2 support > -- > > Key: HADOOP-18747 > URL: https://issues.apache.org/jira/browse/HADOOP-18747 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > AWS SDK V2 does not support sigV2 signing. However, the S3 client supports > configurable signers so a custom sigV2 signer can be implemented and > configured. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
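For reference, custom signers are wired in purely through configuration; a hedged sketch follows. The signer class name (org.example.SignerV2Impl) and signer name are placeholders, and the exact value syntax for fs.s3a.custom.signers should be checked against the S3A signing documentation:

```xml
<!-- Sketch: register a custom signer and select it for S3 requests.
     org.example.SignerV2Impl is a placeholder class name. -->
<property>
  <name>fs.s3a.custom.signers</name>
  <value>CustomSigV2:org.example.SignerV2Impl</value>
</property>
<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>CustomSigV2</value>
</property>
```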
[jira] [Assigned] (HADOOP-18778) Test failures with CSE enabled
[ https://issues.apache.org/jira/browse/HADOOP-18778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail reassigned HADOOP-18778: - Assignee: Ahmar Suhail > Test failures with CSE enabled > -- > > Key: HADOOP-18778 > URL: https://issues.apache.org/jira/browse/HADOOP-18778 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Ahmar Suhail >Assignee: Ahmar Suhail >Priority: Major > Fix For: 3.3.9 > > > The following tests fail when the hadoop-aws suite is run with CSE enabled: > > {{ITestS3APrefetchingInputStream.testRandomReadLargeFile}} > {{ITestS3APrefetchingInputStream.testReadLargeFileFully}} > {{ITestS3APrefetchingInputStream.testReadLargeFileFullyLazySeek}} > {{ITestS3ARequesterPays.testRequesterPaysOptionSuccess}} > {{ITestAssumeRole.testReadOnlyOperations}} > {{ITestPartialRenamesDeletes.testRenameParentPathNotWriteable}} > {{ITestS3GuardTool.testLandsatBucketRequireUnencrypted}} > > Most of these are because they're using landsat data which is not encrypted, > so trying to read it with CSE will fail. These tests should be skipped if > using CSE. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18778) Test failures with CSE enabled
Ahmar Suhail created HADOOP-18778: - Summary: Test failures with CSE enabled Key: HADOOP-18778 URL: https://issues.apache.org/jira/browse/HADOOP-18778 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Ahmar Suhail Fix For: 3.3.9 The following tests fail when the hadoop-aws suite is run with CSE enabled: {{ITestS3APrefetchingInputStream.testRandomReadLargeFile}} {{ITestS3APrefetchingInputStream.testReadLargeFileFully}} {{ITestS3APrefetchingInputStream.testReadLargeFileFullyLazySeek}} {{ITestS3ARequesterPays.testRequesterPaysOptionSuccess}} {{ITestAssumeRole.testReadOnlyOperations}} {{ITestPartialRenamesDeletes.testRenameParentPathNotWriteable}} {{ITestS3GuardTool.testLandsatBucketRequireUnencrypted}} Most of these are because they're using landsat data which is not encrypted, so trying to read it with CSE will fail. These tests should be skipped if using CSE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18673) AWS SDK V2 - Refactor getS3Region & other follow up items
[ https://issues.apache.org/jira/browse/HADOOP-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18673: -- Description: * Factor getS3Region into its own ExecutingStoreOperation; * Remove InconsistentS3ClientFactory. * Fix issue with getXAttr(/) * Look at adding flexible checksum support was: * Factor getS3Region into its own ExecutingStoreOperation; * Remove InconsistentS3ClientFactory. * Fix issue with getXAttr(/) > AWS SDK V2 - Refactor getS3Region & other follow up items > -- > > Key: HADOOP-18673 > URL: https://issues.apache.org/jira/browse/HADOOP-18673 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > * Factor getS3Region into its own ExecutingStoreOperation; > * Remove InconsistentS3ClientFactory. > * Fix issue with getXAttr(/) > * Look at adding flexible checksum support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18747) AWS SDK V2 - sigv2 support
[ https://issues.apache.org/jira/browse/HADOOP-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727402#comment-17727402 ] Ahmar Suhail commented on HADOOP-18747: --- Hey [~aajisaka] , yes that's true. Maybe it's not so important currently, but as newer features are added in the future, not having sigV2 will block users who require it from upgrading. So I think at some point it will need to be added in.. > AWS SDK V2 - sigv2 support > -- > > Key: HADOOP-18747 > URL: https://issues.apache.org/jira/browse/HADOOP-18747 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > AWS SDK V2 does not support sigV2 signing. However, the S3 client supports > configurable signers so a custom sigV2 signer can be implemented and > configured. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18747) AWS SDK V2 - sigv2 support
[ https://issues.apache.org/jira/browse/HADOOP-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725448#comment-17725448 ] Ahmar Suhail commented on HADOOP-18747: --- will have to find a bucket created before June 24, 2020 in a region that supports sigV2 (we do have one we could try) or test with a third party store > AWS SDK V2 - sigv2 support > -- > > Key: HADOOP-18747 > URL: https://issues.apache.org/jira/browse/HADOOP-18747 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > AWS SDK V2 does not support sigV2 signing. However, the S3 client supports > configurable signers so a custom sigV2 signer can be implemented and > configured. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18749) AWS SDK V2 - ITestS3AHugeFilesNoMultipart failure
Ahmar Suhail created HADOOP-18749: - Summary: AWS SDK V2 - ITestS3AHugeFilesNoMultipart failure Key: HADOOP-18749 URL: https://issues.apache.org/jira/browse/HADOOP-18749 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail ITestS3AHugeFilesNoMultipart fails with java.lang.AssertionError: Expected a org.apache.hadoop.fs.s3a.api.UnsupportedRequestException to be thrown, but got the result: : true Happens because the transfer manager currently does not do any MPU when used with the Java async client, so the UnsupportedRequestException never gets thrown. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18747) AWS SDK V2 - sigv2 support
Ahmar Suhail created HADOOP-18747: - Summary: AWS SDK V2 - sigv2 support Key: HADOOP-18747 URL: https://issues.apache.org/jira/browse/HADOOP-18747 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail AWS SDK V2 does not support sigV2 signing. However, the S3 client supports configurable signers so a custom sigV2 signer can be implemented and configured. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18744) ITestS3ABlockOutputArray failure with IO File name too long
Ahmar Suhail created HADOOP-18744: - Summary: ITestS3ABlockOutputArray failure with IO File name too long Key: HADOOP-18744 URL: https://issues.apache.org/jira/browse/HADOOP-18744 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Ahmar Suhail On an EC2 instance, the following tests are failing: {{ITestS3ABlockOutputArray.testDiskBlockCreate}}, {{ITestS3ABlockOutputByteBuffer>ITestS3ABlockOutputArray.testDiskBlockCreate}}, {{ITestS3ABlockOutputDisk>ITestS3ABlockOutputArray.testDiskBlockCreate}} with the error {{IOException: File name too long}}. The tests create a file with a 1024-char file name and rely on File.createTempFile() to truncate the file name to < OS limit. Stack trace: {{java.io.IOException: File name too long}} {{ at java.io.UnixFileSystem.createFileExclusively(Native Method)}} {{ at java.io.File.createTempFile(File.java:2063)}} {{ at org.apache.hadoop.fs.s3a.S3AFileSystem.createTmpFileForWrite(S3AFileSystem.java:1377)}} {{ at org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory.create(S3ADataBlocks.java:829)}} {{ at org.apache.hadoop.fs.s3a.ITestS3ABlockOutputArray.testDiskBlockCreate(ITestS3ABlockOutputArray.java:114)}} {{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}} {{ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)}} {{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
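A sketch of the truncation the tests assume: File.createTempFile() does not shorten an over-long prefix itself, so a hypothetical helper has to do it before the call. Names and the 128-char limit below are illustrative, not the S3A code:

```java
import java.io.File;
import java.io.IOException;

// File.createTempFile() does not shorten an over-long prefix, so a 1024-char
// prefix fails on filesystems with NAME_MAX = 255 (e.g. ext4 on EC2).
// truncatePrefix() is a hypothetical workaround, not the S3A implementation.
public class TempFileNameSketch {

    static String truncatePrefix(String prefix, int max) {
        return prefix.length() <= max ? prefix : prefix.substring(0, max);
    }

    public static void main(String[] args) throws IOException {
        String longPrefix = "x".repeat(1024);
        // Truncated well below NAME_MAX, leaving room for the random suffix.
        File f = File.createTempFile(truncatePrefix(longPrefix, 128), ".tmp");
        f.deleteOnExit();
        System.out.println(f.getName().length() <= 255); // true
    }
}
```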
[jira] [Comment Edited] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723516#comment-17723516 ] Ahmar Suhail edited comment on HADOOP-18073 at 5/17/23 3:40 PM: new rebased branch is [feature-HADOOP-18073-s3a-sdk-upgrade-rebase|https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase] ITestS3ABlockOutputArray.testDiskBlockCreate fails (also failing on trunk) on an EC2 instance, works ok on Mac. Looks like file names aren't being truncated on EC2 was (Author: JIRAUSER283484): new rebased branch is [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase|https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase.] ITestS3ABlockOutputArray.testDiskBlockCreate fails (also failing on trunk) on an EC2 instance, works ok on Mac. Looks like file names aren't being truncated on EC2 > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. > Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. 
> {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723516#comment-17723516 ] Ahmar Suhail edited comment on HADOOP-18073 at 5/17/23 3:39 PM: new rebased branch is [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase|https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase.] ITestS3ABlockOutputArray.testDiskBlockCreate fails (also failing on trunk) on an EC2 instance, works ok on Mac. Looks like file names aren't being truncated on EC2 was (Author: JIRAUSER283484): new rebased branch is [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase |https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase.] ITestS3ABlockOutputArray.testDiskBlockCreate fails (also failing on trunk) on an EC2 instance, works ok on Mac. Looks like file names aren't being truncated on EC2 > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. > Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. 
> {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723516#comment-17723516 ] Ahmar Suhail commented on HADOOP-18073: --- new rebased branch is [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase |https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade-rebase.] ITestS3ABlockOutputArray.testDiskBlockCreate fails (also failing on trunk) on an EC2 instance, works ok on Mac. Looks like file names aren't being truncated on EC2 > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. > Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18572) AWS SDK V2 - Fix failing tests
[ https://issues.apache.org/jira/browse/HADOOP-18572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail resolved HADOOP-18572. --- Resolution: Fixed resolved in https://issues.apache.org/jira/browse/HADOOP-18565 > AWS SDK V2 - Fix failing tests > -- > > Key: HADOOP-18572 > URL: https://issues.apache.org/jira/browse/HADOOP-18572 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > We have a few failing tests for various reasons. Some are dependent on the > TM, but others can be looked into and fixed. > |TestS3AExceptionTranslation|test301ContainsEndpoint|Missing endpoint in SDK > exception > ([aws/aws-sdk-java-v2#3048|https://github.com/aws/aws-sdk-java-v2/issues/3048])| > |TestStreamChangeTracker|testCopyETagRequired, > testCopyVersionIdRequired|Transfer Manager response does not yet have > {{CopyObjectResult}}| > |ITestS3AFileContextStatistics|testStatistics|ProgressListeners not attached > to non-TM uploads| > |ITestS3AEncryptionSSEC|multiple tests (14 out of 24)|Transfer Manager issue > with SSE-C| > |ITestXAttrCost|testXAttrRoot.|{{headObject()}} with empty key fails| > |ITestSessionDelegationInFileystem|testDelegatedFileSystem|Succeeds, but > {{headObject()}} with empty key commented out| > |ITestS3ACannedACLs|testCreatedObjectsHaveACLs|AWSCannedACL.LogDeliveryWrite > not supported in SDK v2| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18570) AWS SDK V2 - Update region logic
[ https://issues.apache.org/jira/browse/HADOOP-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail resolved HADOOP-18570. --- Resolution: Fixed Resolved in https://issues.apache.org/jira/browse/HADOOP-18565 > AWS SDK V2 - Update region logic > > > Key: HADOOP-18570 > URL: https://issues.apache.org/jira/browse/HADOOP-18570 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > SDK V2 will no longer resolve a buckets region if it is not set when > initialising the client. > > Current logic will always make a head bucket call on FS initialisation. We > should review this. Possible solution: > * Warn if region is not set. > * If no region, try and resolve. If resolution fails, throw an exception. > Cache the region to optimise for short lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] (HADOOP-18570) AWS SDK V2 - Update region logic
[ https://issues.apache.org/jira/browse/HADOOP-18570 ] Ahmar Suhail deleted comment on HADOOP-18570: --- was (Author: JIRAUSER283484): marking as resolved as this was done as part of https://issues.apache.org/jira/browse/HADOOP-18565 > AWS SDK V2 - Update region logic > > > Key: HADOOP-18570 > URL: https://issues.apache.org/jira/browse/HADOOP-18570 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > SDK V2 will no longer resolve a buckets region if it is not set when > initialising the client. > > Current logic will always make a head bucket call on FS initialisation. We > should review this. Possible solution: > * Warn if region is not set. > * If no region, try and resolve. If resolution fails, throw an exception. > Cache the region to optimise for short lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18570) AWS SDK V2 - Update region logic
[ https://issues.apache.org/jira/browse/HADOOP-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723510#comment-17723510 ] Ahmar Suhail commented on HADOOP-18570: --- marking as resolved as this was done as part of https://issues.apache.org/jira/browse/HADOOP-18565 > AWS SDK V2 - Update region logic > > > Key: HADOOP-18570 > URL: https://issues.apache.org/jira/browse/HADOOP-18570 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > SDK V2 will no longer resolve a buckets region if it is not set when > initialising the client. > > Current logic will always make a head bucket call on FS initialisation. We > should review this. Possible solution: > * Warn if region is not set. > * If no region, try and resolve. If resolution fails, throw an exception. > Cache the region to optimise for short lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18708) AWS SDK V2 - Implement CSE
Ahmar Suhail created HADOOP-18708: - Summary: AWS SDK V2 - Implement CSE Key: HADOOP-18708 URL: https://issues.apache.org/jira/browse/HADOOP-18708 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Ahmar Suhail S3 Encryption client for SDK V2 is now available, so add client side encryption back in. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18683) Add new store vendor config option
[ https://issues.apache.org/jira/browse/HADOOP-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18683: -- Affects Version/s: 3.3.5 > Add new store vendor config option > -- > > Key: HADOOP-18683 > URL: https://issues.apache.org/jira/browse/HADOOP-18683 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Ahmar Suhail >Priority: Minor > > Add in a new fs.s3a.store.vendor config, where users can specify the storage > vendor they are using (eg: aws, netapp, minio). > This will allow us to configure S3A correctly per vendor. For example, if the > vendor is not AWS, you probably want to use ListObjectsV1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18683) Add new store vendor config option
Ahmar Suhail created HADOOP-18683: - Summary: Add new store vendor config option Key: HADOOP-18683 URL: https://issues.apache.org/jira/browse/HADOOP-18683 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail Add in a new fs.s3a.store.vendor config, where users can specify the storage vendor they are using (eg: aws, netapp, minio). This will allow us to configure S3A correctly per vendor. For example, if the vendor is not AWS, you probably want to use ListObjectsV1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
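Since fs.s3a.store.vendor is only proposed here and does not exist yet, the following is purely a hypothetical sketch of what the configuration might look like; the property name and the accepted values (aws, netapp, minio) are taken from the proposal above, not from any shipped release.

```xml
<!-- Hypothetical: the proposed fs.s3a.store.vendor option, which does not
     exist yet. Value names follow the examples given in the issue. -->
<property>
  <name>fs.s3a.store.vendor</name>
  <value>minio</value>
  <description>Storage vendor behind the S3 endpoint; non-AWS vendors could,
  for example, default to ListObjectsV1.</description>
</property>
```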
[jira] [Updated] (HADOOP-18683) Add new store vendor config option
[ https://issues.apache.org/jira/browse/HADOOP-18683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18683: -- Component/s: fs/s3 > Add new store vendor config option > -- > > Key: HADOOP-18683 > URL: https://issues.apache.org/jira/browse/HADOOP-18683 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Ahmar Suhail >Priority: Minor > > Add in a new fs.s3a.store.vendor config, where users can specify the storage > vendor they are using (eg: aws, netapp, minio). > This will allow us to configure S3A correctly per vendor. For example, if the > vendor is not AWS, you probably want to use ListObjectsV1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18673) AWS SDK V2 - Refactor getS3Region & other follow up items
[ https://issues.apache.org/jira/browse/HADOOP-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18673: -- Component/s: fs/s3 > AWS SDK V2 - Refactor getS3Region & other follow up items > -- > > Key: HADOOP-18673 > URL: https://issues.apache.org/jira/browse/HADOOP-18673 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > * Factor getS3Region into its own ExecutingStoreOperation; > * Remove InconsistentS3ClientFactory. > * Fix issue with getXAttr(/) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18674) AWS SDK V2 - Add socket factory to Netty Client
[ https://issues.apache.org/jira/browse/HADOOP-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18674: -- Affects Version/s: 3.4.0 > AWS SDK V2 - Add socket factory to Netty Client > --- > > Key: HADOOP-18674 > URL: https://issues.apache.org/jira/browse/HADOOP-18674 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Minor > > The Java async client uses the netty http client. We should investigate how > to add a socket factory to this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18674) AWS SDK V2 - Add socket factory to Netty Client
[ https://issues.apache.org/jira/browse/HADOOP-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18674: -- Component/s: fs/s3 > AWS SDK V2 - Add socket factory to Netty Client > --- > > Key: HADOOP-18674 > URL: https://issues.apache.org/jira/browse/HADOOP-18674 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Minor > > The Java async client uses the netty http client. We should investigate how > to add a socket factory to this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18673) AWS SDK V2 - Refactor getS3Region & other follow up items
[ https://issues.apache.org/jira/browse/HADOOP-18673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18673: -- Affects Version/s: 3.4.0 > AWS SDK V2 - Refactor getS3Region & other follow up items > -- > > Key: HADOOP-18673 > URL: https://issues.apache.org/jira/browse/HADOOP-18673 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > * Factor getS3Region into its own ExecutingStoreOperation; > * Remove InconsistentS3ClientFactory. > * Fix issue with getXAttr(/) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18674) AWS SDK V2 - Add socket factory to Netty Client
Ahmar Suhail created HADOOP-18674: - Summary: AWS SDK V2 - Add socket factory to Netty Client Key: HADOOP-18674 URL: https://issues.apache.org/jira/browse/HADOOP-18674 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail The Java async client uses the netty http client. We should investigate how to add a socket factory to this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18674) AWS SDK V2 - Add socket factory to Netty Client
[ https://issues.apache.org/jira/browse/HADOOP-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18674: -- Priority: Minor (was: Major) > AWS SDK V2 - Add socket factory to Netty Client > --- > > Key: HADOOP-18674 > URL: https://issues.apache.org/jira/browse/HADOOP-18674 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Priority: Minor > > The Java async client uses the netty http client. We should investigate how > to add a socket factory to this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18673) AWS SDK V2 - Refactor getS3Region & other follow up items
Ahmar Suhail created HADOOP-18673: - Summary: AWS SDK V2 - Refactor getS3Region & other follow up items Key: HADOOP-18673 URL: https://issues.apache.org/jira/browse/HADOOP-18673 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail * Factor getS3Region into its own ExecutingStoreOperation; * Remove InconsistentS3ClientFactory. * Fix issue with getXAttr(/) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18638) Encryption behaviour on copy
[ https://issues.apache.org/jira/browse/HADOOP-18638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18638: -- Summary: Encryption behaviour on copy (was: Encryption behaviour ) > Encryption behaviour on copy > > > Key: HADOOP-18638 > URL: https://issues.apache.org/jira/browse/HADOOP-18638 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Priority: Major > > When doing a copy, S3A always uses the encryption configuration of the > filesystem rather than that of the source object. This behaviour may not have been > intended: `RequestFactoryImpl.copyEncryptionParameters()` does copy > the source object's encryption properties > [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java#L336], but a missing return statement means the FS settings end up > being used anyway. > > Proposed: > * If the copy is called by rename, always preserve the source object's encryption > properties. > * For all other copies, use the current FS encryption settings. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18638) Encryption behaviour
Ahmar Suhail created HADOOP-18638: - Summary: Encryption behaviour Key: HADOOP-18638 URL: https://issues.apache.org/jira/browse/HADOOP-18638 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail When doing a copy, S3A always uses the encryption configuration of the filesystem rather than that of the source object. This behaviour may not have been intended: `RequestFactoryImpl.copyEncryptionParameters()` does copy the source object's encryption properties [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java#L336], but a missing return statement means the FS settings end up being used anyway. Proposed: * If the copy is called by rename, always preserve the source object's encryption properties. * For all other copies, use the current FS encryption settings. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
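The missing-return bug described above can be shown with a minimal, self-contained sketch. The method and parameter names here are hypothetical, not the actual RequestFactoryImpl code: the point is that the branch copies the source object's encryption setting but, without a return, execution falls through and the filesystem default silently overwrites it.

```java
// Hypothetical illustration of the bug class in HADOOP-18638: a copied
// value is clobbered because a branch is missing its return statement.
public class MissingReturnDemo {

    // Returns the encryption algorithm a copy request would use.
    // "fixed" toggles between the buggy fall-through and the corrected path.
    static String copyEncryption(String sourceAlgorithm, String fsAlgorithm,
                                 boolean fixed) {
        String algorithm;
        if (sourceAlgorithm != null) {
            algorithm = sourceAlgorithm;   // copy from the source object
            if (fixed) {
                return algorithm;          // the fix: stop here
            }
            // buggy path: no return, so execution continues below
        }
        algorithm = fsAlgorithm;           // FS settings win unintentionally
        return algorithm;
    }

    public static void main(String[] args) {
        // Buggy: source setting is copied, then overwritten by the FS default.
        System.out.println(copyEncryption("AES256", "SSE-KMS", false)); // prints "SSE-KMS"
        // Fixed: the source setting is preserved.
        System.out.println(copyEncryption("AES256", "SSE-KMS", true));  // prints "AES256"
    }
}
```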
[jira] [Updated] (HADOOP-18565) AWS SDK V2 - Complete outstanding items
[ https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18565: -- Description: The following work remains to complete the SDK upgrade work: * S3A allows users to configure custom signers; add in support for this. * Remove SDK V1 bundle dependency * Update `getRegion()` logic to use retries. * Add in progress listeners for `S3ABlockOutputStream` * Fix any failing tests. was:S3A allows users to configure custom signers; add in support for this. > AWS SDK V2 - Complete outstanding items > --- > > Key: HADOOP-18565 > URL: https://issues.apache.org/jira/browse/HADOOP-18565 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > The following work remains to complete the SDK upgrade work: > * S3A allows users to configure custom signers; add in support for this. > * Remove SDK V1 bundle dependency > * Update `getRegion()` logic to use retries. > * Add in progress listeners for `S3ABlockOutputStream` > * Fix any failing tests. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18565) AWS SDK V2 - Complete outstanding items
[ https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18565: -- Summary: AWS SDK V2 - Complete outstanding items (was: AWS SDK V2 - Add in support of custom signers) > AWS SDK V2 - Complete outstanding items > --- > > Key: HADOOP-18565 > URL: https://issues.apache.org/jira/browse/HADOOP-18565 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > S3A allows users to configure custom signers; add in support for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678308#comment-17678308 ] Ahmar Suhail commented on HADOOP-18073: --- [~ste...@apache.org] / [~mthakur] , have just rebased my branch for the upgrade. Could you please push [this|https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade] branch up to the [Apache feature branch|https://github.com/apache/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade] ? I'd like to open a PR against this rebased branch which addresses some outstanding issues. > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. > Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18579) Warn when no region is configured
[ https://issues.apache.org/jira/browse/HADOOP-18579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18579: -- Description: The AWS Java SDK V1 allows for cross region access. This means that even if you instantiate the S3 client with US_EAST_1 (or any region different to your actual bucket's region), the SDK will figure out the region. With the upgrade to SDK V2, this is no longer supported and the region should be set explicitly. Requests with the incorrect region will fail. To prepare for this change, S3A should warn when a region is not set via fs.s3a.endpoint.region. We should warn even if fs.s3a.endpoint is set and region can be parsed from this. This is because it is recommended to let the SDK V2 figure out the endpoint to use from the region, and so S3A should discourage from setting the endpoint unless absolutely required (eg for third party stores). Ideally rename fs.s3a.endpoint.region to fs.s3a.region, but not sure if this is ok to do. was: The AWS Java SDK V1 allows for cross region access. This means that even if you instantiate the S3 client with US_EAST_1 (or any region different to your actual bucket's region), the SDK will figure out the region. With the upgrade to SDK V2, this is no longer supported and the region should be set explicitly. Requests with the incorrect region will fail. To prepare for this change, S3A should warn when a region is not set via fs.s3a.endpoint.region. We should warn even if fs.s3a.endpoint is set and region can be parsed from this. This is because it is recommended to let the SDK V2 figure out the endpoint to use from the region, and so S3A should discourage from setting the endpoint unless absolutely required (eg for third party stores). 
> Warn when no region is configured > - > > Key: HADOOP-18579 > URL: https://issues.apache.org/jira/browse/HADOOP-18579 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.9 >Reporter: Ahmar Suhail >Priority: Minor > > The AWS Java SDK V1 allows for cross region access. This means that even if > you instantiate the S3 client with US_EAST_1 (or any region different to your > actual bucket's region), the SDK will figure out the region. > > With the upgrade to SDK V2, this is no longer supported and the region should > be set explicitly. Requests with the incorrect region will fail. To prepare > for this change, S3A should warn when a region is not set via > fs.s3a.endpoint.region. > > We should warn even if fs.s3a.endpoint is set and region can be parsed from > this. This is because it is recommended to let the SDK V2 figure out the > endpoint to use from the region, and so S3A should discourage from setting > the endpoint unless absolutely required (eg for third party stores). > > Ideally rename fs.s3a.endpoint.region to fs.s3a.region, but not sure if this > is ok to do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
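The recommendation above (set the region, leave the endpoint unset for AWS) can be sketched as a configuration fragment. This is an illustrative example: fs.s3a.endpoint.region is the existing S3A property, while the region value and third-party URL shown are placeholders.

```xml
<!-- Preferred for AWS with SDK V2: set only the region and let the SDK
     derive the endpoint. "eu-west-1" is an example value. -->
<property>
  <name>fs.s3a.endpoint.region</name>
  <value>eu-west-1</value>
</property>
<!-- fs.s3a.endpoint should normally stay unset for AWS; configure it only
     for third-party stores, e.g. a placeholder like
     https://storage.example.com -->
```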
[jira] [Updated] (HADOOP-18579) Warn when no region is configured
[ https://issues.apache.org/jira/browse/HADOOP-18579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18579: -- Affects Version/s: 3.3.9 > Warn when no region is configured > - > > Key: HADOOP-18579 > URL: https://issues.apache.org/jira/browse/HADOOP-18579 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.9 >Reporter: Ahmar Suhail >Priority: Minor > > The AWS Java SDK V1 allows for cross region access. This means that even if > you instantiate the S3 client with US_EAST_1 (or any region different to your > actual bucket's region), the SDK will figure out the region. > > With the upgrade to SDK V2, this is no longer supported and the region should > be set explicitly. Requests with the incorrect region will fail. To prepare > for this change, S3A should warn when a region is not set via > fs.s3a.endpoint.region. > > We should warn even if fs.s3a.endpoint is set and region can be parsed from > this. This is because it is recommended to let the SDK V2 figure out the > endpoint to use from the region, and so S3A should discourage from setting > the endpoint unless absolutely required (eg for third party stores). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18579) Warn when no region is configured
[ https://issues.apache.org/jira/browse/HADOOP-18579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18579: -- Component/s: fs/s3 > Warn when no region is configured > - > > Key: HADOOP-18579 > URL: https://issues.apache.org/jira/browse/HADOOP-18579 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Ahmar Suhail >Priority: Minor > > The AWS Java SDK V1 allows for cross region access. This means that even if > you instantiate the S3 client with US_EAST_1 (or any region different to your > actual bucket's region), the SDK will figure out the region. > > With the upgrade to SDK V2, this is no longer supported and the region should > be set explicitly. Requests with the incorrect region will fail. To prepare > for this change, S3A should warn when a region is not set via > fs.s3a.endpoint.region. > > We should warn even if fs.s3a.endpoint is set and region can be parsed from > this. This is because it is recommended to let the SDK V2 figure out the > endpoint to use from the region, and so S3A should discourage from setting > the endpoint unless absolutely required (eg for third party stores). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18579) Warn when no region is configured
[ https://issues.apache.org/jira/browse/HADOOP-18579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18579: -- Description: The AWS Java SDK V1 allows for cross region access. This means that even if you instantiate the S3 client with US_EAST_1 (or any region different to your actual bucket's region), the SDK will figure out the region. With the upgrade to SDK V2, this is no longer supported and the region should be set explicitly. Requests with the incorrect region will fail. To prepare for this change, S3A should warn when a region is not set via fs.s3a.endpoint.region. We should warn even if fs.s3a.endpoint is set and region can be parsed from this. This is because it is recommended to let the SDK V2 figure out the endpoint to use from the region, and so S3A should discourage from setting the endpoint unless absolutely required (eg for third party stores). was: The AWS Java SDK V1 allows for cross region access. This means that even if you have instantiate the S3 client with US_EAST_1 (or any region different to your actual bucket's region), the SDK will figure out the region. With the upgrade to SDK V2, this is no longer supported and the region should be set explicitly. To prepare for this change, S3A should warn when a region is not set via fs.s3a.endpoint.region. We should warn even if fs.s3a.endpoint is set and region can be parsed from this. This is because it is recommended to let the SDK V2 figure out the endpoint to use from the region, and so S3A should discourage from setting the endpoint unless absolutely required (eg for third party stores). > Warn when no region is configured > - > > Key: HADOOP-18579 > URL: https://issues.apache.org/jira/browse/HADOOP-18579 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Priority: Minor > > The AWS Java SDK V1 allows for cross region access. 
This means that even if > you instantiate the S3 client with US_EAST_1 (or any region different to your > actual bucket's region), the SDK will figure out the region. > > With the upgrade to SDK V2, this is no longer supported and the region should > be set explicitly. Requests with the incorrect region will fail. To prepare > for this change, S3A should warn when a region is not set via > fs.s3a.endpoint.region. > > We should warn even if fs.s3a.endpoint is set and region can be parsed from > this. This is because it is recommended to let the SDK V2 figure out the > endpoint to use from the region, and so S3A should discourage from setting > the endpoint unless absolutely required (eg for third party stores). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18579) Warn when no region is configured
Ahmar Suhail created HADOOP-18579: - Summary: Warn when no region is configured Key: HADOOP-18579 URL: https://issues.apache.org/jira/browse/HADOOP-18579 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail The AWS Java SDK V1 allows for cross region access. This means that even if you instantiate the S3 client with US_EAST_1 (or any region different to your actual bucket's region), the SDK will figure out the region. With the upgrade to SDK V2, this is no longer supported and the region should be set explicitly. To prepare for this change, S3A should warn when a region is not set via fs.s3a.endpoint.region. We should warn even if fs.s3a.endpoint is set and region can be parsed from this. This is because it is recommended to let the SDK V2 figure out the endpoint to use from the region, and so S3A should discourage setting the endpoint unless absolutely required (eg for third party stores). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18572) AWS SDK V2 - Fix failing tests
[ https://issues.apache.org/jira/browse/HADOOP-18572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18572: -- Affects Version/s: 3.4.0 > AWS SDK V2 - Fix failing tests > -- > > Key: HADOOP-18572 > URL: https://issues.apache.org/jira/browse/HADOOP-18572 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > We have a few failing tests for various reasons. Some are dependent on the > TM, but others can be looked into and fixed. > |TestS3AExceptionTranslation|test301ContainsEndpoint|Missing endpoint in SDK > exception > ([aws/aws-sdk-java-v2#3048|https://github.com/aws/aws-sdk-java-v2/issues/3048])| > |TestStreamChangeTracker|testCopyETagRequired, > testCopyVersionIdRequired|Transfer Manager response does not yet have > {{CopyObjectResult}}| > |ITestS3AFileContextStatistics|testStatistics|ProgressListeners not attached > to non-TM uploads| > |ITestS3AEncryptionSSEC|multiple tests (14 out of 24)|Transfer Manager issue > with SSE-C| > |ITestXAttrCost|testXAttrRoot.|{{headObject()}} with empty key fails| > |ITestSessionDelegationInFileystem|testDelegatedFileSystem|Succeeds, but > {{headObject()}} with empty key commented out| > |ITestS3ACannedACLs|testCreatedObjectsHaveACLs|AWSCannedACL.LogDeliveryWrite > not supported in SDK v2| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18572) AWS SDK V2 - Fix failing tests
[ https://issues.apache.org/jira/browse/HADOOP-18572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18572: -- Component/s: fs/s3 > AWS SDK V2 - Fix failing tests > -- > > Key: HADOOP-18572 > URL: https://issues.apache.org/jira/browse/HADOOP-18572 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > We have a few failing tests for various reasons. Some are dependent on the > TM, but others can be looked into and fixed. > |TestS3AExceptionTranslation|test301ContainsEndpoint|Missing endpoint in SDK > exception > ([aws/aws-sdk-java-v2#3048|https://github.com/aws/aws-sdk-java-v2/issues/3048])| > |TestStreamChangeTracker|testCopyETagRequired, > testCopyVersionIdRequired|Transfer Manager response does not yet have > {{CopyObjectResult}}| > |ITestS3AFileContextStatistics|testStatistics|ProgressListeners not attached > to non-TM uploads| > |ITestS3AEncryptionSSEC|multiple tests (14 out of 24)|Transfer Manager issue > with SSE-C| > |ITestXAttrCost|testXAttrRoot.|{{headObject()}} with empty key fails| > |ITestSessionDelegationInFileystem|testDelegatedFileSystem|Succeeds, but > {{headObject()}} with empty key commented out| > |ITestS3ACannedACLs|testCreatedObjectsHaveACLs|AWSCannedACL.LogDeliveryWrite > not supported in SDK v2| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18572) AWS SDK V2 - Fix failing tests
[ https://issues.apache.org/jira/browse/HADOOP-18572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18572: -- Summary: AWS SDK V2 - Fix failing tests (was: Fix failing tests) > AWS SDK V2 - Fix failing tests > -- > > Key: HADOOP-18572 > URL: https://issues.apache.org/jira/browse/HADOOP-18572 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Priority: Major > Labels: pull-request-available > > We have a few failing tests for various reasons. Some are dependent on the > TM, but others can be looked into and fixed. > |TestS3AExceptionTranslation|test301ContainsEndpoint|Missing endpoint in SDK > exception > ([aws/aws-sdk-java-v2#3048|https://github.com/aws/aws-sdk-java-v2/issues/3048])| > |TestStreamChangeTracker|testCopyETagRequired, > testCopyVersionIdRequired|Transfer Manager response does not yet have > {{CopyObjectResult}}| > |ITestS3AFileContextStatistics|testStatistics|ProgressListeners not attached > to non-TM uploads| > |ITestS3AEncryptionSSEC|multiple tests (14 out of 24)|Transfer Manager issue > with SSE-C| > |ITestXAttrCost|testXAttrRoot.|{{headObject()}} with empty key fails| > |ITestSessionDelegationInFileystem|testDelegatedFileSystem|Succeeds, but > {{headObject()}} with empty key commented out| > |ITestS3ACannedACLs|testCreatedObjectsHaveACLs|AWSCannedACL.LogDeliveryWrite > not supported in SDK v2| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18571) AWS SDK V2 - Qualify the upgrade.
[ https://issues.apache.org/jira/browse/HADOOP-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18571: -- Summary: AWS SDK V2 - Qualify the upgrade. (was: Qualify the upgrade. ) > AWS SDK V2 - Qualify the upgrade. > -- > > Key: HADOOP-18571 > URL: https://issues.apache.org/jira/browse/HADOOP-18571 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Priority: Major > > Run tests as per [qualifying an aws sdk > update|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18571) AWS SDK V2 - Qualify the upgrade.
[ https://issues.apache.org/jira/browse/HADOOP-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18571: -- Component/s: fs/s3 > AWS SDK V2 - Qualify the upgrade. > -- > > Key: HADOOP-18571 > URL: https://issues.apache.org/jira/browse/HADOOP-18571 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > Run tests as per [qualifying an aws sdk > update|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18571) AWS SDK V2 - Qualify the upgrade.
[ https://issues.apache.org/jira/browse/HADOOP-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18571: -- Affects Version/s: 3.4.0 > AWS SDK V2 - Qualify the upgrade. > -- > > Key: HADOOP-18571 > URL: https://issues.apache.org/jira/browse/HADOOP-18571 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > Run tests as per [qualifying an aws sdk > update|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18565) AWS SDK V2 - Add in support of custom signers
[ https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18565: -- Affects Version/s: 3.4.0 > AWS SDK V2 - Add in support of custom signers > - > > Key: HADOOP-18565 > URL: https://issues.apache.org/jira/browse/HADOOP-18565 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > S3A allows users configure to custom signers, add in support for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18570) AWS SDK V2 - Update region logic
[ https://issues.apache.org/jira/browse/HADOOP-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18570: -- Summary: AWS SDK V2 - Update region logic (was: Update region logic) > AWS SDK V2 - Update region logic > > > Key: HADOOP-18570 > URL: https://issues.apache.org/jira/browse/HADOOP-18570 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > SDK V2 will no longer resolve a buckets region if it is not set when > initialising the client. > > Current logic will always make a head bucket call on FS initialisation. We > should review this. Possible solution: > * Warn if region is not set. > * If no region, try and resolve. If resolution fails, throw an exception. > Cache the region to optimise for short lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18565) AWS SDK V2 - Add in support of custom signers
[ https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18565: -- Summary: AWS SDK V2 - Add in support of custom signers (was: Add in support of custom signers) > AWS SDK V2 - Add in support of custom signers > - > > Key: HADOOP-18565 > URL: https://issues.apache.org/jira/browse/HADOOP-18565 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Priority: Major > > S3A allows users configure to custom signers, add in support for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18565) AWS SDK V2 - Add in support of custom signers
[ https://issues.apache.org/jira/browse/HADOOP-18565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18565: -- Component/s: fs/s3 > AWS SDK V2 - Add in support of custom signers > - > > Key: HADOOP-18565 > URL: https://issues.apache.org/jira/browse/HADOOP-18565 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Ahmar Suhail >Priority: Major > > S3A allows users configure to custom signers, add in support for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18570) Update region logic
[ https://issues.apache.org/jira/browse/HADOOP-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18570: -- Component/s: fs/s3 > Update region logic > --- > > Key: HADOOP-18570 > URL: https://issues.apache.org/jira/browse/HADOOP-18570 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Ahmar Suhail >Priority: Major > > SDK V2 will no longer resolve a buckets region if it is not set when > initialising the client. > > Current logic will always make a head bucket call on FS initialisation. We > should review this. Possible solution: > * Warn if region is not set. > * If no region, try and resolve. If resolution fails, throw an exception. > Cache the region to optimise for short lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18570) Update region logic
[ https://issues.apache.org/jira/browse/HADOOP-18570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmar Suhail updated HADOOP-18570: -- Affects Version/s: 3.4.0 > Update region logic > --- > > Key: HADOOP-18570 > URL: https://issues.apache.org/jira/browse/HADOOP-18570 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Ahmar Suhail >Priority: Major > > SDK V2 will no longer resolve a buckets region if it is not set when > initialising the client. > > Current logic will always make a head bucket call on FS initialisation. We > should review this. Possible solution: > * Warn if region is not set. > * If no region, try and resolve. If resolution fails, throw an exception. > Cache the region to optimise for short lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18572) Fix failing tests
Ahmar Suhail created HADOOP-18572: - Summary: Fix failing tests Key: HADOOP-18572 URL: https://issues.apache.org/jira/browse/HADOOP-18572 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail We have a few failing tests for various reasons. Some are dependent on the TM, but others can be looked into and fixed. |TestS3AExceptionTranslation|test301ContainsEndpoint|Missing endpoint in SDK exception ([aws/aws-sdk-java-v2#3048|https://github.com/aws/aws-sdk-java-v2/issues/3048])| |TestStreamChangeTracker|testCopyETagRequired, testCopyVersionIdRequired|Transfer Manager response does not yet have {{CopyObjectResult}}| |ITestS3AFileContextStatistics|testStatistics|ProgressListeners not attached to non-TM uploads| |ITestS3AEncryptionSSEC|multiple tests (14 out of 24)|Transfer Manager issue with SSE-C| |ITestXAttrCost|testXAttrRoot.|{{headObject()}} with empty key fails| |ITestSessionDelegationInFileystem|testDelegatedFileSystem|Succeeds, but {{headObject()}} with empty key commented out| |ITestS3ACannedACLs|testCreatedObjectsHaveACLs|AWSCannedACL.LogDeliveryWrite not supported in SDK v2| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17647002#comment-17647002 ] Ahmar Suhail commented on HADOOP-18073: --- Thanks [~mthakur] , I've run the test and all ok. For the refactoring of that method, i'd prefer to do it as a separate PR. If all looks good to you, could you/[~ste...@apache.org] please push this rebased branch up to the feature branch? > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. > Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18571) Qualify the upgrade.
Ahmar Suhail created HADOOP-18571: - Summary: Qualify the upgrade. Key: HADOOP-18571 URL: https://issues.apache.org/jira/browse/HADOOP-18571 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail Run tests as per [qualifying an aws sdk update|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md#-qualifying-an-aws-sdk-update] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18570) Update region logic
Ahmar Suhail created HADOOP-18570: - Summary: Update region logic Key: HADOOP-18570 URL: https://issues.apache.org/jira/browse/HADOOP-18570 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail SDK V2 will no longer resolve a bucket's region if it is not set when initialising the client. Current logic will always make a head bucket call on FS initialisation. We should review this. Possible solution: * Warn if region is not set. * If no region, try and resolve. If resolution fails, throw an exception. Cache the region to optimise for short-lived FS. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
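The caching step proposed in the message above (resolve once, fail fast on failure, reuse across short-lived FS instances) can be sketched roughly as follows. BucketRegionCache and the resolver parameter are hypothetical names for illustration only, not part of S3A; a real implementation would plug in the SDK's head-bucket region lookup as the resolver:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a process-wide bucket -> region cache, so that
// short-lived FileSystem instances do not pay a network round trip on
// every initialisation. Names are illustrative, not S3A code.
final class BucketRegionCache {

    // Static so the cache outlives individual FileSystem instances.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    /**
     * Return the region for a bucket, invoking the (possibly expensive)
     * resolver only on the first lookup for that bucket.
     */
    static String getRegion(String bucket, Function<String, String> resolver) {
        return CACHE.computeIfAbsent(bucket, b -> {
            String region = resolver.apply(b);
            if (region == null) {
                // Mirrors the proposal: if resolution fails, throw.
                throw new IllegalArgumentException(
                    "Cannot resolve region for bucket " + b);
            }
            return region;
        });
    }
}
```

`computeIfAbsent` gives the per-bucket "resolve at most once" behaviour atomically; subsequent FS initialisations against the same bucket hit the cache.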
[jira] [Comment Edited] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645348#comment-17645348 ] Ahmar Suhail edited comment on HADOOP-18073 at 12/9/22 3:37 PM: [~ste...@apache.org] I've rebased our branch, [here|https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade] The only notable conflict was in S3AInputStream with new code that was added for https://issues.apache.org/jira/browse/HADOOP-18460. We've made a couple of small changes to fix conflict [here|https://github.com/ahmarsuhail/hadoop/pull/35/files#diff-f84380cfce48a9682320d596f593808fe16d81a71dcc5cfcf10842f932d0ff13R1030]. [~mthakur] could you check if this look ok? and if there's anything else we should do to verify that this issue does not resurface was (Author: JIRAUSER283484): [~ste...@apache.org] I've rebased our branch, [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade |https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade.]The only notable conflict was in S3AInputStream with new code that was added for https://issues.apache.org/jira/browse/HADOOP-18460. We've made a couple of small changes to fix conflict [here|https://github.com/ahmarsuhail/hadoop/pull/35/files#diff-f84380cfce48a9682320d596f593808fe16d81a71dcc5cfcf10842f932d0ff13R1030]. [~mthakur] could you check if this look ok? and if there's anything else we should do to verify that this issue does not resurface > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. 
> Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17645348#comment-17645348 ] Ahmar Suhail edited comment on HADOOP-18073 at 12/9/22 3:36 PM: [~ste...@apache.org] I've rebased our branch, [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade |https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade.]The only notable conflict was in S3AInputStream with new code that was added for https://issues.apache.org/jira/browse/HADOOP-18460. We've made a couple of small changes to fix conflict [here|https://github.com/ahmarsuhail/hadoop/pull/35/files#diff-f84380cfce48a9682320d596f593808fe16d81a71dcc5cfcf10842f932d0ff13R1030]. [~mthakur] could you check if this look ok? and if there's anything else we should do to verify that this issue does not resurface was (Author: JIRAUSER283484): [~ste...@apache.org] I've rebased our branch, [https://github.com/ahmarsuhail/hadoop/tree/feature-HADOOP-18073-s3a-sdk-upgrade.] The only notable conflict was in S3AInputStream with new code that was added for https://issues.apache.org/jira/browse/HADOOP-18460. We've made a couple of small changes to fix conflict [here|https://github.com/ahmarsuhail/hadoop/pull/35/files#diff-f84380cfce48a9682320d596f593808fe16d81a71dcc5cfcf10842f932d0ff13R1030]. [~mthakur] could you check if this look ok? and if there's anything else we should do to verify that this issue does not resurface > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. 
> Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when to set 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18565) Add in support of custom signers
Ahmar Suhail created HADOOP-18565: - Summary: Add in support of custom signers Key: HADOOP-18565 URL: https://issues.apache.org/jira/browse/HADOOP-18565 Project: Hadoop Common Issue Type: Sub-task Reporter: Ahmar Suhail S3A allows users to configure custom signers; add support for this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
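For context, the existing S3A custom-signer configuration that this sub-task needs to keep working looks roughly like the fragment below. The signer name and class here are illustrative placeholders, not real classes:

```xml
<configuration>
  <!-- Register the custom signer: Name:ClassName
       (an optional third :InitializerClassName element is also supported). -->
  <property>
    <name>fs.s3a.custom.signers</name>
    <value>MySigner:com.example.MySignerClass</value>
  </property>
  <!-- Tell S3A to sign requests with the registered signer. -->
  <property>
    <name>fs.s3a.signing-algorithm</name>
    <value>MySigner</value>
  </property>
</configuration>
```

The V2 work is to keep this configuration surface intact while the signer interface underneath moves from the V1 com.amazonaws signer types to their SDK V2 equivalents.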