[jira] [Resolved] (HADOOP-18962) Upgrade kafka to 3.4.0

2024-05-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18962.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Upgrade kafka to 3.4.0
> --
>
> Key: HADOOP-18962
> URL: https://issues.apache.org/jira/browse/HADOOP-18962
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Upgrade kafka-clients to 3.4.0 to fix 
> https://nvd.nist.gov/vuln/detail/CVE-2023-25194



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19168) Upgrade Kafka Clients due to CVEs

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19168.
-
Resolution: Duplicate

Rohit, this is a duplicate of HADOOP-18962. Let's focus on that one.

> Upgrade Kafka Clients due to CVEs
> -
>
> Key: HADOOP-19168
> URL: https://issues.apache.org/jira/browse/HADOOP-19168
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Rohit Kumar
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade Kafka Clients due to CVEs
> CVE-2023-25194: Affected versions of this package are vulnerable to 
> Deserialization of Untrusted Data when there are gadgets in the 
> {{classpath}}. The server will connect to the attacker's LDAP server and 
> deserialize the LDAP response, which the attacker can use to execute Java 
> deserialization gadget chains on the Kafka Connect server.
> CVSS Score: 8.8 (High)
> [https://nvd.nist.gov/vuln/detail/CVE-2023-25194] 
> CVE-2021-38153
> CVE-2018-17196
> Insufficient Entropy
> [https://security.snyk.io/package/maven/org.apache.kafka:kafka-clients] 
> Upgrade Kafka-Clients to 3.4.0 or higher.






[jira] [Resolved] (HADOOP-19182) Upgrade kafka to 3.4.0

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19182.
-
Resolution: Duplicate

> Upgrade kafka to 3.4.0
> --
>
> Key: HADOOP-19182
> URL: https://issues.apache.org/jira/browse/HADOOP-19182
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: fuchaohong
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade kafka to 3.4.0 to resolve CVE-2023-25194






[jira] [Created] (HADOOP-19185) Improve ABFS metric integration with IOStatistics

2024-05-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19185:
---

 Summary: Improve ABFS metric integration with IOStatistics
 Key: HADOOP-19185
 URL: https://issues.apache.org/jira/browse/HADOOP-19185
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Steve Loughran


Follow-up to HADOOP-18325, covering the outstanding review comments on

https://github.com/apache/hadoop/pull/6314/files








[jira] [Resolved] (HADOOP-18325) ABFS: Add correlated metric support for ABFS operations

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18325.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> ABFS: Add correlated metric support for ABFS operations
> ---
>
> Key: HADOOP-18325
> URL: https://issues.apache.org/jira/browse/HADOOP-18325
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.3
>Reporter: Anmol Asrani
>Assignee: Anmol Asrani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Add metrics related to a particular job, specific to number of total 
> requests, retried requests, retry count and others






[jira] [Resolved] (HADOOP-19163) Upgrade protobuf version to 3.25.3

2024-05-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19163.
-
Resolution: Fixed

Done. Not sure which version to tag this with.

Proposed: we cut a new release of this

> Upgrade protobuf version to 3.25.3
> --
>
> Key: HADOOP-19163
> URL: https://issues.apache.org/jira/browse/HADOOP-19163
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hadoop-thirdparty
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Created] (HADOOP-19181) IAMCredentialsProvider throttle failures

2024-05-20 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19181:
---

 Summary: IAMCredentialsProvider throttle failures
 Key: HADOOP-19181
 URL: https://issues.apache.org/jira/browse/HADOOP-19181
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Tests report IAM throttling errors being remapped to no-auth errors
(NoAuthWithAWSException), which then fail filesystem initialization.

Again, these are Impala tests, but with multiple processes on the same host. 
This means that HADOOP-18945 isn't sufficient: even if it ensures a singleton 
instance per process,
* it doesn't if there are many test buckets (fixable)
* it doesn't work across processes (not fixable)

We may be able to 
* use a singleton across all filesystem instances
* once we know how throttling is reported, handle it through retries + 
error/stats collection


{code}
2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
fs.FileSystem: Failed to initialize fileystem 
s3a://impala-test-uswest2-1/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
 java.nio.file.AccessDeniedException: impala-test-uswest2-1: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No 
AWS Credentials provided by TemporaryAWSCredentialsProvider 
SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
java.nio.file.AccessDeniedException: impala-test-uswest2-1: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 

[jira] [Resolved] (HADOOP-19172) Upgrade aws-java-sdk to 1.12.720

2024-05-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19172.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> Upgrade aws-java-sdk to 1.12.720
> 
>
> Key: HADOOP-19172
> URL: https://issues.apache.org/jira/browse/HADOOP-19172
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Update to the latest AWS SDK, to stop anyone worrying about the ion library 
> CVE https://nvd.nist.gov/vuln/detail/CVE-2024-21634
> This isn't exposed in the s3a client, but may be used downstream. 
> On releases built on the v2 SDK, the v1 SDK is only used during builds; in 3.3.x it is shipped.






[jira] [Resolved] (HADOOP-19073) WASB: Fix connection leak in FolderRenamePending

2024-05-15 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19073.
-
Resolution: Fixed

> WASB: Fix connection leak in FolderRenamePending
> 
>
> Key: HADOOP-19073
> URL: https://issues.apache.org/jira/browse/HADOOP-19073
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.6
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Fix connection leak in FolderRenamePending in getting bytes  






[jira] [Created] (HADOOP-19176) S3A Xattr headers need hdfs-compatible prefix

2024-05-15 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19176:
---

 Summary: S3A Xattr headers need hdfs-compatible prefix
 Key: HADOOP-19176
 URL: https://issues.apache.org/jira/browse/HADOOP-19176
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


The s3a xattr list needs a prefix compatible with HDFS; otherwise existing code 
which tries to copy attributes between stores can break.

We need a prefix of {user/trusted/security/system/raw}.

Now, the problem: xattrs are currently used by the magic committer to propagate 
file size progress, so renaming the prefix will break existing code. But as it's 
read only, we could modify Spark to look for both old and new values.

{code}

org.apache.hadoop.HadoopIllegalArgumentException: An XAttr name must be prefixed with user/trusted/security/system/raw, followed by a '.'
at org.apache.hadoop.hdfs.XAttrHelper.buildXAttr(XAttrHelper.java:77)
at org.apache.hadoop.hdfs.DFSClient.setXAttr(DFSClient.java:2835)
at org.apache.hadoop.hdfs.DistributedFileSystem$59.doCall(DistributedFileSystem.java:3106)
at org.apache.hadoop.hdfs.DistributedFileSystem$59.doCall(DistributedFileSystem.java:3102)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.setXAttr(DistributedFileSystem.java:3115)
at org.apache.hadoop.fs.FileSystem.setXAttr(FileSystem.java:3097)

{code}
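
The naming rule in the exception message can be checked before an attribute is
copied to HDFS. The following is a minimal sketch, not the S3A or HDFS code:
the class and method names are hypothetical, the namespace list is copied from
the exception text, and both the case-insensitive match and the choice of
"user." for remapped names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of Hadoop: checks the HDFS xattr naming rule. */
final class XAttrPrefixCheck {
  // Namespaces listed in the HadoopIllegalArgumentException message.
  private static final List<String> NAMESPACES =
      Arrays.asList("user", "trusted", "security", "system", "raw");

  private XAttrPrefixCheck() {
  }

  /** True if the name has the form "namespace.rest" with a listed namespace. */
  static boolean hasHdfsCompatiblePrefix(String name) {
    int dot = name.indexOf('.');
    if (dot <= 0 || dot == name.length() - 1) {
      return false;
    }
    // Assumption: namespaces are matched case-insensitively.
    String prefix = name.substring(0, dot).toLowerCase(Locale.ROOT);
    return NAMESPACES.contains(prefix);
  }

  /** Assumption: "user." is the namespace used for names that lack one. */
  static String toHdfsCompatible(String name) {
    return hasHdfsCompatiblePrefix(name) ? name : "user." + name;
  }
}
```

A reader that has to cope with both naming schemes could look up
`toHdfsCompatible(name)` first and fall back to the unprefixed `name`.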







[jira] [Resolved] (HADOOP-18958) Improve UserGroupInformation debug log

2024-05-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18958.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

>  Improve UserGroupInformation debug log
> ---
>
> Key: HADOOP-18958
> URL: https://issues.apache.org/jira/browse/HADOOP-18958
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.3.0, 3.3.5
>Reporter: wangzhihui
>Assignee: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: 20231029-122825-1.jpeg, 20231029-122825.jpeg, 
> 20231030-143525.jpeg, image-2023-10-29-09-47-56-489.png, 
> image-2023-10-30-14-35-11-161.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Using “new Exception( )” to print the call stack of the "doAs" method in 
> the UserGroupInformation class prints meaningless exception information 
> and too many stack frames, which is not conducive to 
> troubleshooting.
> *example:*
> !20231029-122825.jpeg|width=991,height=548!
>  
> *improved result* :
>  
> !image-2023-10-29-09-47-56-489.png|width=1099,height=156!
> !20231030-143525.jpeg|width=572,height=674!






[jira] [Reopened] (HADOOP-18958) UserGroupInformation debug log improve

2024-05-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-18958:
-

> UserGroupInformation debug log improve
> --
>
> Key: HADOOP-18958
> URL: https://issues.apache.org/jira/browse/HADOOP-18958
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.3.0, 3.3.5
>Reporter: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 20231029-122825-1.jpeg, 20231029-122825.jpeg, 
> 20231030-143525.jpeg, image-2023-10-29-09-47-56-489.png, 
> image-2023-10-30-14-35-11-161.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Using “new Exception( )” to print the call stack of the "doAs" method in 
> the UserGroupInformation class prints meaningless exception information 
> and too many stack frames, which is not conducive to 
> troubleshooting.
> *example:*
> !20231029-122825.jpeg|width=991,height=548!
>  
> *improved result* :
>  
> !image-2023-10-29-09-47-56-489.png|width=1099,height=156!
> !20231030-143525.jpeg|width=572,height=674!






[jira] [Created] (HADOOP-19175) update s3a committer docs

2024-05-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19175:
---

 Summary: update s3a committer docs
 Key: HADOOP-19175
 URL: https://issues.apache.org/jira/browse/HADOOP-19175
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Update s3a committer docs

* declare that magic committer is stable and make it the recommended one
* show how to use new command "mapred successfile" to print the success file.






[jira] [Created] (HADOOP-19172) Upgrade aws-java-sdk to 1.12.720

2024-05-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19172:
---

 Summary: Upgrade aws-java-sdk to 1.12.720
 Key: HADOOP-19172
 URL: https://issues.apache.org/jira/browse/HADOOP-19172
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build, fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


Update to the latest AWS SDK, to stop anyone worrying about the ion library CVE 
https://nvd.nist.gov/vuln/detail/CVE-2024-21634

This isn't exposed in the s3a client, but may be used downstream. 

On releases built on the v2 SDK, the v1 SDK is only used during builds; in 3.3.x it is shipped.






[jira] [Created] (HADOOP-19171) AWS v2: handle alternative forms of connection failure

2024-05-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19171:
---

 Summary: AWS v2: handle alternative forms of connection failure
 Key: HADOOP-19171
 URL: https://issues.apache.org/jira/browse/HADOOP-19171
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


We've had reports of network connection failures surfacing deeper in the stack, 
where we don't convert them to AWSApiCallTimeoutException, so they aren't 
retried properly (retire the connection and repeat).


{code}
Unable to execute HTTP request: Broken pipe (Write failed)
{code}


{code}
 Your socket connection to the server was not read from or written to within 
the timeout period. Idle connections will be closed. (Service: Amazon S3; 
Status Code: 400; Error Code: RequestTimeout
{code}
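
One way to recognise such failures is to walk the exception's cause chain and
match on the message text. The sketch below only illustrates that idea: the
class is hypothetical, the markers are fragments of the two messages quoted
above, and matching on message strings is brittle, so the eventual fix may
well classify these errors differently.

```java
import java.util.Locale;

/** Hypothetical classifier, not the S3A implementation. */
final class ConnectionFailureClassifier {
  // Lower-cased fragments of the two error messages quoted in this issue.
  private static final String[] MARKERS = {
      "broken pipe",
      "was not read from or written to within the timeout period"
  };

  // Upper bound on the walk, so a cyclic cause chain cannot loop forever.
  private static final int MAX_DEPTH = 32;

  private ConnectionFailureClassifier() {
  }

  /** Walks the cause chain and reports whether any message contains a marker. */
  static boolean isConnectionFailure(Throwable t) {
    int depth = 0;
    for (Throwable c = t; c != null && depth < MAX_DEPTH; c = c.getCause()) {
      depth++;
      String message = c.getMessage();
      if (message == null) {
        continue;
      }
      String lower = message.toLowerCase(Locale.ROOT);
      for (String marker : MARKERS) {
        if (lower.contains(marker)) {
          return true;
        }
      }
    }
    return false;
  }
}
```

A caller would use the result to decide whether to discard the connection and
retry the operation.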








[jira] [Created] (HADOOP-19161) S3A: support a comma separated list of performance flags

2024-05-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19161:
---

 Summary: S3A: support a comma separated list of performance flags
 Key: HADOOP-19161
 URL: https://issues.apache.org/jira/browse/HADOOP-19161
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


HADOOP-19072 shows we want to add more optimisations than that of HADOOP-18930.

* Extending the new optimisations to the existing option is brittle.
* Adding explicit options for each feature gets complex fast.

Proposed
* A new class S3APerformanceFlags keeps all the flags
* it builds this from a string[] of values, which can be extracted from 
getConf(),
* and it can also support a "*" option to mean "everything"
* this class can also be handed off to hasPathCapability() and do the right 
thing.

Proposed optimisations
* create file (we will hook up HADOOP-18930)
* mkdir (HADOOP-19072)
* delete (probe for parent path)
* rename (probe for source path)

We could think of more, with different names, later.
The goal is to make it possible to strip out every HTTP request we do for 
safety/posix compliance, so applications have the option of turning off what 
they don't need.
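
The proposal could be sketched roughly as follows. This is an illustration
only: the class name, the enum and its four flag names are assumptions derived
from the list of proposed optimisations, and the hasPathCapability() hook is
left out.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

/** Sketch only: names are assumptions, not the final S3APerformanceFlags design. */
final class PerformanceFlagsSketch {
  /** One entry per proposed optimisation. */
  enum Flag { CREATE, MKDIR, DELETE, RENAME }

  private final Set<Flag> enabled;

  private PerformanceFlagsSketch(Set<Flag> enabled) {
    this.enabled = enabled;
  }

  /** Builds the flag set from already-split option values; "*" enables everything. */
  static PerformanceFlagsSketch fromValues(String... values) {
    EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
    for (String raw : values) {
      String value = raw.trim();
      if (value.isEmpty()) {
        continue;
      }
      if ("*".equals(value)) {
        return new PerformanceFlagsSketch(EnumSet.allOf(Flag.class));
      }
      // Unknown names raise IllegalArgumentException here; a real
      // implementation might prefer to warn and skip them.
      flags.add(Flag.valueOf(value.toUpperCase(Locale.ROOT)));
    }
    return new PerformanceFlagsSketch(flags);
  }

  /** True if the given optimisation was requested. */
  boolean enabled(Flag flag) {
    return enabled.contains(flag);
  }
}
```

For example, `PerformanceFlagsSketch.fromValues("create, mkdir".split(","))`
enables only the create and mkdir optimisations, while `fromValues("*")`
enables all four. An EnumSet keeps the lookup cheap and makes the "*" case a
single call.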






[jira] [Resolved] (HADOOP-19146) noaa-cors-pds bucket access with global endpoint fails

2024-04-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19146.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> noaa-cors-pds bucket access with global endpoint fails
> --
>
> Key: HADOOP-19146
> URL: https://issues.apache.org/jira/browse/HADOOP-19146
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> All tests accessing noaa-cors-pds use the us-east-1 region, as configured at 
> bucket level. If a global endpoint is configured (e.g. us-west-2), they fail to 
> access the bucket.
>  
> Sample error:
> {code:java}
> org.apache.hadoop.fs.s3a.AWSRedirectException: Received permanent redirect 
> response to region [us-east-1].  This likely indicates that the S3 region 
> configured in fs.s3a.endpoint.region does not match the AWS region containing 
> the bucket.: null (Service: S3, Status Code: 301, Request ID: 
> PMRWMQC9S91CNEJR, Extended Request ID: 
> 6Xrg9thLiZXffBM9rbSCRgBqwTxdLAzm6OzWk9qYJz1kGex3TVfdiMtqJ+G4vaYCyjkqL8cteKI/NuPBQu5A0Q==)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:253)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:155)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4041)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3947)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$26(S3AFileSystem.java:3924)
>     at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
>     at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
>     at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2716)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2735)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3922)
>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:115)
>     at org.apache.hadoop.fs.Globber.doGlob(Globber.java:349)
>     at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$globStatus$35(S3AFileSystem.java:4956)
>     at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
>     at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
>     at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2716)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2735)
>     at org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:4949)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:313)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:281)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:445)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:311)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:328)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:201)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1677)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1674)
>  {code}
> {code:java}
> Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 301, Request ID: PMRWMQC9S91CNEJR, Extended 
> Request ID: 
> 6Xrg9thLiZXffBM9rbSCRgBqwTxdLAzm6OzWk9qYJz1kGex3TVfdiMtqJ+G4vaYCyjkqL8cteKI/NuPBQu5A0Q==)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
>     at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
>     at 

[jira] [Resolved] (HADOOP-19159) Fix hadoop-aws document for fs.s3a.committer.abort.pending.uploads

2024-04-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19159.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> Fix hadoop-aws document for fs.s3a.committer.abort.pending.uploads
> --
>
> Key: HADOOP-19159
> URL: https://issues.apache.org/jira/browse/HADOOP-19159
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> The description of `fs.s3a.committer.abort.pending.uploads` in the 
> _Concurrent Jobs writing to the same destination_ section is not entirely correct.






[jira] [Created] (HADOOP-19158) Support delegating ByteBufferPositionedReadable to vector reads

2024-04-25 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19158:
---

 Summary: Support delegating ByteBufferPositionedReadable to vector 
reads
 Key: HADOOP-19158
 URL: https://issues.apache.org/jira/browse/HADOOP-19158
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Make it easy for any stream with vector IO to support ByteBufferPositionedReadable.

Specifically, 

ByteBufferPositionedReadable.readFully()

is exactly a single range read, so it is easy to implement.

The simpler read() call, which can return less, isn't part of the vector API.
Proposed: invoke readFully() but convert an EOFException to -1.
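
The proposed read() fallback might look like the sketch below. It is not the
Hadoop code: `FullReader` is a local stand-in for the readFully() half of
ByteBufferPositionedReadable, declared here only to keep the example
self-contained, and the class name is hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Sketch of read() implemented on top of readFully(); names are hypothetical. */
final class ReadViaReadFully {
  /** Local stand-in for the readFully() method of the real interface. */
  interface FullReader {
    void readFully(long position, ByteBuffer buf) throws IOException;
  }

  private ReadViaReadFully() {
  }

  /**
   * Reads exactly buf.remaining() bytes at the given position.
   * Returns the number of bytes read, or -1 if readFully() hit end of file.
   */
  static int read(FullReader in, long position, ByteBuffer buf) throws IOException {
    int wanted = buf.remaining();
    if (wanted == 0) {
      return 0;
    }
    try {
      in.readFully(position, buf);
      return wanted;
    } catch (EOFException e) {
      // Proposed mapping: report end of file as -1 instead of raising.
      return -1;
    }
  }
}
```

One consequence of this mapping is that a request which runs past the end of
the file returns -1 even if some bytes were available, and the buffer may have
been partly filled; callers that need short reads would have to size the
buffer to the remaining file length first.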






[jira] [Created] (HADOOP-19157) [ABFS] Filesystem contract tests to use methodPath for robust parallel test runs

2024-04-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19157:
---

 Summary: [ABFS] Filesystem contract tests to use methodPath for 
robust parallel test runs
 Key: HADOOP-19157
 URL: https://issues.apache.org/jira/browse/HADOOP-19157
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, test
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


hadoop-azure supports parallel test runs, but unlike hadoop-aws, the azure ones 
are parallelised across methods in the same test suite.

This can fail badly where contract tests have hard-coded filenames and assume 
that they can use them across all test cases. It shows up when you are testing 
on a store with reduced IO capacity, which triggers retries and makes some test 
cases slower.

Fix: hadoop-common contract tests to use methodPath() names.






[jira] [Resolved] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-04-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19102.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  
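
The change described above amounts to clamping one setting by the other. A
minimal sketch follows; the class and method are hypothetical, and the
parameter names mirror the description rather than real ABFS configuration
keys.

```java
/** Hypothetical helper illustrating the clamp; not the ABFS code. */
final class FooterBufferSizing {
  private FooterBufferSizing() {
  }

  /** Returns a footer read size that never exceeds the read buffer size. */
  static int effectiveFooterReadBufferSize(int readBufferSizeConfig,
      int footerBufferSizeConfig) {
    return Math.min(readBufferSizeConfig, footerBufferSizeConfig);
  }
}
```

With illustrative values of a 4 MiB read buffer and an 8 MiB footer setting,
the effective footer read size becomes 4 MiB, so the read can no longer
overrun the buffer array.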






[jira] [Created] (HADOOP-19153) hadoop-common still exports logback as a transitive dependency

2024-04-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19153:
---

 Summary: hadoop-common still exports logback as a transitive 
dependency
 Key: HADOOP-19153
 URL: https://issues.apache.org/jira/browse/HADOOP-19153
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, common
Affects Versions: 3.4.0
Reporter: Steve Loughran


Even though HADOOP-19084 set out to stop it, somehow ZK's declaration of a 
logback dependency is still contaminating the hadoop-common dependency graph, 
causing problems downstream.








[jira] [Resolved] (HADOOP-19079) HttpExceptionUtils to check that loaded class is really an exception before instantiation

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19079.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> HttpExceptionUtils to check that loaded class is really an exception before 
> instantiation
> -
>
> Key: HADOOP-19079
> URL: https://issues.apache.org/jira/browse/HADOOP-19079
> Project: Hadoop Common
>  Issue Type: Task
>  Components: common, security
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> It can be dangerous to take class names as input from HTTP messages, even if 
> we control the source. The issue is in HttpExceptionUtils in hadoop-common 
> (validateResponse method).
> I can provide a PR that will highlight the issue.
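A minimal sketch of the kind of check the issue title asks for, assuming the class name arrives as a string from an HTTP response. The class `SafeExceptionLoader` and its fallback behaviour are hypothetical; this is not the code in HttpExceptionUtils.

```java
import java.lang.reflect.Constructor;

/** Hypothetical sketch; not the actual HttpExceptionUtils code. */
final class SafeExceptionLoader {
  private SafeExceptionLoader() {}

  /**
   * Instantiates the named class only if it is a Throwable subtype with a
   * (String) constructor; otherwise falls back to a RuntimeException.
   */
  static Throwable toException(String className, String message) {
    try {
      // initialize=false: do not run static initializers of an unvetted class
      Class<?> cls = Class.forName(className, false,
          SafeExceptionLoader.class.getClassLoader());
      if (!Throwable.class.isAssignableFrom(cls)) {
        return new RuntimeException(
            "Not an exception class: " + className + ": " + message);
      }
      Constructor<? extends Throwable> ctor =
          cls.asSubclass(Throwable.class).getConstructor(String.class);
      return ctor.newInstance(message);
    } catch (ReflectiveOperationException | LinkageError e) {
      // unknown class, missing constructor, or instantiation failure
      return new RuntimeException(className + ": " + message, e);
    }
  }
}
```

The type check happens before any constructor is looked up or invoked, so a class which is not an exception is never instantiated.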






[jira] [Resolved] (HADOOP-19096) [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19096.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic
> --
>
> Key: HADOOP-19096
> URL: https://issues.apache.org/jira/browse/HADOOP-19096
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.1
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> ABFS has a client-side throttling mechanism which works on the metrics 
> collected from past requests. If requests fail due to throttling at the 
> server, we update our metrics, and the client-side backoff is calculated 
> based on those metrics.
> This PR enhances the logic that decides which requests should be considered 
> when computing the client-side backoff interval, as follows:
> For each request made by the ABFS driver, we determine whether it should 
> contribute to Client-Side Throttling based on the status code and result:
>  # Status code in 2xx range: Successful Operations should contribute.
>  # Status code in 3xx range: Redirection Operations should not contribute.
>  # Status code in 4xx range: User Errors should not contribute.
>  # Status code is 503: Throttling Errors should contribute only if they are 
> due to a client limits breach, as follows:
>  ## 503, Ingress Over Account Limit: Should Contribute
>  ## 503, Egress Over Account Limit: Should Contribute
>  ## 503, TPS Over Account Limit: Should Contribute
>  ## 503, Other Server Throttling: Should not Contribute.
>  # Status code in 5xx range other than 503: Should not Contribute.
>  # IOException and UnknownHostExceptions: Should not Contribute.
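The decision table above can be sketched as a single predicate. The names below (`ThrottlingContribution`, `ThrottleReason`) and the use of -1 to mean "no HTTP response" are assumptions made for illustration; they are not the ABFS driver's API.

```java
/** Hypothetical sketch of the decision table; names are illustrative. */
final class ThrottlingContribution {
  private ThrottlingContribution() {}

  /** Reasons a 503 may carry; OTHER covers any other server throttling. */
  enum ThrottleReason {
    INGRESS_OVER_ACCOUNT_LIMIT,
    EGRESS_OVER_ACCOUNT_LIMIT,
    TPS_OVER_ACCOUNT_LIMIT,
    OTHER
  }

  /**
   * @param statusCode HTTP status, or -1 if the request failed with an
   *                   IOException/UnknownHostException before any response
   * @param reason     reason attached to a 503; ignored for other codes
   * @return true if the request should feed the client-side throttling metrics
   */
  static boolean shouldContribute(int statusCode, ThrottleReason reason) {
    if (statusCode >= 200 && statusCode < 300) {
      return true; // successful operations
    }
    if (statusCode == 503) {
      // only breaches of the client's own account limits count
      return reason != null && reason != ThrottleReason.OTHER;
    }
    return false; // 3xx, 4xx, other 5xx, or no response at all
  }
}
```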






[jira] [Resolved] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19098.
-
Resolution: Fixed

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say 
> * "overlap triggers IllegalArgumentException". 
> * special case: 0 byte ranges may be short circuited to return empty buffer 
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases
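A minimal sketch of the proposed validation, assuming ranges are simple offset/length pairs. The `Range` class here is a stand-in written for this example, not Hadoop's FileRange, and the helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; Range stands in for Hadoop's FileRange. */
final class RangeValidation {
  private RangeValidation() {}

  /** Minimal offset/length pair. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Returns the ranges sorted by offset.
   * @throws IllegalArgumentException on a negative value or any overlap
   */
  static List<Range> validateAndSort(List<Range> input) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
    Range prev = null;
    for (Range r : sorted) {
      if (r.offset < 0 || r.length < 0) {
        throw new IllegalArgumentException(
            "Negative offset or length at " + r.offset);
      }
      // after sorting, an overlap means r starts before prev ends
      if (prev != null && r.offset < prev.offset + prev.length) {
        throw new IllegalArgumentException(
            "Overlapping ranges at offsets " + prev.offset + " and " + r.offset);
      }
      prev = r;
    }
    return sorted;
  }
}
```

Adjacent ranges (one ending exactly where the next begins) pass this check; only true overlaps are rejected.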






[jira] [Resolved] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19101.
-
Fix Version/s: 3.3.9
   3.4.1
   Resolution: Fixed

> Vectored Read into off-heap buffer broken in fallback implementation
> 
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
> position zero even when the range is at a different offset. As a result: you 
> can get incorrect information.
> The fix for this is straightforward: we pass in a FileRange and use its offset 
> as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
> read vectorIO into direct buffers through HDFS, ABFS or GCS. Note that we 
> have never seen this in production because the parquet and ORC libraries both 
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to 
> read into off-heap DirectBuffers. This is a bit trickier than you would think 
> because an allocator is passed in. For PARQUET-2171 we will 
> * only invoke the API on streams which explicitly declare their support for 
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.
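To illustrate the fix described above (start reading at the range's offset rather than at zero), here is a small sketch using a plain java.nio FileChannel instead of a Hadoop stream. The class `DirectRangeRead` is hypothetical and is not VectoredReadUtils.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical sketch using java.nio; not the VectoredReadUtils code. */
final class DirectRangeRead {
  private DirectRangeRead() {}

  /** Reads [offset, offset + length) into a newly allocated direct buffer. */
  static ByteBuffer readRange(FileChannel channel, long offset, int length)
      throws IOException {
    ByteBuffer buffer = ByteBuffer.allocateDirect(length);
    long position = offset; // start at the range's offset, not at zero
    while (buffer.hasRemaining()) {
      int n = channel.read(buffer, position); // positioned read
      if (n < 0) {
        throw new EOFException("EOF at position " + position);
      }
      position += n;
    }
    buffer.flip(); // make the bytes readable by the caller
    return buffer;
  }
}
```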






[jira] [Created] (HADOOP-19144) S3A prefetching to support Vector IO

2024-04-04 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19144:
---

 Summary: S3A prefetching to support Vector IO
 Key: HADOOP-19144
 URL: https://issues.apache.org/jira/browse/HADOOP-19144
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Add explicit support for vector IO in s3a prefetching stream.

* if a range is in 1+ cached block, it SHALL be read from cache and returned
* if a range is not in cache : TBD
* If a range is partially in cache: TBD

these are the same decisions that abfs has to make: should the client 
fetch/cache block or just do one or more GET requests

A big issue is: does caching of data fetched in a range request make any sense 
at all? Or more specifically: does fetching the blocks in which range requests 
are found make sense

Simply going to the store is a lot simpler






[jira] [Created] (HADOOP-19140) [ABFS, S3A] Add IORateLimiter api to hadoop common

2024-04-03 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19140:
---

 Summary: [ABFS, S3A] Add IORateLimiter api to hadoop common
 Key: HADOOP-19140
 URL: https://issues.apache.org/jira/browse/HADOOP-19140
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Create a rate limiter API in hadoop common through which code (initially the 
manifest committer and bulk delete) can request IO capacity for a specific 
operation.

This can be exported by filesystems to support shared rate limiting across all 
threads

pulled from HADOOP-19093 PR
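One possible shape for such a limiter is sketched below as a small token bucket. Everything here is an assumption made for illustration: the class name, the `acquire(operation, permits)` signature, and the choice to return a wait time rather than block are not taken from the HADOOP-19093 PR.

```java
import java.util.function.LongSupplier;

/** Hypothetical token-bucket sketch; not the API added to hadoop-common. */
final class SimpleIORateLimiter {
  private final double permitsPerSecond;
  private final LongSupplier nanoClock;
  private double available;
  private long lastRefillNanos;

  /** @param nanoClock time source, e.g. System::nanoTime (injectable for tests) */
  SimpleIORateLimiter(double permitsPerSecond, LongSupplier nanoClock) {
    this.permitsPerSecond = permitsPerSecond;
    this.nanoClock = nanoClock;
    this.available = permitsPerSecond; // start with one second of capacity
    this.lastRefillNanos = nanoClock.getAsLong();
  }

  /**
   * Requests capacity for an operation. Synchronized so that all threads
   * sharing this instance draw from the same budget.
   * @param operation label for the operation; unused in this sketch
   * @return nanoseconds the caller should wait before proceeding (0 if none)
   */
  synchronized long acquire(String operation, int permits) {
    long now = nanoClock.getAsLong();
    double refill = (now - lastRefillNanos) / 1e9 * permitsPerSecond;
    available = Math.min(permitsPerSecond, available + refill);
    lastRefillNanos = now;
    available -= permits; // may go negative: the debt is paid off by waiting
    if (available >= 0) {
      return 0L;
    }
    return (long) (-available / permitsPerSecond * 1e9);
  }
}
```

A filesystem could hold one such instance and hand it to every caller, which is what would make the limit shared across threads.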






[jira] [Resolved] (HADOOP-19115) upgrade to nimbus-jose-jwt 9.37.2 due to CVE

2024-04-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19115.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
 Assignee: PJ Fanning
   Resolution: Fixed

> upgrade to nimbus-jose-jwt 9.37.2 due to CVE
> 
>
> Key: HADOOP-19115
> URL: https://issues.apache.org/jira/browse/HADOOP-19115
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0, 3.5.0
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> https://github.com/advisories/GHSA-gvpg-vgmx-xg6w






[jira] [Created] (HADOOP-19131) Assist reflection iO with WrappedOperations class

2024-03-28 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19131:
---

 Summary: Assist reflection iO with WrappedOperations class
 Key: HADOOP-19131
 URL: https://issues.apache.org/jira/browse/HADOOP-19131
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


parquet, avro etc are still stuck building with older hadoop releases. 

This makes using new APIs hard (PARQUET-2117) and means that APIs which are 5 
years old (!) such as HADOOP-15229 just aren't picked up.

This lack of openFile() adoption hurts working with files in cloud storage as
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down

Proposed
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the apis
# test in contract tests via reflection loading -verifies we have done it 
properly.
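A library that must still compile against older Hadoop releases could probe for the wrapper class by reflection, roughly as sketched below. The helper `ReflectiveLookup` is hypothetical, and no method names of the proposed WrappedOperations class are assumed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

/** Hypothetical helper for probing an optional class at runtime. */
final class ReflectiveLookup {
  private ReflectiveLookup() {}

  /**
   * Looks up a public static method by name and parameter types.
   * @return the method, or empty if the class or method is not available
   */
  static Optional<Method> findStaticMethod(String className, String methodName,
      Class<?>... parameterTypes) {
    try {
      Class<?> cls = Class.forName(className);
      Method m = cls.getMethod(methodName, parameterTypes);
      return Modifier.isStatic(m.getModifiers())
          ? Optional.of(m)
          : Optional.empty();
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      // older release on the classpath: the caller falls back to the classic API
      return Optional.empty();
    }
  }
}
```

The lookup would normally be done once and cached; an empty result tells the caller to fall back to the API it was compiled against.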






[jira] [Resolved] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits

2024-03-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19047.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Support InMemory Tracking Of S3A Magic Commits
> --
>
> Key: HADOOP-19047
> URL: https://issues.apache.org/jira/browse/HADOOP-19047
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The following operations happen within a Task when it uses the S3A 
> Magic Committer. 
> *During closing of stream*
> 1. A 0-byte file with the same name as the original file is uploaded to S3 
> using a PUT operation. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152]
>  for more information. This is done so that downstream applications such as 
> Spark can get the size of the file which is being written.
> 2. MultiPartUpload(MPU) metadata is uploaded to S3. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176]
>  for more information.
> *During TaskCommit*
> 1. All the MPU metadata which the task wrote to S3 (there will be 'x' 
> metadata files in S3 if a single task writes to 'x' files) is read and 
> rewritten to S3 as a single metadata file. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201]
>  for more information
> Since these operations happen within the Task JVM, we could optimize, as well 
> as save cost, by storing this information in memory when Task memory usage is 
> not a constraint. Hence the proposal here is to introduce a new MagicCommit 
> Tracker called "InMemoryMagicCommitTracker" which will store
> 1. the metadata of the MPU in memory until the Task is committed
> 2. the size of the file, which can be used by the downstream application 
> to get the file size before it is committed/visible to the output path.
> This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call 
> when a Task writes only 1 file.






[jira] [Resolved] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE-2024-23944

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19116.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> update to zookeeper client 3.8.4 due to  CVE-2024-23944
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: CVE
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6






[jira] [Resolved] (HADOOP-19089) [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19089.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path
> ---
>
> Key: HADOOP-19089
> URL: https://issues.apache.org/jira/browse/HADOOP-19089
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> A while back, changes were made to support HDFS.setXAttr() and HDFS.getXAttr() 
> on the root path for the ABFS driver.
> For these, filesystem-level APIs were introduced and used to set/get the 
> metadata of the container.
> Refer to HADOOP-18869 (ABFS: Fixing Behavior of a File System APIs on 
> root path).
> Ideally, the same set of APIs should be used, and root should be treated as a 
> path like any other path.
> This change is to avoid calling container APIs for these HDFS calls.
> As a result, these APIs will fail on the root path (as earlier) because the 
> service does not support get/set of user properties on the root path.
> This change will also update the documentation to reflect that these 
> operations are not supported on the root path.






[jira] [Created] (HADOOP-19122) testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store

2024-03-22 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19122:
---

 Summary: testListPathWithValueGreaterThanServerMaximum assert 
failure on heavily loaded store
 Key: HADOOP-19122
 URL: https://issues.apache.org/jira/browse/HADOOP-19122
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Steve Loughran


On an Azure store which may be experiencing throttling, the listPath call 
returns fewer than the 5K limit. The assertion needs to be changed to allow for this.






[jira] [Resolved] (HADOOP-19050) Add S3 Access Grants Support in S3A

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19050.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Fixed in trunk; backport to 3.4 should go in later.


> Add S3 Access Grants Support in S3A
> ---
>
> Key: HADOOP-19050
> URL: https://issues.apache.org/jira/browse/HADOOP-19050
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Add support for S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A.






[jira] [Resolved] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19119.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> spotbugs complaining about possible NPE in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
> 
>
> Key: HADOOP-19119
> URL: https://issues.apache.org/jira/browse/HADOOP-19119
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: crypto
>Affects Versions: 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> PRs against hadoop-common are reporting spotbugs problems
> {code}
> Dodgy code Warnings
> Code  Warning
> NP  Possible null pointer dereference in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return 
> value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
> In class org.apache.hadoop.crypto.key.kms.ValueQueue
> In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
> Local variable stored in JVM register ?
> Dereferenced at ValueQueue.java:[line 332]
> Known null at ValueQueue.java:[line 332]
> {code}






[jira] [Created] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-03-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19119:
---

 Summary: spotbugs complaining about possible NPE in 
org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
 Key: HADOOP-19119
 URL: https://issues.apache.org/jira/browse/HADOOP-19119
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: crypto
Affects Versions: 3.5.0
Reporter: Steve Loughran
Assignee: Steve Loughran


PRs against hadoop-common are reporting spotbugs problems

{code}
Dodgy code Warnings
Code  Warning
NP  Possible null pointer dereference in 
org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value 
of called method
Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
In class org.apache.hadoop.crypto.key.kms.ValueQueue
In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
Local variable stored in JVM register ?
Dereferenced at ValueQueue.java:[line 332]
Known null at ValueQueue.java:[line 332]

{code}







[jira] [Resolved] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-13 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19066.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override SDK endpoint for central endpoint since we enable 
> cross region access (details on HADOOP-19044) but we would still fail fast if 
> endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.
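The first proposed change can be sketched as a small validation routine. This is an illustration under stated assumptions: the class name is hypothetical, the central endpoint is taken to be "s3.amazonaws.com", an unset endpoint is treated as central, and IllegalArgumentException is a placeholder for whatever exception S3A actually raises.

```java
/** Hypothetical sketch of the fail-fast rule; not the S3A client factory code. */
final class FipsEndpointCheck {
  /** Assumed value of the central endpoint. */
  static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

  private FipsEndpointCheck() {}

  /**
   * Fails fast only when FIPS is enabled together with a non-central endpoint.
   * @param endpoint    value of fs.s3a.endpoint; null or empty if unset
   * @param fipsEnabled value of fs.s3a.endpoint.fips
   */
  static void validate(String endpoint, boolean fipsEnabled) {
    boolean central = endpoint == null
        || endpoint.isEmpty()
        || CENTRAL_ENDPOINT.equals(endpoint);
    if (fipsEnabled && !central) {
      throw new IllegalArgumentException(
          "FIPS cannot be combined with a custom endpoint: " + endpoint);
    }
  }
}
```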






[jira] [Reopened] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-19066:
-

> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override SDK endpoint for central endpoint since we enable 
> cross region access (details on HADOOP-19044) but we would still fail fast if 
> endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.






[jira] [Created] (HADOOP-19108) S3 Express: document use

2024-03-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19108:
---

 Summary: S3 Express: document use
 Key: HADOOP-19108
 URL: https://issues.apache.org/jira/browse/HADOOP-19108
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


The 3.4.0 release doesn't explicitly cover S3 Express.

Its support is automatic
* library handles it
* hadoop shell commands know that there may be "missing" dirs in treewalks due 
to in-flight uploads
* s3afs automatically switches to deleting pending uploads in delete(dir) call.

We just need to provide a summary of features, how to probe etc.








[jira] [Created] (HADOOP-19105) S3A: Recover from Vector IO read failures

2024-03-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19105:
---

 Summary: S3A: Recover from Vector IO read failures
 Key: HADOOP-19105
 URL: https://issues.apache.org/jira/browse/HADOOP-19105
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
 Environment: s3a vector IO doesn't try to recover from read failures 
the way read() does.

Need to
* abort HTTP stream if considered needed
* retry active read which failed
* but not those which had succeeded


Reporter: Steve Loughran









[jira] [Resolved] (HADOOP-19043) S3A: Regression: ITestS3AOpenCost fails on prefetch test runs

2024-03-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19043.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> S3A: Regression: ITestS3AOpenCost fails on prefetch test runs
> -
>
> Key: HADOOP-19043
> URL: https://issues.apache.org/jira/browse/HADOOP-19043
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Getting test failures in the new ITestS3AOpenCost tests when run with 
> {{-Dprefetch}}
> Thought I'd tested this, but clearly not
> * class cast failures on asserts (fix: skip)
> * bytes read different in one test: (fix: identify and address)






[jira] [Created] (HADOOP-19104) S3A HeaderProcessing to process all metadata entries of HEAD response

2024-03-07 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19104:
---

 Summary: S3A HeaderProcessing to process all metadata entries of 
HEAD response
 Key: HADOOP-19104
 URL: https://issues.apache.org/jira/browse/HADOOP-19104
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


S3A HeaderProcessing builds up an incomplete list of headers, as its mapping of 
metadata to header entries omits headers including
x-amz-server-side-encryption-aws-kms-key-id

proposed
* review all headers which are stripped from "raw" responses and mapped into 
headers
* make sure result of headers matches v1; looks like etags are different
* make sure x-amz-server-side-encryption-aws-kms-key-id gets back
* plus new checksum values

v1 sdk

{code}
# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
header.Content-Length="524671"
header.Content-Type="binary/octet-stream"
header.ETag="3e39531220fbd3747d32cf93a79a7a0c"
header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
header.x-amz-server-side-encryption="AES256"
{code}

v2 SDK. note how etag is now double quoted.

{code}
# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
header.Content-Length="524671"
header.Content-Type="binary/octet-stream"
header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
header.x-amz-server-side-encryption="AES256"
{code}
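One way to make the v2 ETag value match the v1 output shown above would be to strip a single pair of surrounding double quotes. The helper below is a hypothetical sketch, not the HeaderProcessing code, and whether stripping is the right fix is left open by the issue.

```java
/** Hypothetical helper; not the S3A HeaderProcessing code. */
final class ETagNormalizer {
  private ETagNormalizer() {}

  /** Removes one pair of surrounding double quotes, if present. */
  static String stripQuotes(String etag) {
    if (etag != null
        && etag.length() >= 2
        && etag.startsWith("\"")
        && etag.endsWith("\"")) {
      return etag.substring(1, etag.length() - 1);
    }
    return etag; // already unquoted, empty, or null
  }
}
```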







[jira] [Resolved] (HADOOP-19097) core-default fs.s3a.connection.establish.timeout value too low -warning always printed

2024-03-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19097.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> core-default fs.s3a.connection.establish.timeout value too low -warning 
> always printed
> --
>
> Key: HADOOP-19097
> URL: https://issues.apache.org/jira/browse/HADOOP-19097
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> caused by HADOOP-18915.
> in core-default we set the value of fs.s3a.connection.establish.timeout to 5s
> {code}
> <property>
>   <name>fs.s3a.connection.establish.timeout</name>
>   <value>5s</value>
> </property>
> {code}
> but there is a minimum of 15s, so this prints a warning
> {code}
> 2024-02-29 10:39:27,369 WARN impl.ConfigurationHelper: Option 
> fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 
> ms instead
> {code}
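The behaviour behind the warning can be sketched as a clamp on a duration option. The class and method names below are hypothetical and the message only imitates the log line above; this is not the ConfigurationHelper implementation.

```java
import java.time.Duration;

/** Hypothetical sketch of enforcing a minimum; not ConfigurationHelper itself. */
final class MinimumDuration {
  private MinimumDuration() {}

  /** Returns the value, or the minimum (with a warning) if the value is lower. */
  static Duration enforceMinimum(String option, Duration value, Duration minimum) {
    if (value.compareTo(minimum) < 0) {
      System.err.printf(
          "Option %s is too low (%,d ms). Setting to %,d ms instead%n",
          option, value.toMillis(), minimum.toMillis());
      return minimum;
    }
    return value;
  }
}
```

With a default of 5s and a minimum of 15s, every load of the default configuration takes the warning branch, which is why the warning is always printed.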






[jira] [Resolved] (HADOOP-19082) S3A: Update AWS SDK V2 to 2.24.6

2024-03-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19082.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A: Update AWS SDK V2 to 2.24.6
> 
>
> Key: HADOOP-19082
> URL: https://issues.apache.org/jira/browse/HADOOP-19082
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Update the AWS SDK to 2.24.6 from 2.23.5 for latest updates in packaging 
> w.r.t. imds module.






[jira] [Created] (HADOOP-19101) Vectored Read into off-heap buffer broken

2024-03-04 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19101:
---

 Summary: Vectored Read into off-heap buffer broken
 Key: HADOOP-19101
 URL: https://issues.apache.org/jira/browse/HADOOP-19101
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran



{{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
position zero even when the range is at a different offset. As a result: you 
can get incorrect information.

The fix for this is straightforward: we pass in a FileRange and use its offset 
as the starting position.
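
A minimal sketch of that fix, using only java.nio rather than the Hadoop classes. {{PositionedSource}}, {{readRange}}, {{inMemory}} and the 4-byte staging chunk are hypothetical names and values picked for illustration; the real {{VectoredReadUtils}} code will differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class DirectBufferRangeRead {

  /** Hypothetical stand-in for a stream offering positioned reads. */
  public interface PositionedSource {
    /** Read exactly {@code length} bytes starting at {@code position}. */
    void readFully(long position, byte[] dest, int destOffset, int length);
  }

  /**
   * Fill a direct buffer with the bytes of the range [offset, offset + length).
   * Direct buffers have no backing array, so data is staged through a small
   * on-heap chunk.
   */
  public static ByteBuffer readRange(PositionedSource source, long offset, int length) {
    ByteBuffer buffer = ByteBuffer.allocateDirect(length);
    byte[] chunk = new byte[4];
    int done = 0;
    while (done < length) {
      int toRead = Math.min(chunk.length, length - done);
      // Start from the range offset. Reading from "done" alone would
      // always begin at position zero, which is the bug described above.
      source.readFully(offset + done, chunk, 0, toRead);
      buffer.put(chunk, 0, toRead);
      done += toRead;
    }
    buffer.flip();
    return buffer;
  }

  /** Decode the remaining bytes of a buffer as ASCII, for demonstration only. */
  public static String asString(ByteBuffer buffer) {
    byte[] bytes = new byte[buffer.remaining()];
    buffer.duplicate().get(bytes);
    return new String(bytes, StandardCharsets.US_ASCII);
  }

  /** An in-memory source backed by a byte array. */
  public static PositionedSource inMemory(byte[] data) {
    return (position, dest, destOffset, length) ->
        System.arraycopy(data, (int) position, dest, destOffset, length);
  }
}
```

A contract test for this would read a range at a non-zero offset into a direct buffer and compare the bytes against the same range read on-heap.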

However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
read vectorIO into direct buffers through HDFS, ABFS or Azure. Note that we 
have never seen this in production because the parquet and ORC libraries both 
read into on-heap storage.

Those libraries need to be audited to make sure that they never attempt to 
read into off-heap DirectBuffers. This is a bit trickier than you would think 
because an allocator is passed in. For PARQUET-2171 we will 
* only invoke the API on streams which explicitly declare their support for the 
API (so fallback in parquet itself)
* not invoke it when direct buffer allocation is in use.






[jira] [Created] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-03-01 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19098:
---

 Summary: Vector IO: consistent specified rejection of overlapping 
ranges
 Key: HADOOP-19098
 URL: https://issues.apache.org/jira/browse/HADOOP-19098
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs, fs/s3
Affects Versions: 3.3.6
Reporter: Steve Loughran
Assignee: Steve Loughran


Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"

I believe s3a rejects this, but the other impls may not.

Proposed

FS spec to say 
* "overlap triggers IllegalArgumentException". 
* special case: 0 byte ranges may be short circuited to return empty buffer 
even without checking file length etc.

Contract tests to validate this

(+ common helper code to do this).

I'll copy the validation stuff into the parquet PR for consistency with older 
releases
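
A rough sketch of what the common helper could look like, assuming the proposed rule that any overlap raises IllegalArgumentException. The {{Range}} type and {{validateAndSort}} name are hypothetical, not the Hadoop API, and the zero-byte short circuit is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class RangeOverlapCheck {

  /** Hypothetical minimal range type: a start offset and a length in bytes. */
  public static final class Range {
    public final long offset;
    public final int length;

    public Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Return a copy of the ranges sorted by offset, rejecting any overlap.
   *
   * @throws IllegalArgumentException if two ranges overlap
   */
  public static List<Range> validateAndSort(List<Range> input) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
    for (int i = 1; i < sorted.size(); i++) {
      Range prev = sorted.get(i - 1);
      Range next = sorted.get(i);
      // After sorting, an overlap exists only if a range ends past
      // the start of its successor.
      if (prev.offset + prev.length > next.offset) {
        throw new IllegalArgumentException(
            "Overlapping ranges: [" + prev.offset + "+" + prev.length
                + "] and [" + next.offset + "+" + next.length + "]");
      }
    }
    return sorted;
  }
}
```

Adjacent ranges (one ending exactly where the next starts) pass this check; only a true overlap is rejected.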






[jira] [Created] (HADOOP-19097) core-default fs.s3a.connection.establish.timeout value too low -warning always printed

2024-02-29 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19097:
---

 Summary: core-default fs.s3a.connection.establish.timeout value 
too low -warning always printed
 Key: HADOOP-19097
 URL: https://issues.apache.org/jira/browse/HADOOP-19097
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


caused by HADOOP-18915.

in core-default we set the value of fs.s3a.connection.establish.timeout to 5s

{code}
<property>
  <name>fs.s3a.connection.establish.timeout</name>
  <value>5s</value>
</property>
{code}

but there is a minimum of 15s, so this prints a warning

{code}
2024-02-29 10:39:27,369 WARN impl.ConfigurationHelper: Option 
fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 ms 
instead
{code}
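
For illustration, a minimal sketch of the kind of minimum-duration check that produces this warning. The class and method names are hypothetical and the message format is only modelled on the log line above; this is not the actual ConfigurationHelper code.

```java
import java.time.Duration;

public final class MinimumDurationSketch {

  /**
   * Return the configured duration, or the minimum if the configured
   * value is below it. A warning is printed when the value is raised.
   */
  public static Duration enforceMinimum(String option, Duration configured, Duration minimum) {
    if (configured.compareTo(minimum) < 0) {
      System.err.printf("Option %s is too low (%,d ms). Setting to %,d ms instead%n",
          option, configured.toMillis(), minimum.toMillis());
      return minimum;
    }
    return configured;
  }
}
```

With a 15s minimum, a default of 5s is raised on every filesystem instantiation, which is why the warning is always printed.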







[jira] [Created] (HADOOP-19095) hadoop-aws: downgrade openssl export to test

2024-02-28 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19095:
---

 Summary: hadoop-aws: downgrade openssl export to test
 Key: HADOOP-19095
 URL: https://issues.apache.org/jira/browse/HADOOP-19095
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build, fs/s3
Affects Versions: 3.3.4, 3.3.3, 3.3.5, 3.3.2, 3.3.1, 3.3.0, 3.4.0
Reporter: Steve Loughran


As seen in dependency scans and mentioned in HADOOP-16346: the wildfly/openssl 
jar is exported at runtime scope, but it is only needed at test scope. 
Proposed: downgrade it to test scope.






[jira] [Created] (HADOOP-19093) add load tests for abfs rename resilience

2024-02-27 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19093:
---

 Summary: add load tests for abfs rename resilience
 Key: HADOOP-19093
 URL: https://issues.apache.org/jira/browse/HADOOP-19093
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, test
Affects Versions: 3.3.6
Reporter: Steve Loughran
Assignee: Steve Loughran


I need a load test to verify that the rename resilience of the manifest 
committer actually works as intended

* test suite with name ILoadTest* prefix (as with s3)
* parallel test running with many threads doing many renames
* verify that rename recovery is detected
* and that no rename fails.

maybe also: metrics for this in fs and doc update. 
Possibly: LogExactlyOnce to warn of load issues
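
A very rough sketch of the shape such a test could take, assuming a plain thread pool and the local filesystem via java.nio in place of ABFS. {{ParallelRenameSketch}} and {{renameInParallel}} are hypothetical names; a real ILoadTest suite would use the Hadoop FileSystem API and also check the rename recovery statistics.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class ParallelRenameSketch {

  /**
   * Create {@code files} empty files under {@code dir}, rename each one
   * in a worker thread, and return how many renames succeeded.
   * Any exception raised by a rename is rethrown from Future.get().
   */
  public static int renameInParallel(Path dir, int files, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Boolean>> results = new ArrayList<>();
      for (int i = 0; i < files; i++) {
        Path src = Files.createFile(dir.resolve("src-" + i));
        Path dest = dir.resolve("dest-" + i);
        results.add(pool.submit(() -> {
          Files.move(src, dest);
          // A rename counts as successful only if the destination
          // exists and the source is gone.
          return Files.exists(dest) && !Files.exists(src);
        }));
      }
      int succeeded = 0;
      for (Future<Boolean> result : results) {
        if (result.get()) {
          succeeded++;
        }
      }
      return succeeded;
    } finally {
      pool.shutdownNow();
    }
  }
}
```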






[jira] [Created] (HADOOP-19092) ABFS phase 4: post Hadoop 3.4.0 features

2024-02-27 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19092:
---

 Summary: ABFS phase 4: post Hadoop 3.4.0 features
 Key: HADOOP-19092
 URL: https://issues.apache.org/jira/browse/HADOOP-19092
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Steve Loughran


Uber-JIRA for ABFS work so we can close HADOOP-18072 as done for 3.4.0

Assuming 3.4.1 is a rapid roll of packaging, dependencies and critical fixes, 
this should target 3.4.2 and beyond






[jira] [Created] (HADOOP-19087) Release Hadoop 3.4.1

2024-02-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19087:
---

 Summary: Release Hadoop 3.4.1
 Key: HADOOP-19087
 URL: https://issues.apache.org/jira/browse/HADOOP-19087
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 3.4.0
Reporter: Steve Loughran


Release a minor update to hadoop 3.4.0 with

* packaging enhancements
* updated dependencies (where viable)
* fixes for critical issues found after 3.4.0 released
* low-risk feature enhancements (those which don't impact schedule...)






[jira] [Resolved] (HADOOP-19065) Update Protocol Buffers installation to 3.21.12

2024-02-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19065.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Update Protocol Buffers installation to 3.21.12 
> 
>
> Key: HADOOP-19065
> URL: https://issues.apache.org/jira/browse/HADOOP-19065
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Update docs and docker script to cover downloading the 3.21.12 protobuf 
> compiler






[jira] [Created] (HADOOP-19086) move commons-logging to 1.2

2024-02-22 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19086:
---

 Summary: move commons-logging to 1.2
 Key: HADOOP-19086
 URL: https://issues.apache.org/jira/browse/HADOOP-19086
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Affects Versions: 3.4.0
Reporter: Steve Loughran


although hadoop doesn't use the APIs itself, it bundles commons-logging as 
things it depends on (http components) do.

the version hadoop declares (1.1.3) is out of date compared to its dependencies

* update pom and LICENSE-binary






[jira] [Reopened] (HADOOP-18487) Make protobuf 2.5 an optional runtime dependency.

2024-02-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-18487:
-

still there under yarn-api; will do followup

> Make protobuf 2.5 an optional runtime dependency.
> -
>
> Key: HADOOP-18487
> URL: https://issues.apache.org/jira/browse/HADOOP-18487
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, ipc
>Affects Versions: 3.3.4
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> uses of protobuf 2.5 and RpcEngine have been deprecated since 3.3.0 in 
> HADOOP-17046
> while still keeping those files around (for a long time...), how about we 
> make the protobuf 2.5.0 export off hadoop common and hadoop-hdfs *provided*, 
> rather than *compile*
> that way, if apps want it for their own apis, they have to explicitly ask for 
> it, but at least our own scans don't break.
> i have no idea what will happen to the rest of the stack at this point, it 
> will be "interesting" to see






[jira] [Created] (HADOOP-19084) hadoop-common exports logback as a transitive dependency

2024-02-21 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19084:
---

 Summary: hadoop-common exports logback as a transitive dependency
 Key: HADOOP-19084
 URL: https://issues.apache.org/jira/browse/HADOOP-19084
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 3.4.0, 3.5.0
Reporter: Steve Loughran


this is probably caused by HADOOP-18613:

ZK is pulling in some extra transitive stuff which surfaces in applications 
which import hadoop-common into their poms. It doesn't seem to show up in our 
distro, but downstream you get warnings about duplicate logging stuff

{code}
|  +- org.apache.zookeeper:zookeeper:jar:3.8.3:compile
|  |  +- org.apache.zookeeper:zookeeper-jute:jar:3.8.3:compile
|  |  |  \- (org.apache.yetus:audience-annotations:jar:0.12.0:compile - omitted for duplicate)
|  |  +- org.apache.yetus:audience-annotations:jar:0.12.0:compile
|  |  +- (io.netty:netty-handler:jar:4.1.94.Final:compile - omitted for conflict with 4.1.100.Final)
|  |  +- (io.netty:netty-transport-native-epoll:jar:4.1.94.Final:compile - omitted for conflict with 4.1.100.Final)
|  |  +- (org.slf4j:slf4j-api:jar:1.7.30:compile - omitted for duplicate)
|  |  +- ch.qos.logback:logback-core:jar:1.2.10:compile
|  |  +- ch.qos.logback:logback-classic:jar:1.2.10:compile
|  |  |  +- (ch.qos.logback:logback-core:jar:1.2.10:compile - omitted for duplicate)
|  |  |  \- (org.slf4j:slf4j-api:jar:1.7.32:compile - omitted for conflict with 1.7.30)
|  |  \- (commons-io:commons-io:jar:2.11.0:compile - omitted for conflict with 2.14.0)

{code}

proposed: exclude the zk dependencies we either override ourselves or don't 
need. 






[jira] [Created] (HADOOP-19083) hadoop binary tarball to exclude aws v2 sdk

2024-02-21 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19083:
---

 Summary: hadoop binary tarball to exclude aws v2 sdk
 Key: HADOOP-19083
 URL: https://issues.apache.org/jira/browse/HADOOP-19083
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Have the default hadoop binary .tar.gz exclude the aws v2 sdk. 

This SDK brings the total size of the distribution to about 1 GB.

Proposed
* add a profile to include the aws sdk in the dist module
* disable it by default

Instead we document which version is needed. 
The hadoop-aws and hadoop-cloud-storage maven artifacts will declare their 
dependencies, so apps building with those get to do the download.








[jira] [Created] (HADOOP-19080) S3A createFakeDirectory/put fails on object lock bucket if the path exists

2024-02-16 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19080:
---

 Summary: S3A createFakeDirectory/put fails on object lock bucket 
if the path exists
 Key: HADOOP-19080
 URL: https://issues.apache.org/jira/browse/HADOOP-19080
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.3.6
Reporter: Steve Loughran


s3 bucket with object lock enabled fails in createFakeDirectory (reported on 
S.O)

error implies that we need to calculate and include the md5 checksum on the 
PUT, which gets complex once you include CSE into the mix: the checksum of the 
encrypted data is what'd be required.
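
For reference, a minimal sketch of computing the value of a Content-MD5 header: the base64 encoding of the 128-bit MD5 digest of the bytes actually uploaded. {{ContentMd5Sketch}} is a hypothetical helper, not S3A code. With CSE the input would have to be the encrypted payload, which is the complication noted above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public final class ContentMd5Sketch {

  /** Base64-encoded MD5 digest of the payload, as used in a Content-MD5 header. */
  public static String contentMd5(byte[] payload) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      return Base64.getEncoder().encodeToString(md5.digest(payload));
    } catch (NoSuchAlgorithmException e) {
      // Every conforming Java platform is required to provide MD5.
      throw new IllegalStateException("MD5 not available", e);
    }
  }
}
```

A fake directory marker is a zero-byte object, so for that case the checksum is the constant MD5 of empty input.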






[jira] [Resolved] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement

2024-02-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19057.
-
Fix Version/s: 3.3.9
   Resolution: Fixed

> S3 public test bucket landsat-pds unreadable -needs replacement
> ---
>
> Key: HADOOP-19057
> URL: https://issues.apache.org/jira/browse/HADOOP-19057
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
> is no longer publicly accessible
> {code}
> java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
> landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended 
> Request ID: 
> O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
> {code}
> * Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large 
> file for some reading tests
> * changing the default value disables s3 select tests on older releases
> * if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
> will be skipped
> Proposed
> * we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
> bucket . All releases with HADOOP-18168 can use this
> * update 3.4.1 source to use this; document it
> * do something similar for 3.3.9 + maybe even cut s3 select there too.
> * document how to use it on older releases with requester-pays support
> * document how to completely disable it on older releases.
> h2. How to fix (most) landsat test failures on older releases
> add this to your auth-keys.xml file. Expect some failures in a few tests 
> with hard-coded references to the bucket (assumed role delegation tokens)
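> {code}
>   <property>
>     <name>fs.s3a.scale.test.csvfile</name>
>     <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
>     <description>file used in scale tests</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
>     <value>us-east-1</value>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
>     <value>false</value>
>     <description>Don't try to purge uploads in the read-only bucket, as
>     it will only create log noise.</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
>     <value>0</value>
>     <description>Let's postpone existence checks to the first IO operation</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
>     <value>false</value>
>     <description>Do not add the referrer header</description>
>   </property>
>   <property>
>     <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
>     <value>128k</value>
>     <description>Use a small prefetch size so tests fetch multiple blocks</description>
>   </property>
>   <property>
>     <name>fs.s3a.select.enabled</name>
>     <value>false</value>
>   </property>
> {code}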
> Some delegation token tests will still fail; these have hard-coded references 
> to the old bucket. *Do not worry about these*
> {code}
> [ERROR]   ITestDelegatedMRJob.testJobSubmissionCollectsTokens[0] » 
> AccessDenied s3a://la...
> [ERROR]   ITestDelegatedMRJob.testJobSubmissionCollectsTokens[1] » 
> AccessDenied s3a://la...
> [ERROR]   ITestDelegatedMRJob.testJobSubmissionCollectsTokens[2] » 
> AccessDenied s3a://la...
> [ERROR]   
> ITestRoleDelegationInFilesystem>ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->ITestSessionDelegationInFilesystem.readLandsatMetadata:614
>  » AccessDenied
> [ERROR]   
> ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->readLandsatMetadata:614
>  » AccessDenied
> {code}






[jira] [Resolved] (HADOOP-18930) S3A: make fs.s3a.create.performance an option you can set for the entire bucket

2024-02-13 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18930.
-
Fix Version/s: 3.4.0
   (was: 3.3.7-aws)
   Resolution: Fixed

> S3A: make fs.s3a.create.performance an option you can set for the entire 
> bucket
> ---
>
> Key: HADOOP-18930
> URL: https://issues.apache.org/jira/browse/HADOOP-18930
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> make the fs.s3a.create.performance option something you can set everywhere, 
> rather than just in a createFile() option or under a magic path.
> this improves performance on apps like iceberg where filenames are generated 
> with UUIDs in them, so we know there are no overwrites






[jira] [Resolved] (HADOOP-19059) update AWS SDK to support S3 Access Grants in S3A

2024-02-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19059.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> update AWS SDK to support S3 Access Grants in S3A
> -
>
> Key: HADOOP-19059
> URL: https://issues.apache.org/jira/browse/HADOOP-19059
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In order to support S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A, we need to 
> update the AWS SDK in the hadoop package.






[jira] [Created] (HADOOP-19072) S3A: expand optimisations on stores with "fs.s3a.create.performance"

2024-02-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19072:
---

 Summary: S3A: expand optimisations on stores with 
"fs.s3a.create.performance"
 Key: HADOOP-19072
 URL: https://issues.apache.org/jira/browse/HADOOP-19072
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


on an s3a store with fs.s3a.create.performance set, speed up other operations

*  mkdir to skip parent directory check: just do a HEAD to see if there's a 
file at the target location







[jira] [Resolved] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-02-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19045.
-
Resolution: Fixed

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are passed down to the http client, but not to 
> the sdk timeouts, so you can't have a longer timeout than the default. This 
> surfaces in the inability to tune the timeouts for CreateSession calls even 
> now that the latest SDK does pick it up






[jira] [Reopened] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-02-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-19045:
-

this is broken because core-default.xml sets it to 0

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> s3a client timeout settings are passed down to the http client, but not to 
> the sdk timeouts, so you can't have a longer timeout than the default. This 
> surfaces in the inability to tune the timeouts for CreateSession calls even 
> now that the latest SDK does pick it up






[jira] [Resolved] (HADOOP-18993) Allow to not isolate S3AFileSystem classloader when needed

2024-02-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18993.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Allow to not isolate S3AFileSystem classloader when needed
> --
>
> Key: HADOOP-18993
> URL: https://issues.apache.org/jira/browse/HADOOP-18993
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: hadoop-thirdparty
>Affects Versions: 3.3.6
>Reporter: Antonio Murgia
>Assignee: Antonio Murgia
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> In HADOOP-17372 the S3AFileSystem forces the configuration classloader to be 
> the same as the one that loaded S3AFileSystem. This leads to the 
> impossibility in Spark applications to load third party credentials providers 
> as user jars.
> I propose to add a configuration key 
> {{fs.s3a.extensions.isolated.classloader}} with a default value of {{true}} 
> that if set to {{false}} will not perform the classloader set.






[jira] [Resolved] (HADOOP-19049) Class loader leak caused by StatisticsDataReferenceCleaner thread

2024-02-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19049.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Class loader leak caused by StatisticsDataReferenceCleaner thread
> -
>
> Key: HADOOP-19049
> URL: https://issues.apache.org/jira/browse/HADOOP-19049
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Affects Versions: 3.3.6
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The 
> "org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner" 
> daemon thread was created by FileSystem. 
> This is fine if the thread's context class loader is the system class loader, 
> but it's bad if the context class loader is a custom class loader. The 
> reference held by this daemon thread means that the class loader can never 
> become eligible for GC.
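
One possible way to avoid pinning a custom class loader, shown as a sketch only: replace the context class loader that a new thread inherits from its creator before starting it. {{CleanerThreadSketch}} and {{newCleanerThread}} are hypothetical names, and whether the actual fix takes this approach is an assumption.

```java
public final class CleanerThreadSketch {

  /**
   * Create a daemon thread whose context class loader is the system
   * class loader rather than whatever loader the creating thread uses.
   */
  public static Thread newCleanerThread(Runnable task) {
    Thread thread = new Thread(task, "cleaner-sketch");
    thread.setDaemon(true);
    // A new thread inherits the creator's context class loader; a
    // long-lived daemon thread would otherwise keep that loader reachable.
    thread.setContextClassLoader(ClassLoader.getSystemClassLoader());
    return thread;
  }
}
```

This only addresses the context class loader reference; a thread can keep a loader reachable in other ways, for example through the Runnable it executes.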






[jira] [Resolved] (HADOOP-19044) AWS SDK V2 - Update S3A region logic

2024-02-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19044.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> AWS SDK V2 - Update S3A region logic 
> -
>
> Key: HADOOP-19044
> URL: https://issues.apache.org/jira/browse/HADOOP-19044
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> If both fs.s3a.endpoint & fs.s3a.endpoint.region are empty, Spark will set 
> fs.s3a.endpoint to 
> s3.amazonaws.com here:
> [https://github.com/apache/spark/blob/9a2f39318e3af8b3817dc5e4baf52e548d82063c/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L540]
>  
>  
> HADOOP-18908 updated the region logic such that if fs.s3a.endpoint.region is 
> set, or if a region can be parsed from fs.s3a.endpoint (which will happen in 
> this case; the region will be US_EAST_1), cross region access is not enabled. 
> This will cause 400 errors if the bucket is not in US_EAST_1. 
>  
> Proposed: update the logic so that if the endpoint is the global 
> s3.amazonaws.com, cross region access is enabled.  
>  
>  
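
A sketch of the proposed decision, following the description above literally. {{RegionLogicSketch}}, {{crossRegionAccessEnabled}} and the regular expression used to spot a regional endpoint are all assumptions for illustration; the S3A client factory does not necessarily work this way.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public final class RegionLogicSketch {

  /** The global S3 endpoint, which carries no usable region information. */
  public static final String GLOBAL_ENDPOINT = "s3.amazonaws.com";

  /** Assumed shape of a regional endpoint, e.g. s3.eu-west-1.amazonaws.com. */
  private static final Pattern REGIONAL_ENDPOINT =
      Pattern.compile("^s3[.-]([a-z0-9-]+)\\.amazonaws\\.com$");

  /**
   * Decide whether cross region access should be enabled.
   *
   * @param endpoint value of fs.s3a.endpoint, may be null or empty
   * @param region value of fs.s3a.endpoint.region, may be null or empty
   */
  public static boolean crossRegionAccessEnabled(String endpoint, String region) {
    if (region != null && !region.trim().isEmpty()) {
      return false; // an explicit region always wins
    }
    String host = endpoint == null ? "" : endpoint.trim().toLowerCase(Locale.ROOT);
    if (host.isEmpty() || host.equals(GLOBAL_ENDPOINT)) {
      return true; // proposed change: treat the global endpoint as "no region"
    }
    // A region parsed from a regional endpoint disables cross region access.
    return !REGIONAL_ENDPOINT.matcher(host).matches();
  }
}
```

With this rule, the endpoint that Spark fills in by default no longer forces US_EAST_1 for buckets in other regions.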






[jira] [Resolved] (HADOOP-18987) Corrections to Hadoop FileSystem API Definition

2024-02-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18987.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Corrections to Hadoop FileSystem API Definition
> ---
>
> Key: HADOOP-18987
> URL: https://issues.apache.org/jira/browse/HADOOP-18987
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.6
>Reporter: Dieter De Paepe
>Assignee: Dieter De Paepe
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> I noticed a lot of inconsistencies, typos and informal statements in the 
> "formal" FileSystem API definition 
> ([https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/index.html)]
> Creating this ticket to link my PR against.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17784) hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17784.
-
Resolution: Duplicate

HADOOP-19057 will address this now that the bucket is completely gone

> hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021
> 
>
> Key: HADOOP-17784
> URL: https://issues.apache.org/jira/browse/HADOOP-17784
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs/s3, test
>Reporter: Leona Yoda
>Priority: Major
> Attachments: org.apache.hadoop.fs.s3a.select.ITestS3SelectMRJob.txt
>
>
> I found an announcement that the landsat-pds bucket will be deleted on July 1, 2021
> (https://registry.opendata.aws/landsat-8/)
> and I think this bucket is used in the tests of the hadoop-aws module:
> [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestConstants.java#L93]
>  
> At this time I can access the bucket, but we might have to change the test 
> bucket someday.






[jira] [Created] (HADOOP-19057) S3 public test bucket landsat-pds unreadable -needs replacement

2024-01-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19057:
---

 Summary: S3 public test bucket landsat-pds unreadable -needs 
replacement
 Key: HADOOP-19057
 URL: https://issues.apache.org/jira/browse/HADOOP-19057
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.3.6, 3.2.4, 3.4.0, 3.3.9, 3.5.0
Reporter: Steve Loughran


The s3 test bucket used in hadoop-aws tests of S3 select and large file reads 
is no longer publicly accessible

{code}
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on 
landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null 
(Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request 
ID: 
O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null

{code}

* Because HADOOP-18830 has cut s3 select, all we need in 3.4.1+ is a large file 
for some reading tests
* changing the default value disables s3 select tests on older releases
* if fs.s3a.scale.test.csvfile is set to " " then other tests which need it 
will be skipped

Proposed
* we locate a new large file under the (requester pays) s3a://usgs-landsat/ 
bucket . All releases with HADOOP-18168 can use this
* update 3.4.1 source to use this; document it
* do something similar for 3.3.9 + maybe even cut s3 select there too.
* document how to use it on older releases with requester-pays support
* document how to completely disable it on older releases.






[jira] [Resolved] (HADOOP-19022) S3A : ITestS3AConfiguration#testRequestTimeout failure

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19022.
-
Fix Version/s: 3.5.0
   3.4.1
 Assignee: Steve Loughran
   Resolution: Duplicate

> S3A : ITestS3AConfiguration#testRequestTimeout failure
> --
>
> Key: HADOOP-19022
> URL: https://issues.apache.org/jira/browse/HADOOP-19022
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.5.0, 3.4.1
>
>
> "fs.s3a.connection.request.timeout" should be specified in milliseconds as per
> {code:java}
> Duration apiCallTimeout = getDuration(conf, REQUEST_TIMEOUT,
> DEFAULT_REQUEST_TIMEOUT_DURATION, TimeUnit.MILLISECONDS, Duration.ZERO); 
> {code}
> The test fails consistently because it sets a 120 ms timeout, which is less 
> than the 15s minimum network operation duration, and hence gets reset to 
> 15000 ms by the enforcement.
>  
> {code:java}
> [ERROR] testRequestTimeout(org.apache.hadoop.fs.s3a.ITestS3AConfiguration)  
> Time elapsed: 0.016 s  <<< FAILURE!
> java.lang.AssertionError: Configured fs.s3a.connection.request.timeout is 
> different than what AWS sdk configuration uses internally expected:<12> 
> but was:<15000>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.fs.s3a.ITestS3AConfiguration.testRequestTimeout(ITestS3AConfiguration.java:444)
>  {code}
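
The failure quoted above follows from a floor being applied to the configured timeout. A minimal sketch of such a clamp in plain Java, assuming the 15s minimum described in the issue; the class `TimeoutFloor` and its method are hypothetical and not the S3A implementation.

```java
import java.time.Duration;

/** Hypothetical sketch of clamping a configured timeout to a minimum. */
final class TimeoutFloor {
  /** 15s minimum network operation duration, as quoted in the issue. */
  static final Duration MINIMUM = Duration.ofSeconds(15);

  private TimeoutFloor() {
  }

  /** Returns the configured value unless it is below the minimum. */
  static Duration enforceMinimum(Duration configured, Duration minimum) {
    return configured.compareTo(minimum) < 0 ? minimum : configured;
  }

  public static void main(String[] args) {
    // 120 ms is below the floor, so the effective value is 15000 ms.
    System.out.println(enforceMinimum(Duration.ofMillis(120), MINIMUM).toMillis());
  }
}
```

Under this model a test can only assert on the configured value when it picks a timeout at or above the floor; anything smaller is replaced before it reaches the SDK.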






[jira] [Resolved] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19045.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A: pass request timeouts down to sdk clients
> --
>
> Key: HADOOP-19045
> URL: https://issues.apache.org/jira/browse/HADOOP-19045
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> S3A client timeout settings reach the HTTP client but are not passed down to 
> the SDK's own timeouts, so you can't have a longer timeout than the default. 
> This surfaces as an inability to tune the timeouts for CreateSession calls, 
> even now that the latest SDK does pick them up






[jira] [Resolved] (HADOOP-18830) S3A: Cut S3 Select

2024-01-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18830.
-
Fix Version/s: 3.4.1
 Hadoop Flags: Incompatible change
 Release Note: S3 Select is no longer supported through the S3A connector
   Resolution: Fixed

> S3A: Cut S3 Select
> --
>
> Key: HADOOP-18830
> URL: https://issues.apache.org/jira/browse/HADOOP-18830
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> getting s3 select to work with the v2 sdk is tricky, we need to add extra 
> libraries to the classpath beyond just bundle.jar. we can do this but
> * AFAIK nobody has ever done CSV predicate pushdown, as it breaks split logic 
> completely
> * CSV is a bad format
> * one-line JSON more structured but also way less efficient
> ORC/Parquet benefit from vectored IO and work spanning the cluster.
> accordingly, I'm wondering what to do about s3 select
> # cut?
> # downgrade to optional and document the extra classes on the classpath
> Option #2 is straightforward and effectively the default. we can also declare 
> the feature deprecated.
> {code}
> [ERROR] 
> testReadLandsatRecordsNoMatch(org.apache.hadoop.fs.s3a.select.ITestS3SelectLandsat)
>   Time elapsed: 147.958 s  <<< ERROR!
> java.io.IOException: java.lang.NoClassDefFoundError: 
> software/amazon/eventstream/MessageDecoder
> at 
> org.apache.hadoop.fs.s3a.select.SelectObjectContentHelper.select(SelectObjectContentHelper.java:75)
> at 
> org.apache.hadoop.fs.s3a.WriteOperationHelper.lambda$select$10(WriteOperationHelper.java:660)
> at 
> org.apache.hadoop.fs.store.audit.AuditingFunctions.lambda$withinAuditSpan$0(AuditingFunctions.java:62)
> at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
> {code}






[jira] [Resolved] (HADOOP-19046) S3A: update sdk versions

2024-01-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19046.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> S3A: update sdk versions
> 
>
> Key: HADOOP-19046
> URL: https://issues.apache.org/jira/browse/HADOOP-19046
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Move up to the most recent versions of the v2 sdk, with a v1 update just to 
> keep some CVE checking happy.
> {code}
> 1.12.599
> 2.23.5
> {code}






[jira] [Resolved] (HADOOP-18975) AWS SDK v2: extend support for FIPS endpoints

2024-01-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18975.
-
Resolution: Fixed

> AWS SDK v2:  extend support for FIPS endpoints
> --
>
> Key: HADOOP-18975
> URL: https://issues.apache.org/jira/browse/HADOOP-18975
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> v1 SDK supported FIPS just by changing the endpoint.
> Now we have a new builder setting to use.
> * add new  fs.s3a.endpoint.fips option
> * pass it down
> * test






[jira] [Resolved] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2024-01-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19015.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting 
> for connection from pool
> --
>
> Key: HADOOP-19015
> URL: https://issues.apache.org/jira/browse/HADOOP-19015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Getting errors in jobs which can be fixed by increasing this 
> 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
> s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
>  software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
>   at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible






[jira] [Resolved] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2024-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18883.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> This is in line with JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects 
> the “Expect 100-continue” then a ‘java.net.ProtocolException’ is thrown from 
> the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same 
> instance (e.g. getHeaderField() or getHeaderFields()), it will internally 
> call getOutputStream(), which invokes writeRequests(), which makes the actual 
> server call. 
> In AbfsHttpOperation, after sendRequest() we call the processResponse() 
> method from AbfsRestOperation. Even if conn.getOutputStream() fails due to 
> the expect-100 error, we consume the exception and let the code go ahead. So 
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered 
> after getOutputStream() has failed. These invocations will lead to further 
> server calls.
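
One way to avoid the repeated server calls described above is to remember that the expect-100 handshake failed and refuse to read headers afterwards. The following is a sketch under that assumption only: `GuardedHttpOperation` and its nested `Connection` interface are hypothetical stand-ins, not the AbfsHttpOperation or HttpURLConnection code, and the actual fix may be structured differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ProtocolException;

/** Hypothetical model of an HTTP operation; not the AbfsHttpOperation code. */
final class GuardedHttpOperation {
  /** Stand-in for the parts of HttpURLConnection that matter here. */
  interface Connection {
    void openOutputStream() throws IOException;

    String getHeaderField(String name);
  }

  private final Connection conn;
  private boolean expectRejected;

  GuardedHttpOperation(Connection conn) {
    this.conn = conn;
  }

  /** Sends the request, remembering whether the expect-100 handshake failed. */
  void sendRequest() {
    try {
      conn.openOutputStream();
    } catch (ProtocolException e) {
      // Record the rejection so later calls do not touch the connection again.
      expectRejected = true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads a header only when doing so cannot trigger another server call. */
  String header(String name) {
    return expectRejected ? null : conn.getHeaderField(name);
  }

  boolean isExpectRejected() {
    return expectRejected;
  }
}
```

The design choice here is to keep the flag on the operation object, so every later accessor can consult it before delegating to the connection.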






[jira] [Created] (HADOOP-19048) ItestCustomSigner failing against S3Express Buckets

2024-01-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19048:
---

 Summary: ItestCustomSigner failing against S3Express Buckets
 Key: HADOOP-19048
 URL: https://issues.apache.org/jira/browse/HADOOP-19048
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.5.0
Reporter: Steve Loughran


getting test failures against S3 Express buckets with {{ItestCustomSigner}}; 
not seen with classic s3 stores.


{code}
[ERROR] 
testCustomSignerAndInitializer[simple-delete](org.apache.hadoop.fs.s3a.auth.ITestCustomSigner)
  Time elapsed: 6.12 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSBadRequestException: PUT 0-byte object  on 
fork-0006/test/testCustomSignerAndInitializer[simple-delete]/customsignerpath1: 
software.amazon.awssdk.services.s3.model.S3Exception: 
x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* 
or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request 
ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: 
kZJZG05LGCBu7lsNKNf):InvalidRequest: x-amz-sdk-checksum-algorithm specified, 
but no corresponding x-amz-checksum-* or x-amz-trailer headers were found. 
(Service: S3, Status Code: 400, Request ID: 
0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: 
kZJZG05LGCBu7lsNKNf)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:259)
...
Caused by: software.amazon.awssdk.services.s3.model.S3Exception: 
x-amz-sdk-checksum-algorithm specified, but no corresponding x-amz-checksum-* 
or x-amz-trailer headers were found. (Service: S3, Status Code: 400, Request 
ID: 0033eada6b00018d21962f1b05094a80435cca52, Extended Request ID: 
kZJZG05LGCBu7lsNKNf)
at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)


{code}







[jira] [Resolved] (HADOOP-19033) S3A: disable checksum validation

2024-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19033.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> S3A: disable checksum validation
> 
>
> Key: HADOOP-19033
> URL: https://issues.apache.org/jira/browse/HADOOP-19033
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> AWS v2 sdk turns on client-side checksum validation; this kills performance
> Given we are using TLS to download from AWS s3, there's implicit channel 
> checksumming going on, along with the IPv4 TCP checksumming.
> We don't need it, all it does is slow us down.
> proposed: disable in DefaultS3ClientFactory
> I don't want to add an option to enable it as it only complicates life (yet 
> another config option), but I am open to persuasion






[jira] [Resolved] (HADOOP-17824) ITestCustomSigner fails with NPE against private endpoint

2024-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17824.
-
Resolution: Cannot Reproduce

stack trace no longer valid for v2 sdk; closing as cannot reproduce

> ITestCustomSigner fails with NPE against private endpoint
> -
>
> Key: HADOOP-17824
> URL: https://issues.apache.org/jira/browse/HADOOP-17824
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Priority: Minor
>
> ITestCustomSigner fails when the tester is pointed at a private endpoint






[jira] [Created] (HADOOP-19046) S3A: update sdk versions

2024-01-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19046:
---

 Summary: S3A: update sdk versions
 Key: HADOOP-19046
 URL: https://issues.apache.org/jira/browse/HADOOP-19046
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Move up to the most recent versions of the v2 sdk, with a v1 update just to 
keep some CVE checking happy.


{code}
1.12.599
2.23.5

{code}







[jira] [Created] (HADOOP-19045) S3A: pass request timeouts down to sdk clients

2024-01-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19045:
---

 Summary: S3A: pass request timeouts down to sdk clients
 Key: HADOOP-19045
 URL: https://issues.apache.org/jira/browse/HADOOP-19045
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


S3A client timeout settings reach the HTTP client but are not passed down to 
the SDK's own timeouts, so you can't have a longer timeout than the default. 
This surfaces as an inability to tune the timeouts for CreateSession calls, 
even now that the latest SDK does pick them up






[jira] [Created] (HADOOP-19043) S3A: Regression: ITestS3AOpenCost fails on prefetch test runs

2024-01-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19043:
---

 Summary: S3A: Regression: ITestS3AOpenCost fails on prefetch test 
runs
 Key: HADOOP-19043
 URL: https://issues.apache.org/jira/browse/HADOOP-19043
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Getting test failures in the new ITestS3AOpenCost tests when run with 
{{-Dprefetch}}

Thought I'd tested this, but clearly not
* class cast failures on asserts (fix: skip)
* bytes read different in one test: (fix: identify and address)






[jira] [Created] (HADOOP-19042) S3A: detect and recover from SSL ConnectionReset exceptions

2024-01-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19042:
---

 Summary: S3A: detect and recover from SSL ConnectionReset 
exceptions
 Key: HADOOP-19042
 URL: https://issues.apache.org/jira/browse/HADOOP-19042
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


s3a input stream doesn't recover from SSL exceptions, specifically 
ConnectionReset

This is a variant of HADOOP-19027, except it's surfaced on an older release...
# need to make sure the specific exception is handled by aborting stream and 
retrying -so map to the new HttpChannelEOFException
# all of this needs to be backported
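
Step 1 above amounts to translating a low-level failure into one exception type that the retry policy already understands. A minimal sketch in plain Java, assuming a reset can be recognised by the text "connection reset" somewhere in the cause chain (an assumption, not something the issue states); `ChannelErrors` and its nested `ChannelEOFException` are hypothetical stand-ins, and only the name HttpChannelEOFException comes from the issue.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.Locale;

/** Hypothetical sketch of exception translation; not the S3A implementation. */
final class ChannelErrors {
  /** Local stand-in for the HttpChannelEOFException named in the issue. */
  static final class ChannelEOFException extends EOFException {
    ChannelEOFException(String message, Throwable cause) {
      super(message);
      initCause(cause);
    }
  }

  private ChannelErrors() {
  }

  /** True if any exception in the cause chain mentions a connection reset. */
  static boolean isConnectionReset(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      String message = c.getMessage();
      if (message != null
          && message.toLowerCase(Locale.ROOT).contains("connection reset")) {
        return true;
      }
    }
    return false;
  }

  /** Maps connection resets to an EOF-style exception; others pass through. */
  static IOException translate(String operation, IOException e) {
    return isConnectionReset(e)
        ? new ChannelEOFException(operation + ": " + e.getMessage(), e)
        : e;
  }
}
```

A caller would pass every IOException from a read through `translate()` and let the retry policy treat the mapped type as a recoverable communications error, aborting the stream before the next attempt.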









[jira] [Resolved] (HADOOP-19027) S3A: S3AInputStream doesn't recover from HTTP/channel exceptions

2024-01-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19027.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

in 3.5 though I hope to backport to 3.4.1

> S3A: S3AInputStream doesn't recover from HTTP/channel exceptions
> 
>
> Key: HADOOP-19027
> URL: https://issues.apache.org/jira/browse/HADOOP-19027
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> S3AInputStream doesn't seem to recover from Http exceptions raised through 
> HttpClient or through OpenSSL.
> * review the recovery code to make sure it is retrying enough, it looks 
> suspiciously like it doesn't
> * detect the relevant openssl, shaded httpclient and unshaded httpclient 
> exceptions, map to a standard one and treat as comms error in our retry policy
> This is not the same as the load balancer/proxy returning 443/444 which we 
> map to AWSNoResponseException. We can't reuse that as it expects to be 
> created from an 
> {{software.amazon.awssdk.awscore.exception.AwsServiceException}} exception 
> with the relevant fields...changing it could potentially be incompatible.






[jira] [Created] (HADOOP-19037) S3A: S3A: ITestS3AConfiguration failing with region problems

2024-01-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19037:
---

 Summary: S3A: S3A: ITestS3AConfiguration failing with region 
problems
 Key: HADOOP-19037
 URL: https://issues.apache.org/jira/browse/HADOOP-19037
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.4.0
Reporter: Steve Loughran


After commenting out the default region in the [default] profile of my 
~/.aws/config, the test ITestS3AConfiguration.testS3SpecificSignerOverride() fails


{code}
[ERROR] 
testS3SpecificSignerOverride(org.apache.hadoop.fs.s3a.ITestS3AConfiguration)  
Time elapsed: 0.054 s  <<< ERROR!
software.amazon.awssdk.core.exception.SdkClientException: Unable to load region 
from any of the providers in the chain 
software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain@12c626f8:
 
[software.amazon.awssdk.regions.providers.SystemSettingsRegionProvider@ae63559: 
Unable to load region from system settings. Region must be specified either via 
environment variable (AWS_REGION) or  system property (aws.region)., 
software.amazon.awssdk.regions.providers.AwsProfileRegionProvider@6e6cfd4c: No 
region provided in profile: default, 
software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider@139147de:
 EC2 Metadata is disabled. Unable to retrieve region information from EC2 
Metadata service.]

{code}

I'm worried the SDK update has brought back the 3.3.x region problems, where 
well-configured developer setups / EC2 deployments hid problems. Certainly we 
can see the code is checking these paths
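
The error above lists a chain of lookups that each came back empty. A minimal model of that kind of chain in plain Java, assuming only the two system settings named in the message (AWS_REGION and aws.region); `RegionChain` is a hypothetical class, not one of the AWS SDK providers, and the order used here is not claimed to match the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical model of a region lookup chain; not the AWS SDK classes. */
final class RegionChain {
  private RegionChain() {
  }

  /** Returns the first non-blank region offered by the providers, in order. */
  static Optional<String> resolve(List<Supplier<String>> providers) {
    for (Supplier<String> provider : providers) {
      String region = provider.get();
      if (region != null && !region.trim().isEmpty()) {
        return Optional.of(region.trim());
      }
    }
    return Optional.empty();
  }

  /** Chain covering the two system settings named in the error message. */
  static List<Supplier<String>> systemSettings() {
    List<Supplier<String>> chain = new ArrayList<>();
    chain.add(() -> System.getProperty("aws.region"));
    chain.add(() -> System.getenv("AWS_REGION"));
    return chain;
  }

  public static void main(String[] args) {
    // Falls back to the marker text when neither setting is present.
    System.out.println(resolve(systemSettings()).orElse("(no region found)"));
  }
}
```

The point of the sketch is that a developer machine with any one of these settings populated never reaches the empty case, which is how a well-configured setup can hide a missing region in the test configuration.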






[jira] [Resolved] (HADOOP-19004) S3A: Support Authentication through HttpSigner API

2024-01-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19004.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> S3A: Support Authentication through HttpSigner API 
> ---
>
> Key: HADOOP-19004
> URL: https://issues.apache.org/jira/browse/HADOOP-19004
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The latest AWS SDK changes how signing works, and for signing S3Express 
> signatures the new {{software.amazon.awssdk.http.auth}} auth mechanism is 
> needed






[jira] [Resolved] (HADOOP-18981) Move oncrpc/portmap from hadoop-nfs to hadoop-common

2024-01-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18981.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Move oncrpc/portmap from hadoop-nfs to hadoop-common
> 
>
> Key: HADOOP-18981
> URL: https://issues.apache.org/jira/browse/HADOOP-18981
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> We want to use udpserver/client for other use cases, rather than only for 
> NFS. One such use case is to export NameNodeHAState for NameNodes via a UDP 
> server. 






[jira] [Created] (HADOOP-19033) S3A: disable checksum validation

2024-01-10 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19033:
---

 Summary: S3A: disable checksum validation
 Key: HADOOP-19033
 URL: https://issues.apache.org/jira/browse/HADOOP-19033
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Steve Loughran
Assignee: Steve Loughran


AWS v2 sdk turns on client-side checksum validation; this kills performance

Given we are using TLS to download from AWS s3, there's implicit channel 
checksumming going on, along with the IPv4 TCP checksumming.

We don't need it, all it does is slow us down.

proposed: disable in DefaultS3ClientFactory

I don't want to add an option to enable it as it only complicates life (yet 
another config option), but I am open to persuasion




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19032) MultiObjectDeleteException bulk delete of odd filenames

2024-01-10 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19032:
---

 Summary: MultiObjectDeleteException bulk delete of odd filenames
 Key: HADOOP-19032
 URL: https://issues.apache.org/jira/browse/HADOOP-19032
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Possibly transient. note bucket is versioned.

{code}
org.apache.hadoop.fs.s3a.AWSS3IOException: 
Remove S3 Dir Markers on 
s3a://stevel-london/Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/test-dir/7/testContextURI/createTest:
 org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteException: 
[S3Error(Key=Users/stevel/Projects/hadoop-trunk/hadoop-tools/hadoop-aws/target/test-dir/7/testContextURI/createTest/()&^%$#@!~_+}{>

[jira] [Created] (HADOOP-19027) S3A: S3AInputStream doesn't recover from HTTP exceptions

2024-01-05 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19027:
---

 Summary: S3A: S3AInputStream doesn't recover from HTTP exceptions
 Key: HADOOP-19027
 URL: https://issues.apache.org/jira/browse/HADOOP-19027
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran




S3AInputStream doesn't seem to recover from Http exceptions raised through 
HttpClient or through OpenSSL.

* review the recovery code to make sure it is retrying enough, it looks 
suspiciously like it doesn't
* detect the relevant openssl, shaded httpclient and unshaded httpclient 
exceptions, map to a standard one and treat as comms error in our retry policy

This is not the same as the load balancer/proxy returning 443/444 which we map 
to AWSNoResponseException. We can't reuse that as it expects to be created from 
an {{software.amazon.awssdk.awscore.exception.AwsServiceException}} exception 
with the relevant fields...changing it could potentially be incompatible.
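
The first bullet in the list above asks whether the recovery code retries enough. A sketch of the general pattern in plain Java, assuming a fixed attempt count and a caller-supplied recovery action such as aborting and reopening the stream; `ReadRetrier` and its `Attempt` interface are hypothetical, and the real S3A retry policy is more involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical retry helper; not the S3A Invoker or retry policy. */
final class ReadRetrier {
  /** A read attempt which may fail with an IOException. */
  interface Attempt<T> {
    T run() throws IOException;
  }

  private ReadRetrier() {
  }

  /**
   * Runs the attempt up to maxAttempts times. After each failure except the
   * last, onFailure is invoked (for example to abort and reopen the stream).
   * The final failure is rethrown wrapped in an UncheckedIOException.
   */
  static <T> T retry(int maxAttempts, Attempt<T> attempt, Runnable onFailure) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int i = 1; i <= maxAttempts; i++) {
      try {
        return attempt.run();
      } catch (IOException e) {
        last = e;
        if (i < maxAttempts) {
          onFailure.run();
        }
      }
    }
    throw new UncheckedIOException(last);
  }
}
```

Counting how often `onFailure` runs in a unit test is one simple way to confirm that a stream really is reopened between attempts rather than reused after a channel failure.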







[jira] [Created] (HADOOP-19026) S3A: TestIAMInstanceCredentialsProvider.testIAMInstanceCredentialsInstantiate failure

2024-01-05 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19026:
---

 Summary: S3A: 
TestIAMInstanceCredentialsProvider.testIAMInstanceCredentialsInstantiate failure
 Key: HADOOP-19026
 URL: https://issues.apache.org/jira/browse/HADOOP-19026
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.4.0
Reporter: Steve Loughran


test failure in TestIAMInstanceCredentialsProvider; looks like the test is 
running in an EC2 VM whose IAM service isn't providing credentials -and the 
test isn't set up to ignore that.


{code}
Caused by: software.amazon.awssdk.core.exception.SdkClientException: The 
requested metadata is not found
at http://169.254.169.254/latest/meta-data/iam/security-credentials/
at 
software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
at 
software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:125)
at 
software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:91)
at 
software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.lambda$getSecurityCredentials$3(InstanceProfileCredentialsProvider.java:256)
at 
software.amazon.awssdk.utils.FunctionalUtils.lambda$safeSupplier$4(FunctionalUtils.java:108)
at 
software.amazon.awssdk.utils.FunctionalUtils.invokeSafely(FunctionalUtils.java:136)
at 
software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.getSecurityCredentials(InstanceProfileCredentialsProvider.java:256)
at 
software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.createEndpointProvider(InstanceProfileCredentialsProvider.java:204)
at 
software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.refreshCredentials(InstanceProfileCredentialsProvider.java:150)

{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17912) ABFS: Support for Encryption Context

2024-01-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17912.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> ABFS: Support for Encryption Context
> 
>
> Key: HADOOP-17912
> URL: https://issues.apache.org/jira/browse/HADOOP-17912
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sumangala Patki
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support for customer-provided encryption keys at the file level, superseding 
> the global (account-level) key use in HADOOP-17536.
> ABFS driver will support an "EncryptionContext" plugin for retrieving 
> encryption information, the implementation for which should be provided by 
> the client. The keys/context retrieved will be sent via request headers to 
> the server, which will store the encryption context. Subsequent REST calls to 
> server that access data/user metadata of the file will require fetching the 
> encryption context through a GetFileProperties call and retrieving the key 
> from the custom provider, before sending the request.






[jira] [Resolved] (HADOOP-18540) Upgrade Bouncy Castle to 1.70

2024-01-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18540.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Upgrade Bouncy Castle to 1.70
> -
>
> Key: HADOOP-18540
> URL: https://issues.apache.org/jira/browse/HADOOP-18540
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.0
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Upgrade Bouncycastle to 1.70 to resolve
>  
> |[[sonatype-2021-4916] CWE-327: Use of a Broken or Risky Cryptographic 
> Algorithm|https://ossindex.sonatype.org/vulnerability/sonatype-2021-4916?component-type=maven=org.bouncycastle/bcprov-jdk15on]|
> |[[sonatype-2019-0673] CWE-400: Uncontrolled Resource Consumption ('Resource 
> Exhaustion')|https://ossindex.sonatype.org/vulnerability/sonatype-2019-0673?component-type=maven=org.bouncycastle/bcprov-jdk15on]|






[jira] [Resolved] (HADOOP-19008) S3A: Upgrade AWS SDK to 2.21.41

2023-12-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19008.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> S3A: Upgrade AWS SDK to 2.21.41
> ---
>
> Key: HADOOP-19008
> URL: https://issues.apache.org/jira/browse/HADOOP-19008
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0, 3.3.7-aws
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> sdk 2.21.41 is out and logging now picks up the log4j.properties options. 
> move to this ASAP






[jira] [Created] (HADOOP-19008) S3A: Upgrade AWS SDK to 2.21.41

2023-12-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19008:
---

 Summary: S3A: Upgrade AWS SDK to 2.21.41
 Key: HADOOP-19008
 URL: https://issues.apache.org/jira/browse/HADOOP-19008
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0, 3.3.7-aws
Reporter: Steve Loughran
Assignee: Steve Loughran


sdk 2.21.41 is out and logging now picks up the log4j.properties options. 

move to this ASAP






[jira] [Resolved] (HADOOP-18999) S3A: debug logging for http traffic to S3 stores

2023-12-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18999.
-
Resolution: Not A Problem

no longer needed as 2.21.41 logs properly!

> S3A: debug logging for http traffic to S3 stores
> 
>
> Key: HADOOP-18999
> URL: https://issues.apache.org/jira/browse/HADOOP-18999
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> AWS SDK bundle.jar logging doesn't set up right.
> {code}
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> {code}
> Cloudstore commands have a -debug option to force set this through log4j 
> APIs; this does work. 
> Proposed:
> * add reflection-based ability to set/query log4j log levels (+tests, 
> obviously)
> * add a new log `org.apache.hadoop.fs.s3a.logging.sdk`
> * if set to DEBUG, DefaultS3ClientFactory will enable logging on the aws 
> internal/shaded classes
> this allows log4j.properties to turn on logging; reflection ensures all is 
> well on other log back-ends and when unshaded aws sdk jars are in use
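
The reflection-based approach proposed above could look roughly like the sketch below. It is only an illustration, not the patch itself: the class name Log4jLevels and both method names are hypothetical, and it assumes the log4j 1.x API (org.apache.log4j.Logger and org.apache.log4j.Level). Because every log4j type is looked up by name at run time, the helper compiles and runs without log4j on the classpath and simply reports failure there.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: sets a log4j 1.x logger level using reflection only. */
final class Log4jLevels {

  private static final String LOGGER_CLASS = "org.apache.log4j.Logger";
  private static final String LEVEL_CLASS = "org.apache.log4j.Level";

  private Log4jLevels() {
  }

  /** Returns true if the log4j 1.x Logger and Level classes can be loaded. */
  static boolean isLog4jAvailable() {
    try {
      Class.forName(LOGGER_CLASS);
      Class.forName(LEVEL_CLASS);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Tries to set the level of the named logger.
   * Returns false, rather than throwing, when log4j is absent or the call fails,
   * so callers on other logging back-ends are unaffected.
   */
  static boolean setLevel(String loggerName, String levelName) {
    try {
      Class<?> loggerClass = Class.forName(LOGGER_CLASS);
      Class<?> levelClass = Class.forName(LEVEL_CLASS);
      // Logger.getLogger(String) is a static factory method.
      Method getLogger = loggerClass.getMethod("getLogger", String.class);
      Object logger = getLogger.invoke(null, loggerName);
      // Level.toLevel(String) converts a name such as "DEBUG" to a Level.
      Method toLevel = levelClass.getMethod("toLevel", String.class);
      Object level = toLevel.invoke(null, levelName);
      // setLevel(Level) is inherited from Category; getMethod finds it.
      Method setLevel = loggerClass.getMethod("setLevel", levelClass);
      setLevel.invoke(logger, level);
      return true;
    } catch (ReflectiveOperationException | LinkageError e) {
      return false;
    }
  }
}
```

A caller would invoke Log4jLevels.setLevel("org.apache.hadoop.fs.s3a.logging.sdk", "DEBUG") and treat a false result as "nothing to do". Which shaded SDK logger names would need the same treatment is not stated in the issue, so they are left out here.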






[jira] [Created] (HADOOP-19007) S3A: transfer manager not wired up to s3a executor pool

2023-12-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19007:
---

 Summary: S3A: transfer manager not wired up to s3a executor pool
 Key: HADOOP-19007
 URL: https://issues.apache.org/jira/browse/HADOOP-19007
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


S3ClientFactory.createS3TransferManager() doesn't use the executor declared in 
S3ClientCreationParameters.transferManagerExecutor

* method needs to take S3ClientCreationParameters
* and set the transfer manager executor






[jira] [Resolved] (HADOOP-18925) S3A: add option "fs.s3a.copy.from.local.enabled" to enable/disable CopyFromLocalOperation

2023-12-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18925.
-
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> S3A: add option "fs.s3a.copy.from.local.enabled" to enable/disable 
> CopyFromLocalOperation
> -
>
> Key: HADOOP-18925
> URL: https://issues.apache.org/jira/browse/HADOOP-18925
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> reported failure of CopyFromLocalOperation.getFinalPath() during job 
> submission with s3a declared as cluster fs.
> add an emergency option to disable this optimised uploader and revert to the 
> superclass implementation
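
The "emergency option" pattern described above can be sketched as follows. This is a minimal stand-in, not the S3A code: BaseUploader and OptimisedUploader are hypothetical classes, a plain Map replaces the Hadoop Configuration, and the default of "true" (optimised path on unless disabled) is an assumption, since the issue does not state the default.

```java
import java.util.Map;

/** Hypothetical superclass providing the generic copy path. */
class BaseUploader {
  String copyFromLocal(String src, String dst) {
    return "generic";
  }
}

/** Hypothetical subclass with an optimised path guarded by a switch. */
final class OptimisedUploader extends BaseUploader {

  /** Option name taken from the issue title. */
  static final String ENABLED_KEY = "fs.s3a.copy.from.local.enabled";

  private final boolean optimisedEnabled;

  OptimisedUploader(Map<String, String> conf) {
    // Assumed default: enabled unless the option is explicitly set to false.
    this.optimisedEnabled =
        Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "true"));
  }

  @Override
  String copyFromLocal(String src, String dst) {
    if (!optimisedEnabled) {
      // Switch is off: revert to the superclass implementation.
      return super.copyFromLocal(src, dst);
    }
    return "optimised";
  }
}
```

Reading the flag once in the constructor keeps the decision stable for the lifetime of the instance; whether the real filesystem reads it at initialization or per call is not stated in the issue.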






[jira] [Resolved] (HADOOP-18997) S3A: Add option fs.s3a.s3express.create.session to enable/disable CreateSession

2023-12-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18997.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> S3A: Add option fs.s3a.s3express.create.session to enable/disable 
> CreateSession
> ---
>
> Key: HADOOP-18997
> URL: https://issues.apache.org/jira/browse/HADOOP-18997
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> add a way to disable the need to use the createsession call, so as to allow 
> for
> * simplifying our role test runs
> * benchmarking the performance hit
> * troubleshooting IAM permissions
> this can also be disabled from the sysprop "aws.disableS3ExpressAuth"
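
One way to picture how the two switches above might combine is the sketch below. It is an assumption-laden illustration: CreateSessionOption is a hypothetical class, a plain Map stands in for the Hadoop Configuration, the default of "true" is assumed, and letting the system property win over the configuration option is a choice made here, not something the issue specifies.

```java
import java.util.Map;

/** Hypothetical resolver for whether CreateSession should be used. */
final class CreateSessionOption {

  /** Option name taken from the issue title. */
  static final String OPTION = "fs.s3a.s3express.create.session";

  /** System property named in the issue description. */
  static final String SYSPROP = "aws.disableS3ExpressAuth";

  private CreateSessionOption() {
  }

  static boolean createSessionEnabled(Map<String, String> conf) {
    // Boolean.getBoolean reads the JVM system property of that name and is
    // true only when the value equals "true", ignoring case.
    if (Boolean.getBoolean(SYSPROP)) {
      return false;
    }
    // Assumed default: CreateSession is used unless explicitly disabled.
    return Boolean.parseBoolean(conf.getOrDefault(OPTION, "true"));
  }
}
```

With this resolution order, starting the JVM with -Daws.disableS3ExpressAuth=true disables the call regardless of the per-filesystem option, which matches the troubleshooting use case listed above.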





