[jira] [Commented] (HADOOP-19157) [ABFS] Filesystem contract tests to use methodPath for robust parallel test runs

2024-04-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840122#comment-17840122
 ] 

Steve Loughran commented on HADOOP-19157:
-

note: this is not a problem with abfs - it just has the most ambitious test 
runner.

{code}

[ERROR] 
testMkdirsPopulatingAllNonexistentAncestors(org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractMkdir)
  Time elapsed: 0.475 s  <<< ERROR!
java.io.FileNotFoundException: 
abfs://stevel-test...@stevelukwest.dfs.core.windows.net/fork-0002/test/testMkdirsPopulatingAllNonexistentAncestors/a/b/c/d/e/f/g/h/i/j/k/L
 nested dir should exist: not found 
abfs://stevel-test...@stevelukwest.dfs.core.windows.net/fork-0002/test/testMkdirsPopulatingAllNonexistentAncestors/a/b/c/d/e/f/g/h/i/j/k/L
 in 
abfs://stevel-test...@stevelukwest.dfs.core.windows.net/fork-0002/test/testMkdirsPopulatingAllNonexistentAncestors/a/b/c/d/e/f/g/h/i/j/k
at 
org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists(ContractTestUtils.java:985)
at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertPathExists(ContractTestUtils.java:963)
at 
org.apache.hadoop.fs.contract.AbstractFSContractTestBase.assertPathExists(AbstractFSContractTestBase.java:319)
at 
org.apache.hadoop.fs.contract.AbstractContractMkdirTest.testMkdirsPopulatingAllNonexistentAncestors(AbstractContractMkdirTest.java:150)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.FileNotFoundException: Operation failed: "The specified path 
does not exist.", 404, HEAD, 
https://stevelukwest.dfs.core.windows.net/stevel-testing/fork-0002/test/testMkdirsPopulatingAllNonexistentAncestors/a/b/c/d/e/f/g/h/i/j/k/L?upn=false=getStatus=90s,
 rId: 50a0ad90-f01f-0065-688c-95083600
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1503)
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:736)
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:724)
at 
org.apache.hadoop.fs.contract.ContractTestUtils.verifyPathExists(ContractTestUtils.java:979)
... 18 more
Caused by: Operation failed: "The specified path does not exist.", 404, HEAD, 
https://stevelukwest.dfs.core.windows.net/stevel-testing/fork-0002/test/testMkdirsPopulatingAllNonexistentAncestors/a/b/c/d/e/f/g/h/i/j/k/L?upn=false=getStatus=90s,
 rId: 50a0ad90-f01f-0065-688c-95083600
at 
org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:270)
at 
org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:216)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.measureDurationOfInvocation(IOStatisticsBinding.java:494)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:465)
at 
org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:214)
at 
org.apache.hadoop.fs.azurebfs.services.AbfsClient.getPathStatus(AbfsClient.java:1083)
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getFileStatus(AzureBlobFileSystemStore.java:1115)
at 
org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.getFileStatus(AzureBlobFileSystem.java:734)
... 20 more

[ERROR] 
testNoMkdirOverFile(org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractMkdir)
  Time elapsed: 0.437 s  <<< ERROR!
java.io.FileNotFoundException: Operation failed: "The specified path does not 
exist.", 404, HEAD, 

[jira] [Created] (HADOOP-19157) [ABFS] Filesystem contract tests to use methodPath for robust parallel test runs

2024-04-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19157:
---

 Summary: [ABFS] Filesystem contract tests to use methodPath for 
robust parallel test runs
 Key: HADOOP-19157
 URL: https://issues.apache.org/jira/browse/HADOOP-19157
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, test
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


hadoop-azure supports parallel test runs, but unlike hadoop-aws, the azure ones 
are parallelised across methods in the same test suite.

this can fail badly where contract tests have hard-coded filenames and assume 
they can be reused across all test cases. This shows up when you are testing on 
a store with reduced IO capacity, which triggers retries and makes some test 
cases slower.

Fix: hadoop-common contract tests to use methodPath() names
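
A rough sketch of the intended pattern, assuming the methodPath() helper in 
AbstractFSContractTestBase resolves a path unique to the running test method 
(illustrative only, not the actual patch):

{code:java}
@Test
public void testMkdirsPopulatingAllNonexistentAncestors() throws Throwable {
  final FileSystem fs = getFileSystem();
  // before: a hard-coded path such as path("test/mkdirs") shared by every run
  // after: a per-method path, so parallel forks and methods never collide
  final Path parent = methodPath();
  final Path nested = new Path(parent, "a/b/c/d/e/f/g/h/i/j/k/L");
  assertTrue("mkdirs failed under " + parent, fs.mkdirs(nested));
  assertPathExists("nested dir should exist", nested);
}
{code}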



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-04-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19102.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  
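
The change amounts to clamping the footer read buffer to the main read buffer; 
a rough illustration (variable and configuration names assumed, not checked 
against the actual patch):

{code:java}
// never let the footer read buffer exceed the main read buffer, so that
// optimisedRead() cannot try to fill more bytes than its array can hold
int footerReadBufferSize = Math.min(
    configuredFooterReadBufferSize,   // e.g. fs.azure.footer.read.request.size
    configuredReadBufferSize);        // e.g. fs.azure.read.request.size
{code}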



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-04-22 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839704#comment-17839704
 ] 

Steve Loughran commented on HADOOP-19085:
-

that's really interesting. abfs has full filesystem semantics; s3 doesn't and 
we always trade off correctness for performance.

* can you attach the results?
* regarding other connectors, gcs is the obvious one

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally 
> clear semantics, and it has become the de-facto standard for industry storage 
> systems to follow and conform with. There have been a series of HCFS 
> implementations in Hadoop, such as S3AFileSystem for Amazon's S3 object 
> store, WASB for Microsoft's Azure Blob Storage and the OSS connector for 
> Alibaba Cloud Object Storage, and more from storage service providers on 
> their own.
> {*}Problems:{*} However, as indicated by introduction.md, there is no formal 
> suite to do a compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well implemented and 
> meets the core compatibility expectations mainly relies on the service 
> provider's own report. Meanwhile, Hadoop keeps developing, and new features 
> are continuously being added to the HCFS interfaces for existing 
> implementations to follow and adopt; Hadoop therefore also needs a tool to 
> quickly assess whether these features are supported by a specific HCFS 
> implementation. Besides, the hadoop command line tool (hdfs shell) is used to 
> interact directly with an HCFS storage system, where most commands correspond 
> to specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like the expunge command. To check such 
> commands for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*} Accordingly, we propose to define a formal HCFS 
> compatibility benchmark and provide a corresponding tool to do the 
> compatibility assessment for an HCFS storage system. The benchmark and tool 
> should consider both HCFS interfaces and hdfs shell commands. Different 
> scenarios require different kinds of compatibility, so we could define 
> different suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine whether the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about the core functions of 
> the storage service. For instance, if the HCFS scores 100% on a suite named 
> 'tpcds', that demonstrates that all functions needed by a tpcds program are 
> well supported. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19083) provide hadoop binary tarball without aws v2 sdk

2024-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19083:

Description: 
Have the default hadoop binary .tar.gz exclude the aws v2 sdk by default. 

This SDK brings the total size of the distribution to about 1 GB.

Proposed
* add a profile to include the aws sdk in the dist module
* document it for local building
* for release builds, we modify our release ant builds to generate modified x86 
and arm64 releases without the file.





  was:
Have the default hadoop binary .tar.gz exclude the aws v2 sdk by default. 

This SDK brings the total size of the distribution to about 1 GB.

Proposed
* add a profile to include the aws sdk in the dist module
* disable it by default

Instead we document which version is needed. 
The hadoop-aws and hadoop-cloud storage maven artifacts will declare their 
dependencies, so apps building with those get to do the download.




> provide hadoop binary tarball without aws v2 sdk
> 
>
> Key: HADOOP-19083
> URL: https://issues.apache.org/jira/browse/HADOOP-19083
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> Have the default hadoop binary .tar.gz exclude the aws v2 sdk by default. 
> This SDK brings the total size of the distribution to about 1 GB.
> Proposed
> * add a profile to include the aws sdk in the dist module
> * document it for local building
> * for release builds, we modify our release ant builds to generate modified 
> x86 and arm64 releases without the file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19154) upgrade bouncy castle to 1.78.1 due to CVEs

2024-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19154:

Affects Version/s: 3.3.6
   3.4.0

> upgrade bouncy castle to 1.78.1 due to CVEs
> ---
>
> Key: HADOOP-19154
> URL: https://issues.apache.org/jira/browse/HADOOP-19154
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>
> [https://www.bouncycastle.org/releasenotes.html#r1rv78]
> There is a v1.78.1 release but no notes for it yet.
> For v1.78
> h3. 2.1.5 Security Advisories.
> Release 1.78 deals with the following CVEs:
>  * CVE-2024-29857 - Importing an EC certificate with specially crafted F2m 
> parameters can cause high CPU usage during parameter evaluation.
>  * CVE-2024-30171 - Possible timing based leakage in RSA based handshakes due 
> to exception processing eliminated.
>  * CVE-2024-30172 - Crafted signature and public key can be used to trigger 
> an infinite loop in the Ed25519 verification code.
>  * CVE-2024-301XX - When endpoint identification is enabled and an SSL socket 
> is not created with an explicit hostname (as happens with 
> HttpsURLConnection), hostname verification could be performed against a 
> DNS-resolved IP address. This has been fixed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19153) hadoop-common still exports logback as a transitive dependency

2024-04-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19153:
---

 Summary: hadoop-common still exports logback as a transitive 
dependency
 Key: HADOOP-19153
 URL: https://issues.apache.org/jira/browse/HADOOP-19153
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, common
Affects Versions: 3.4.0
Reporter: Steve Loughran


Even though HADOOP-19084 set out to stop it, somehow ZK's declaration of a 
logback dependency is still contaminating the hadoop-common dependency graph, 
causing problems downstream.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19084) prune dependency exports of hadoop-* modules

2024-04-17 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838241#comment-17838241
 ] 

Steve Loughran commented on HADOOP-19084:
-

logback is still being exported by hadoop-common via zk. 

> prune dependency exports of hadoop-* modules
> 
>
> Key: HADOOP-19084
> URL: https://issues.apache.org/jira/browse/HADOOP-19084
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.5.0, 3.4.1
>
>
> this is probably caused by HADOOP-18613:
> ZK is pulling in some extra transitive stuff which surfaces in applications 
> which import hadoop-common into their poms. It doesn't seem to show up in our 
> distro, but downstream you get warnings about duplicate logging stuff
> {code}
> |  +- org.apache.zookeeper:zookeeper:jar:3.8.3:compile
> |  |  +- org.apache.zookeeper:zookeeper-jute:jar:3.8.3:compile
> |  |  |  \- (org.apache.yetus:audience-annotations:jar:0.12.0:compile - 
> omitted for duplicate)
> |  |  +- org.apache.yetus:audience-annotations:jar:0.12.0:compile
> |  |  +- (io.netty:netty-handler:jar:4.1.94.Final:compile - omitted for 
> conflict with 4.1.100.Final)
> |  |  +- (io.netty:netty-transport-native-epoll:jar:4.1.94.Final:compile - 
> omitted for conflict with 4.1.100.Final)
> |  |  +- (org.slf4j:slf4j-api:jar:1.7.30:compile - omitted for duplicate)
> |  |  +- ch.qos.logback:logback-core:jar:1.2.10:compile
> |  |  +- ch.qos.logback:logback-classic:jar:1.2.10:compile
> |  |  |  +- (ch.qos.logback:logback-core:jar:1.2.10:compile - omitted for 
> duplicate)
> |  |  |  \- (org.slf4j:slf4j-api:jar:1.7.32:compile - omitted for conflict 
> with 1.7.30)
> |  |  \- (commons-io:commons-io:jar:2.11.0:compile - omitted for conflict 
> with 2.14.0)
> {code}
> proposed: exclude the zk dependencies we either override ourselves or don't 
> need.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19150) Test ITestAbfsRestOperationException#testAuthFailException is broken.

2024-04-16 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837883#comment-17837883
 ] 

Steve Loughran commented on HADOOP-19150:
-

actually, it should be 

{code:java}
// intercept() returns the caught exception so the assertions can run on it afterwards
AbfsRestOperationException e = intercept(AbfsRestOperationException.class,
    () -> fs.getFileStatus(new Path("/")));

// ... followed by all the asserts on the exception
{code}


> Test ITestAbfsRestOperationException#testAuthFailException is broken. 
> --
>
> Key: HADOOP-19150
> URL: https://issues.apache.org/jira/browse/HADOOP-19150
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Mukund Thakur
>Priority: Major
>
> {code:java}
> intercept(Exception.class,
> () -> {
>   fs.getFileStatus(new Path("/"));
> }); {code}
> Intercept shouldn't be used as there are assertions in catch statements. 
>  
> CC [~ste...@apache.org]  [~anujmodi2021] [~asrani_anmol] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19025) Migrate abstract contract tests to AssertJ

2024-04-15 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19025:

Affects Version/s: 3.4.0

> Migrate abstract contract tests to AssertJ
> --
>
> Key: HADOOP-19025
> URL: https://issues.apache.org/jira/browse/HADOOP-19025
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
>
> Replace JUnit4 assertions with equivalent functionality from AssertJ, to make 
> contract tests more independent of JUnit version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18296) Memory fragmentation in ChecksumFileSystem Vectored IO implementation.

2024-04-15 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837385#comment-17837385
 ] 

Steve Loughran commented on HADOOP-18296:
-

Mukund, do we actually need to coalesce ranges on local fs reads? Because it is 
all local, we can just push out a list of independent regions.

we do still need to deal with failures by adding the ability to return buffers 
to any pool on failure.

> Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
> --
>
> Key: HADOOP-18296
> URL: https://issues.apache.org/jira/browse/HADOOP-18296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Priority: Minor
>  Labels: fs
>
> As we have implemented merging of ranges in the ChecksumFSInputChecker 
> implementation of the vectored IO API, it can lead to memory fragmentation. 
> Let me explain by example.
>  
> Suppose a client requests 3 ranges: 
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all the above ranges will be merged into one; we will 
> allocate a big byte buffer covering 0-1500 but return sliced byte buffers 
> for the requested ranges.
> Once the client is done reading all the ranges, it can only free the memory 
> for the requested ranges; the memory of the gaps (here 500-700 and 1000-1200) 
> will never be released.
>  
> Note this only happens for direct byte buffers.
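
A self-contained illustration of the problem described above (plain Java, not 
the Hadoop implementation; the slicing helper is hypothetical):

{code:java}
import java.nio.ByteBuffer;

public class VectoredReadFragmentationDemo {
  public static void main(String[] args) {
    // three requested ranges (0-500, 700-1000, 1200-1500) merged into one
    // direct buffer; the caller only ever sees slices of it
    ByteBuffer merged = ByteBuffer.allocateDirect(1500);

    ByteBuffer r1 = sliceOf(merged, 0, 500);
    ByteBuffer r2 = sliceOf(merged, 700, 300);
    ByteBuffer r3 = sliceOf(merged, 1200, 300);

    // even after the client releases r1, r2 and r3 to its pool, the gaps
    // (500-700 and 1000-1200) were never handed out, so the whole 1500-byte
    // direct allocation stays referenced and cannot be reclaimed piecemeal
    System.out.println(r1.capacity() + " " + r2.capacity() + " " + r3.capacity());
  }

  static ByteBuffer sliceOf(ByteBuffer buf, int offset, int length) {
    ByteBuffer dup = buf.duplicate();
    dup.position(offset);
    dup.limit(offset + length);
    return dup.slice();
  }
}
{code}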



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19082) S3A: Update AWS SDK V2 to 2.24.6

2024-04-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836675#comment-17836675
 ] 

Steve Loughran commented on HADOOP-19082:
-

FYI this SDK has an unshaded copy of org.slf4j.LoggerFactory in it, which is 
not what anyone wants

> S3A: Update AWS SDK V2 to 2.24.6
> 
>
> Key: HADOOP-19082
> URL: https://issues.apache.org/jira/browse/HADOOP-19082
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Update the AWS SDK to 2.24.6 from 2.23.5 for latest updates in packaging 
> w.r.t. imds module.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19079) HttpExceptionUtils to check that loaded class is really an exception before instantiation

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19079.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> HttpExceptionUtils to check that loaded class is really an exception before 
> instantiation
> -
>
> Key: HADOOP-19079
> URL: https://issues.apache.org/jira/browse/HADOOP-19079
> Project: Hadoop Common
>  Issue Type: Task
>  Components: common, security
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> It can be dangerous taking class names as inputs from HTTP messages even if 
> we control the source. Issue is in HttpExceptionUtils in hadoop-common 
> (validateResponse method).
> I can provide a PR that will highlight the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19079) HttpExceptionUtils to check that loaded class is really an exception before instantiation

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19079:

Summary: HttpExceptionUtils to check that loaded class is really an 
exception before instantiation  (was: check that class that is loaded is really 
an exception)

> HttpExceptionUtils to check that loaded class is really an exception before 
> instantiation
> -
>
> Key: HADOOP-19079
> URL: https://issues.apache.org/jira/browse/HADOOP-19079
> Project: Hadoop Common
>  Issue Type: Task
>  Components: common, security
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> It can be dangerous taking class names as inputs from HTTP messages even if 
> we control the source. Issue is in HttpExceptionUtils in hadoop-common 
> (validateResponse method).
> I can provide a PR that will highlight the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19096) [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19096.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic
> --
>
> Key: HADOOP-19096
> URL: https://issues.apache.org/jira/browse/HADOOP-19096
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.1
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> ABFS has a client-side throttling mechanism which works on the metrics 
> collected from past requests. If requests fail due to throttling at the 
> server, we update our metrics, and the client-side backoff is calculated 
> based on those metrics.
> This PR enhances the logic that decides which requests should be considered 
> when computing the client-side backoff interval, as follows:
> For each request made by the ABFS driver, we determine whether it should 
> contribute to client-side throttling based on the status code and result 
> (see the sketch after this list):
>  # Status code in 2xx range: Successful Operations should contribute.
>  # Status code in 3xx range: Redirection Operations should not contribute.
>  # Status code in 4xx range: User Errors should not contribute.
>  # Status code is 503: Throttling errors should contribute only if they are 
> due to a client limits breach, as follows:
>  ## 503, Ingress Over Account Limit: Should Contribute
>  ## 503, Egress Over Account Limit: Should Contribute
>  ## 503, TPS Over Account Limit: Should Contribute
>  ## 503, Other Server Throttling: Should not Contribute.
>  # Status code in 5xx range other than 503: Should not Contribute.
>  # IOException and UnknownHostExceptions: Should not Contribute.
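
A sketch of the decision described by the list above (the method and the 
ThrottleReason enum are illustrative names, not the actual ABFS code):

{code:java}
enum ThrottleReason {
  INGRESS_OVER_ACCOUNT_LIMIT, EGRESS_OVER_ACCOUNT_LIMIT,
  TPS_OVER_ACCOUNT_LIMIT, OTHER_SERVER_THROTTLING, NOT_THROTTLED
}

static boolean shouldContribute(int statusCode, ThrottleReason reason) {
  if (statusCode >= 200 && statusCode < 300) {
    return true;                       // 2xx: successful operations contribute
  }
  if (statusCode == 503) {
    // 503 contributes only when the account's own limits were breached
    return reason == ThrottleReason.INGRESS_OVER_ACCOUNT_LIMIT
        || reason == ThrottleReason.EGRESS_OVER_ACCOUNT_LIMIT
        || reason == ThrottleReason.TPS_OVER_ACCOUNT_LIMIT;
  }
  // 3xx redirects, 4xx user errors, other 5xx, and network exceptions do not
  return false;
}
{code}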



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19105) S3A: Recover from Vector IO read failures

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19105:

Environment: 
s3a vector IO doesn't try to recover from read failures the way read() does.

Need to
* abort HTTP stream if considered needed
* retry active read which failed
* but not those which had succeeded

On a full failure we need to do something about any allocated buffer, which 
means we really need the buffer pool {{ByteBufferPool}} to return or also 
provide a "release" (ByteBuffer -> void) call which does the return. We would 
need to
* add this as a new api with the implementations in s3a, local, rawlocal
* classic single allocator method remaps to the new one with (() -> null) as 
the response

This keeps the public API stable



  was:
s3a vector IO doesn't try to recover from read failures the way read() does.

Need to
* abort HTTP stream if considered needed
* retry active read which failed
* but not those which had succeeded




> S3A: Recover from Vector IO read failures
> -
>
> Key: HADOOP-19105
> URL: https://issues.apache.org/jira/browse/HADOOP-19105
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0, 3.3.6
> Environment: s3a vector IO doesn't try to recover from read failures 
> the way read() does.
> Need to
> * abort HTTP stream if considered needed
> * retry active read which failed
> * but not those which had succeeded
> On a full failure we need to do something about any allocated buffer, which 
> means we really need the buffer pool {{ByteBufferPool}} to return or also 
> provide a "release" (ByteBuffer -> void) call which does the return. We 
> would need to
> * add this as a new api with the implementations in s3a, local, rawlocal
> * classic single allocator method remaps to the new one with (() -> null) as 
> the response
> This keeps the public API stable
>Reporter: Steve Loughran
>Priority: Major
>
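
A rough sketch of the kind of release-aware allocator pairing described above 
(interface and method names are hypothetical, not a committed Hadoop API):

{code:java}
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

public interface ReleasingBufferAllocator {
  ByteBuffer allocate(int capacity);

  /** Return a buffer obtained from {@link #allocate(int)}, e.g. after a failed read. */
  void release(ByteBuffer buffer);

  /** Adapt a classic allocate-only function; release becomes a no-op. */
  static ReleasingBufferAllocator adapt(IntFunction<ByteBuffer> allocator) {
    return new ReleasingBufferAllocator() {
      @Override public ByteBuffer allocate(int capacity) { return allocator.apply(capacity); }
      @Override public void release(ByteBuffer buffer) { /* nowhere to return it */ }
    };
  }
}
{code}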




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19101:

Release Note: 
PositionedReadable.readVectored() will read incorrect data when reading from 
hdfs, azure abfs and other stores when given a direct buffer allocator. 

For cross-version compatibility, use on-heap buffer allocators only
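
For example, a minimal sketch of a vectored read with an on-heap allocator 
(illustrative; the filesystem, path and range values are assumptions):

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OnHeapVectoredRead {
  static void readRanges(FileSystem fs, Path path) throws Exception {
    List<FileRange> ranges = new ArrayList<>();
    ranges.add(FileRange.createFileRange(0, 1024));
    ranges.add(FileRange.createFileRange(4096, 1024));

    try (FSDataInputStream in = fs.open(path)) {
      // on-heap allocator: safe across releases; ByteBuffer::allocateDirect
      // hits the fallback-path bug on 3.3.5-3.4.0 described here
      in.readVectored(ranges, ByteBuffer::allocate);
      for (FileRange r : ranges) {
        ByteBuffer data = r.getData().get();   // waits for that range to complete
        // ... consume data ...
      }
    }
  }
}
{code}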

> Vectored Read into off-heap buffer broken in fallback implementation
> 
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
> position zero even when the range is at a different offset. As a result: you 
> can get incorrect information.
> Thankfully the fix is straightforward: we pass in a FileRange and use its 
> offset as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
> read vectorIO into direct buffers through HDFS, ABFS or GCS. Note that we 
> have never seen this in production because the parquet and ORC libraries both 
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to 
> read into off-heap DirectBuffers. This is a bit trickier than you would think 
> because an allocator is passed in. For PARQUET-2171 we will 
> * only invoke the API on streams which explicitly declare their support for 
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19101.
-
Fix Version/s: 3.3.9
   3.4.1
   Resolution: Fixed

> Vectored Read into off-heap buffer broken in fallback implementation
> 
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
> position zero even when the range is at a different offset. As a result: you 
> can get incorrect information.
> Thankfully the fix is straightforward: we pass in a FileRange and use its 
> offset as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
> read vectorIO into direct buffers through HDFS, ABFS or GCS. Note that we 
> have never seen this in production because the parquet and ORC libraries both 
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to 
> read into off-heap DirectBuffers. This is a bit trickier than you would think 
> because an allocator is passed in. For PARQUET-2171 we will 
> * only invoke the API on streams which explicitly declare their support for 
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19098.
-
Resolution: Fixed

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say 
> * "overlap triggers IllegalArgumentException". 
> * special case: 0 byte ranges may be short circuited to return empty buffer 
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases
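
A sketch of the kind of overlap validation the spec change describes (class and 
method names hypothetical; hadoop-common ships its own common helper code for 
this):

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.fs.FileRange;

public class OverlapCheck {
  static void validateNonOverlapping(List<FileRange> ranges) {
    List<FileRange> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong(FileRange::getOffset));
    for (int i = 1; i < sorted.size(); i++) {
      FileRange prev = sorted.get(i - 1);
      FileRange curr = sorted.get(i);
      // any range starting before the previous one ends is an overlap
      if (curr.getOffset() < prev.getOffset() + prev.getLength()) {
        throw new IllegalArgumentException(
            "overlapping ranges: " + prev + " and " + curr);
      }
    }
  }
}
{code}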



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19098:

Fix Version/s: 3.3.9

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say 
> * "overlap triggers IllegalArgumentException". 
> * special case: 0 byte ranges may be short circuited to return empty buffer 
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16822) Provide source artifacts for hadoop-client-api

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16822:

Description: 
h5. Improvement request
The third-party libraries shading hadoop-client-api (& hadoop-client-runtime) 
artifacts are super useful.
 
Having uber source jar for hadoop-client-api (maybe even hadoop-client-runtime) 
would be great for downstream development & debugging purposes.

Are there any obstacles or objections against providing fat jar with all the 
hadoop client api as well ?

h5. Dev links
- *maven-shaded-plugin* and its *shadeSourcesContent* attribute
- 
https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#shadeSourcesContent

h2. Update April 2024: this has been reverted.

It turns out that it complicates debugging. If you want the source when 
debugging, the best way is just to check out the hadoop release you are working 
with and point your IDE at it.

  was:
h5. Improvement request
The third-party libraries shading hadoop-client-api (& hadoop-client-runtime) 
artifacts are super useful.
 
Having uber source jar for hadoop-client-api (maybe even hadoop-client-runtime) 
would be great for downstream development & debugging purposes.

Are there any obstacles or objections against providing fat jar with all the 
hadoop client api as well ?

h5. Dev links
- *maven-shaded-plugin* and its *shadeSourcesContent* attribute
- 
https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#shadeSourcesContent


> Provide source artifacts for hadoop-client-api
> --
>
> Key: HADOOP-16822
> URL: https://issues.apache.org/jira/browse/HADOOP-16822
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.3.1, 3.4.0, 3.2.3
>Reporter: Karel Kolman
>Assignee: Karel Kolman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HADOOP-16822-hadoop-client-api-source-jar.patch
>
>
> h5. Improvement request
> The third-party libraries shading hadoop-client-api (& hadoop-client-runtime) 
> artifacts are super useful.
>  
> Having uber source jar for hadoop-client-api (maybe even 
> hadoop-client-runtime) would be great for downstream development & debugging 
> purposes.
> Are there any obstacles or objections against providing fat jar with all the 
> hadoop client api as well ?
> h5. Dev links
> - *maven-shaded-plugin* and its *shadeSourcesContent* attribute
> - 
> https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#shadeSourcesContent
> h2. Update April 2024: this has been reverted.
> It turns out that it complicates debugging. If you want the source when 
> debugging, the best way is just to check out the hadoop release you are 
> working with and point your IDE at it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-04-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19119:

Fix Version/s: 3.3.9

> spotbugs complaining about possible NPE in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
> 
>
> Key: HADOOP-19119
> URL: https://issues.apache.org/jira/browse/HADOOP-19119
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: crypto
>Affects Versions: 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> PRs against hadoop-common are reporting spotbugs problems
> {code}
> Dodgy code Warnings
> Code  Warning
> NPPossible null pointer dereference in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return 
> value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
> In class org.apache.hadoop.crypto.key.kms.ValueQueue
> In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
> Local variable stored in JVM register ?
> Dereferenced at ValueQueue.java:[line 332]
> Known null at ValueQueue.java:[line 332]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-04-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19119:

Affects Version/s: 3.4.0
   3.3.9

> spotbugs complaining about possible NPE in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
> 
>
> Key: HADOOP-19119
> URL: https://issues.apache.org/jira/browse/HADOOP-19119
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: crypto
>Affects Versions: 3.4.0, 3.3.9, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> PRs against hadoop-common are reporting spotbugs problems
> {code}
> Dodgy code Warnings
> Code  Warning
> NPPossible null pointer dereference in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return 
> value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
> In class org.apache.hadoop.crypto.key.kms.ValueQueue
> In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
> Local variable stored in JVM register ?
> Dereferenced at ValueQueue.java:[line 332]
> Known null at ValueQueue.java:[line 332]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19098:

Fix Version/s: 3.4.1

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say 
> * "overlap triggers IllegalArgumentException". 
> * special case: 0 byte ranges may be short circuited to return empty buffer 
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18656) ABFS: Support for Pagination in Recursive Directory Delete

2024-04-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18656:

Fix Version/s: 3.5.0

> ABFS: Support for Pagination in Recursive Directory Delete 
> ---
>
> Key: HADOOP-18656
> URL: https://issues.apache.org/jira/browse/HADOOP-18656
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.5
>Reporter: Sree Bhattacharyya
>Assignee: Anuj Modi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Today, when a recursive delete is issued for a large directory in an ADLS 
> Gen2 (HNS) account, the directory deletion happens in O(1), but in the 
> backend ACL checks are done recursively for each object inside that 
> directory, which for a large directory can lead to a request timeout. 
> Pagination has been introduced in the Azure Storage backend for these ACL 
> checks.
> More information on how pagination works can be found on public documentation 
> of [Azure Delete Path 
> API|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/delete?view=rest-storageservices-datalakestoragegen2-2019-12-12].
> This PR contains changes to support this from the client side. To trigger 
> pagination, the client adds a new query parameter "paginated" set to true 
> along with recursive set to true. In return, if the directory is large, the 
> server may return a continuation token to the caller. If the caller gets 
> back a continuation token, it has to call the delete API again with that 
> continuation token, with recursive and paginated still set to true. This is 
> similar to the directory delete of an FNS account.
> Pagination is available only in API versions "2023-08-03" onwards.
> PR also contains functional tests to verify driver works well with different 
> combinations of recursive and pagination features for HNS.
> Full E2E testing of pagination requires large dataset to be created and hence 
> not added as part of driver test suite. But extensive E2E testing has been 
> performed.
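
A hypothetical sketch of the client-side loop this describes (DeleteClient and 
its method are illustrative stand-ins, not the real AbfsClient API):

{code:java}
interface DeleteClient {
  /**
   * Issue DELETE with recursive=true and paginated=true, plus the continuation
   * token when one was returned; returns the next token, or null when done.
   */
  String deletePaginated(String directory, String continuationToken);
}

class PaginatedDelete {
  static void deleteLargeDirectory(DeleteClient client, String directory) {
    String token = null;
    do {
      // the server hands back a token while ACL checks on the directory's
      // children are still pending; re-issue the delete with it
      token = client.deletePaginated(directory, token);
    } while (token != null);
  }
}
{code}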



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18656) ABFS: Support for Pagination in Recursive Directory Delete

2024-04-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18656:

Target Version/s: 3.3.9, 3.5.0, 3.4.1  (was: 3.3.9, 3.5.0)

> ABFS: Support for Pagination in Recursive Directory Delete 
> ---
>
> Key: HADOOP-18656
> URL: https://issues.apache.org/jira/browse/HADOOP-18656
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.5
>Reporter: Sree Bhattacharyya
>Assignee: Anuj Modi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Today, when a recursive delete is issued for a large directory in an ADLS 
> Gen2 (HNS) account, the directory deletion happens in O(1), but in the 
> backend ACL checks are done recursively for each object inside that 
> directory, which for a large directory can lead to a request timeout. 
> Pagination has been introduced in the Azure Storage backend for these ACL 
> checks.
> More information on how pagination works can be found on public documentation 
> of [Azure Delete Path 
> API|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/delete?view=rest-storageservices-datalakestoragegen2-2019-12-12].
> This PR contains changes to support this from the client side. To trigger 
> pagination, the client adds a new query parameter "paginated" set to true 
> along with recursive set to true. In return, if the directory is large, the 
> server may return a continuation token to the caller. If the caller gets 
> back a continuation token, it has to call the delete API again with that 
> continuation token, with recursive and paginated still set to true. This is 
> similar to the directory delete of an FNS account.
> Pagination is available only in API versions "2023-08-03" onwards.
> PR also contains functional tests to verify driver works well with different 
> combinations of recursive and pagination features for HNS.
> Full E2E testing of pagination requires large dataset to be created and hence 
> not added as part of driver test suite. But extensive E2E testing has been 
> performed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19141) Update VectorIO default values consistently

2024-04-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19141:

Fix Version/s: 3.5.0

> Update VectorIO default values consistently
> ---
>
> Key: HADOOP-19141
> URL: https://issues.apache.org/jira/browse/HADOOP-19141
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.4.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.7, 3.5.0, 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18855) VectorIO API tuning/stabilization

2024-04-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18855:

Description: 
Changes needed to get the Vector IO code stable.

Specifically
* consistent behaviour across implementations
* broader testing
* resilience

+Ideally, abfs support. (s3a prefetching needs this too; see HADOOP-19144)

This work will be shaped by the experience of merging support into libraries 
and identifying issues/improvement opportunities

  was:
Changes needed to get the Vector IO code stable.

Specifically
* consistent behaviour across implementations
* broader testing
* resilience

+Ideally, abfs support. (s3a prefetching needs this too; see HADOOP-19144)


> VectorIO API tuning/stabilization
> -
>
> Key: HADOOP-18855
> URL: https://issues.apache.org/jira/browse/HADOOP-18855
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Changes needed to get the Vector IO code stable.
> Specifically
> * consistent behaviour across implementations
> * broader testing
> * resilience
> +Ideally, abfs support. (s3a prefetching needs this too; see HADOOP-19144)
> This work will be shaped by the experience of merging support into libraries 
> and identifying issues/improvement opportunities



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18855) VectorIO API tuning/stabilization

2024-04-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18855:

Description: 
Changes needed to get the Vector IO code stable.

Specifically
* consistent behaviour across implementations
* broader testing
* resilience

+Ideally, abfs support. (s3a prefetching needs this too; see HADOOP-19144)

  was:
what do we need to do to improve the vector IO experience based on 
integration and use.

obviously, we cannot change anything incompatibly, but we may find bugs to fix 
and other possible enhancements


> VectorIO API tuning/stabilization
> -
>
> Key: HADOOP-18855
> URL: https://issues.apache.org/jira/browse/HADOOP-18855
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> Changes needed to get the Vector IO code stable.
> Specifically
> * consistent behaviour across implementations
> * broader testing
> * resilience
> +Ideally, abfs support. (s3a prefetching needs this too; see HADOOP-19144)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19144) S3A prefetching to support Vector IO

2024-04-04 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19144:
---

 Summary: S3A prefetching to support Vector IO
 Key: HADOOP-19144
 URL: https://issues.apache.org/jira/browse/HADOOP-19144
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Add explicit support for vector IO in s3a prefetching stream.

* if a range is in 1+ cached block, it SHALL be read from cache and returned
* if a range is not in cache : TBD
* If a range is partially in cache: TBD

these are the same decisions that abfs has to make: should the client 
fetch/cache the block or just do one or more GET requests?

A big issue is: does caching of data fetched in a range request make any sense 
at all? Or more specifically: does fetching the blocks in which range requests 
are found make sense?

Simply going to the store is a lot simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation

2024-04-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19101:

Fix Version/s: 3.5.0

> Vectored Read into off-heap buffer broken in fallback implementation
> 
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.5.0
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
> position zero even when the range is at a different offset. As a result: you 
> can get incorrect information.
> Thankfully the fix is straightforward: we pass in a FileRange and use its 
> offset as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
> read vectorIO into direct buffers through HDFS, ABFS or GCS. Note that we 
> have never seen this in production because the parquet and ORC libraries both 
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to 
> read into off-heap DirectBuffers. This is a bit trickier than you would think 
> because an allocator is passed in. For PARQUET-2171 we will 
> * only invoke the API on streams which explicitly declare their support for 
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19140) [ABFS, S3A] Add IORateLimiter api to hadoop common

2024-04-03 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19140:
---

 Summary: [ABFS, S3A] Add IORateLimiter api to hadoop common
 Key: HADOOP-19140
 URL: https://issues.apache.org/jira/browse/HADOOP-19140
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Create a rate limiter API in hadoop-common from which code (initially the manifest 
committer and bulk delete) can request IO capacity for a specific operation.

This can be exported by filesystems to support shared rate limiting across all 
threads.

pulled from HADOOP-19093 PR
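
A rough sketch of what such an API could look like; the interface name, method name and 
signature here are guesses for illustration, not the committed API.

{code}
import java.time.Duration;

/**
 * Sketch only: callers ask for IO capacity before issuing an operation;
 * a filesystem-wide implementation can then throttle across all threads.
 */
public interface IORateLimiterSketch {

  /**
   * Request capacity for an operation, blocking until it is available.
   * @param operation operation name, e.g. "rename" or "delete"
   * @param requestedCapacity number of IO permits wanted
   * @return time spent waiting for the capacity
   */
  Duration acquireIOCapacity(String operation, int requestedCapacity);
}
{code}

A committer could, for example, call acquireIOCapacity("rename", 1) before each rename and 
record the returned wait time in its statistics.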



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19124) Update org.ehcache from 3.3.1 to 3.8.2.

2024-04-03 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833640#comment-17833640
 ] 

Steve Loughran commented on HADOOP-19124:
-

should we include this in branch-3.4? I'd like to bring things up to date there.

> Update org.ehcache from 3.3.1 to 3.8.2.
> ---
>
> Key: HADOOP-19124
> URL: https://issues.apache.org/jira/browse/HADOOP-19124
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.4.1
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> We need to enhance the caching functionality in Yarn Federation by adding a 
> limit on the number of cached entries. I noticed that the version of 
> org.ehcache is relatively old and requires an upgrade.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19114) upgrade to commons-compress 1.26.1 due to cves

2024-04-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19114:

Description: 
2 recent CVEs fixed - 
https://mvnrepository.com/artifact/org.apache.commons/commons-compress


Important: Denial of Service CVE-2024-25710
Moderate: Denial of Service CVE-2024-26308



  was:2 recent CVEs fixed - 
https://mvnrepository.com/artifact/org.apache.commons/commons-compress


> upgrade to commons-compress 1.26.1 due to cves
> --
>
> Key: HADOOP-19114
> URL: https://issues.apache.org/jira/browse/HADOOP-19114
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> 2 recent CVEs fixed - 
> https://mvnrepository.com/artifact/org.apache.commons/commons-compress
> Important: Denial of Service CVE-2024-25710
> Moderate: Denial of Service CVE-2024-26308



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-02 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833295#comment-17833295
 ] 

Steve Loughran commented on HADOOP-19098:
-

fixed in 3.5; will backport to 3.4 and 3.3

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say 
> * "overlap triggers IllegalArgumentException". 
> * special case: 0 byte ranges may be short circuited to return empty buffer 
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases
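
A minimal sketch of the kind of shared validation helper proposed, using a stand-in Range 
type rather than the real FileRange: sort by offset, let zero-byte ranges through, and 
reject any overlap with IllegalArgumentException.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RangeValidationSketch {

  /** Minimal stand-in for a FileRange. */
  static final class Range {
    final long offset;
    final int length;
    Range(long offset, int length) { this.offset = offset; this.length = length; }
  }

  /**
   * Reject overlapping ranges, as the spec change proposes.
   * Zero-byte ranges are allowed through without further checks.
   */
  static List<Range> validateNonOverlapping(List<Range> input) {
    List<Range> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong(r -> r.offset));
    long previousEnd = -1;
    for (Range r : sorted) {
      if (r.length == 0) {
        continue;                       // special case: may be short-circuited
      }
      if (r.offset < previousEnd) {
        throw new IllegalArgumentException("overlapping ranges at offset " + r.offset);
      }
      previousEnd = r.offset + r.length;
    }
    return sorted;
  }
}
{code}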



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19098:

Fix Version/s: 3.5.0

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Related to PARQUET-2171 q: "how do you deal with overlapping ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed
> FS spec to say 
> * "overlap triggers IllegalArgumentException". 
> * special case: 0 byte ranges may be short circuited to return empty buffer 
> even without checking file length etc.
> Contract tests to validate this
> (+ common helper code to do this).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19115) upgrade to nimbus-jose-jwt 9.37.2 due to CVE

2024-04-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19115.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
 Assignee: PJ Fanning
   Resolution: Fixed

> upgrade to nimbus-jose-jwt 9.37.2 due to CVE
> 
>
> Key: HADOOP-19115
> URL: https://issues.apache.org/jira/browse/HADOOP-19115
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0, 3.5.0
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> https://github.com/advisories/GHSA-gvpg-vgmx-xg6w



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19133) "No test bucket" error in ITestS3AContractVectoredRead if provided via CLI property

2024-04-01 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832940#comment-17832940
 ] 

Steve Loughran commented on HADOOP-19133:
-

thanks. looks like "removeBaseAndBucketOverrides" should be clever about 
handling an undefined bucket binding, and/or the test should set things up better.
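
A sketch of what "being clever" could look like, assuming the bucket is taken from 
fs.contract.test.fs.s3a when present and the per-bucket pass is simply skipped when it is 
not; the real S3ATestUtils change may do something different.

{code}
import org.apache.hadoop.conf.Configuration;

final class BucketOverrideSketch {

  /**
   * Illustration only: remove per-bucket overrides when a test bucket is
   * configured, and quietly skip the per-bucket pass when it is not,
   * instead of failing with "No test bucket".
   */
  static void removeOverridesIfBucketKnown(Configuration conf, String... keys) {
    String fsUri = conf.getTrimmed("fs.contract.test.fs.s3a", "");
    for (String key : keys) {
      conf.unset(key);                          // always remove the base key
      if (!fsUri.isEmpty()) {
        String bucket = java.net.URI.create(fsUri).getHost();
        conf.unset("fs.s3a.bucket." + bucket + "." + key);   // per-bucket override
      }
    }
  }
}
{code}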

> "No test bucket" error in ITestS3AContractVectoredRead if provided via CLI 
> property
> ---
>
> Key: HADOOP-19133
> URL: https://issues.apache.org/jira/browse/HADOOP-19133
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Attila Doroszlai
>Priority: Minor
>
> ITestS3AContractVectoredRead fails with {{NullPointerException: No test 
> bucket}} if test bucket is defined as {{-Dtest.fs.s3a.name=...}} via CLI , 
> not in {{auth-keys.xml}}.  The same setup works for other S3A contract tests. 
>  Tested on 3.3.6.
> {code:title=src/test/resources/auth-keys.xml}
> <configuration>
>   <property>
>     <name>fs.s3a.endpoint</name>
>     <value>${test.fs.s3a.endpoint}</value>
>   </property>
>   <property>
>     <name>fs.contract.test.fs.s3a</name>
>     <value>${test.fs.s3a.name}</value>
>   </property>
> </configuration>
> {code}
> {code}
> export AWS_ACCESS_KEY_ID=''
> export AWS_SECRET_KEY=''
> mvn -Dtest=ITestS3AContractVectoredRead -Dtest.fs.s3a.name="s3a://mybucket" 
> -Dtest.fs.s3a.endpoint="http://localhost:9878/" clean test
> {code}
> {code:title=test results}
> Tests run: 46, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 7.879 s <<< 
> FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead
> testMinSeekAndMaxSizeDefaultValues[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 1.95 s  <<< ERROR!
> java.lang.NullPointerException: No test bucket
>   at org.apache.hadoop.util.Preconditions.checkNotNull(Preconditions.java:88)
>   at 
> org.apache.hadoop.fs.s3a.S3ATestUtils.getTestBucketName(S3ATestUtils.java:714)
>   at 
> org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides(S3ATestUtils.java:775)
>   at 
> org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead.testMinSeekAndMaxSizeDefaultValues(ITestS3AContractVectoredRead.java:104)
>   ...
> testMinSeekAndMaxSizeConfigsPropagation[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.176 s  <<< ERROR!
> testMultiVectoredReadStatsCollection[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.179 s  <<< ERROR!
> testNormalReadVsVectoredReadStatsCollection[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.155 s  <<< ERROR!
> testMinSeekAndMaxSizeDefaultValues[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.116 s  <<< ERROR!
> testMinSeekAndMaxSizeConfigsPropagation[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.102 s  <<< ERROR!
> testMultiVectoredReadStatsCollection[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.105 s  <<< ERROR!
> testNormalReadVsVectoredReadStatsCollection[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.107 s  <<< ERROR!
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19133) "No test bucket" error in ITestS3AContractVectoredRead if provided via CLI property

2024-04-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19133:

Affects Version/s: 3.3.6
   3.4.0

> "No test bucket" error in ITestS3AContractVectoredRead if provided via CLI 
> property
> ---
>
> Key: HADOOP-19133
> URL: https://issues.apache.org/jira/browse/HADOOP-19133
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Attila Doroszlai
>Priority: Minor
>
> ITestS3AContractVectoredRead fails with {{NullPointerException: No test 
> bucket}} if test bucket is defined as {{-Dtest.fs.s3a.name=...}} via CLI , 
> not in {{auth-keys.xml}}.  The same setup works for other S3A contract tests. 
>  Tested on 3.3.6.
> {code:title=src/test/resources/auth-keys.xml}
> <configuration>
>   <property>
>     <name>fs.s3a.endpoint</name>
>     <value>${test.fs.s3a.endpoint}</value>
>   </property>
>   <property>
>     <name>fs.contract.test.fs.s3a</name>
>     <value>${test.fs.s3a.name}</value>
>   </property>
> </configuration>
> {code}
> {code}
> export AWS_ACCESS_KEY_ID=''
> export AWS_SECRET_KEY=''
> mvn -Dtest=ITestS3AContractVectoredRead -Dtest.fs.s3a.name="s3a://mybucket" 
> -Dtest.fs.s3a.endpoint="http://localhost:9878/" clean test
> {code}
> {code:title=test results}
> Tests run: 46, Failures: 0, Errors: 8, Skipped: 0, Time elapsed: 7.879 s <<< 
> FAILURE! - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead
> testMinSeekAndMaxSizeDefaultValues[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 1.95 s  <<< ERROR!
> java.lang.NullPointerException: No test bucket
>   at org.apache.hadoop.util.Preconditions.checkNotNull(Preconditions.java:88)
>   at 
> org.apache.hadoop.fs.s3a.S3ATestUtils.getTestBucketName(S3ATestUtils.java:714)
>   at 
> org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides(S3ATestUtils.java:775)
>   at 
> org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead.testMinSeekAndMaxSizeDefaultValues(ITestS3AContractVectoredRead.java:104)
>   ...
> testMinSeekAndMaxSizeConfigsPropagation[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.176 s  <<< ERROR!
> testMultiVectoredReadStatsCollection[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.179 s  <<< ERROR!
> testNormalReadVsVectoredReadStatsCollection[Buffer type : 
> direct](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.155 s  <<< ERROR!
> testMinSeekAndMaxSizeDefaultValues[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.116 s  <<< ERROR!
> testMinSeekAndMaxSizeConfigsPropagation[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.102 s  <<< ERROR!
> testMultiVectoredReadStatsCollection[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.105 s  <<< ERROR!
> testNormalReadVsVectoredReadStatsCollection[Buffer type : 
> array](org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead)  Time 
> elapsed: 0.107 s  <<< ERROR!
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19131) Assist reflection IO with WrappedOperations class

2024-03-28 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19131:

Summary: Assist reflection IO with WrappedOperations class  (was: Assist 
reflection iO with WrappedOperations class)

> Assist reflection IO with WrappedOperations class
> -
>
> Key: HADOOP-19131
> URL: https://issues.apache.org/jira/browse/HADOOP-19131
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> parquet, avro etc are still stuck building with older hadoop releases. 
> This makes using new APIs hard (PARQUET-2117) and means that APIs which are 5 
> years old (!) such as HADOOP-15229 just aren't picked up.
> This lack of openFile() adoption hurts working with files in cloud storage as
> * extra HEAD requests are made
> * read policies can't be explicitly set
> * split start/end can't be passed down
> Proposed
> # create class org.apache.hadoop.io.WrappedOperations
> # add methods to wrap the apis
> # test in contract tests via reflection loading -verifies we have done it 
> properly.
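
To make the openFile() point concrete, a sketch of the sort of static wrapper a library 
could invoke via reflection; the class and method names are illustrative, not the final 
WrappedOperations API, though the option keys are the standard openfile options.

{code}
import java.io.IOException;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative sketch: a single static entry point which libraries could bind
 * to via reflection, rather than coding against the builder API directly.
 * Not the actual WrappedOperations class.
 */
public final class WrappedOpenFileSketch {

  private WrappedOpenFileSketch() {
  }

  /** Open a file with a read policy and a known length, avoiding the extra HEAD. */
  public static FSDataInputStream openFile(FileSystem fs, Path path,
      String readPolicy, long length) throws IOException {
    try {
      return fs.openFile(path)
          .opt("fs.option.openfile.read.policy", readPolicy)
          .opt("fs.option.openfile.length", Long.toString(length))
          .build()
          .get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }
}
{code}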



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19131) Assist reflection IO with WrappedOperations class

2024-03-28 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19131:

Description: 
parquet, avro etc are still stuck building with older hadoop releases. 

This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5 
years old such as HADOOP-15229 just aren't picked up.

This lack of openFile() adoption hurts working with files in cloud storage as
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down

Proposed
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the apis
# test in contract tests via reflection loading -verifies we have done it 
properly.

  was:
parquet, avro etc are still stuck building with older hadoop releases. 

This makes using new APIs hard (PARQUET-2117) and means that APIs which are 5 
years old (!) such as HADOOP-15229 just aren't picked up.

This lack of openFIle() adoption hurts working with files in cloud storage as
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down

Proposed
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the apis
# test in contract tests via reflection loading -verifies we have done it 
properly.


> Assist reflection IO with WrappedOperations class
> -
>
> Key: HADOOP-19131
> URL: https://issues.apache.org/jira/browse/HADOOP-19131
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> parquet, avro etc are still stuck building with older hadoop releases. 
> This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5 
> years old such as HADOOP-15229 just aren't picked up.
> This lack of openFile() adoption hurts working with files in cloud storage as
> * extra HEAD requests are made
> * read policies can't be explicitly set
> * split start/end can't be passed down
> Proposed
> # create class org.apache.hadoop.io.WrappedOperations
> # add methods to wrap the apis
> # test in contract tests via reflection loading -verifies we have done it 
> properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19131) Assist reflection iO with WrappedOperations class

2024-03-28 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19131:
---

 Summary: Assist reflection iO with WrappedOperations class
 Key: HADOOP-19131
 URL: https://issues.apache.org/jira/browse/HADOOP-19131
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


parquet, avro etc are still stuck building with older hadoop releases. 

This makes using new APIs hard (PARQUET-2117) and means that APIs which are 5 
years old (!) such as HADOOP-15229 just aren't picked up.

This lack of openFile() adoption hurts working with files in cloud storage as
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down

Proposed
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the apis
# test in contract tests via reflection loading -verifies we have done it 
properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19131) Assist reflection iO with WrappedOperations class

2024-03-28 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19131:
---

Assignee: Steve Loughran

> Assist reflection iO with WrappedOperations class
> -
>
> Key: HADOOP-19131
> URL: https://issues.apache.org/jira/browse/HADOOP-19131
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> parquet, avro etc are still stuck building with older hadoop releases. 
> This makes using new APIs hard (PARQUET-2117) and means that APIs which are 5 
> years old (!) such as HADOOP-15229 just aren't picked up.
> This lack of openFile() adoption hurts working with files in cloud storage as
> * extra HEAD requests are made
> * read policies can't be explicitly set
> * split start/end can't be passed down
> Proposed
> # create class org.apache.hadoop.io.WrappedOperations
> # add methods to wrap the apis
> # test in contract tests via reflection loading -verifies we have done it 
> properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19090) Update Protocol Buffers installation to 3.23.4

2024-03-27 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831281#comment-17831281
 ] 

Steve Loughran commented on HADOOP-19090:
-

in trunk as fc166d3aec7c; I don't see it in 3.4 yet. Note that we need to release a 
new version of hadoop-thirdparty for this; trunk is using 1.3.0-SNAPSHOT.



> Update Protocol Buffers installation to 3.23.4
> --
>
> Key: HADOOP-19090
> URL: https://issues.apache.org/jira/browse/HADOOP-19090
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.0, 3.3.9
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We are seeing issues with Java 8 usage of protobuf-java
> See https://issues.apache.org/jira/browse/HADOOP-18197 and comments about
> java.lang.NoSuchMethodError: 
> java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19047) Support InMemory Tracking Of S3A Magic Commits

2024-03-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19047.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Support InMemory Tracking Of S3A Magic Commits
> --
>
> Key: HADOOP-19047
> URL: https://issues.apache.org/jira/browse/HADOOP-19047
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The following are the operations which happens within a Task when it uses S3A 
> Magic Committer. 
> *During closing of stream*
> 1. A 0-byte file with the same name as the original file is uploaded to S3 
> using a PUT operation. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L152]
>  for more information. This is done so that the downstream application like 
> Spark could get the size of the file which is being written.
> 2. MultiPartUpload(MPU) metadata is uploaded to S3. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicCommitTracker.java#L176]
>  for more information.
> *During TaskCommit*
> 1. All the MPU metadata which the task wrote to S3 (There will be 'x' number 
> of metadata file in S3 if a single task writes to 'x' files) are read and 
> rewritten to S3 as a single metadata file. Refer 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/magic/MagicS3GuardCommitter.java#L201]
>  for more information
> Since these operations happen within the Task JVM, we could optimize as well 
> as save cost by storing this information in memory when Task memory usage is 
> not a constraint. Hence the proposal here is to introduce a new MagicCommit 
> Tracker called "InMemoryMagicCommitTracker" which will store:
> 1. Metadata of MPU in memory till the Task is committed
> 2. Store the size of the file which can be used by the downstream application 
> to get the file size before it is committed/visible to the output path.
> This optimization will save 2 PUT S3 calls, 1 LIST S3 call, and 1 GET S3 call 
> given a Task writes only 1 file.
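
Purely to illustrate the idea, a hypothetical sketch of the state such a tracker could keep 
in the task JVM instead of writing and re-reading .pending files; none of these class or 
field names are the real implementation.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: per-path commit data kept in the task JVM. */
final class InMemoryCommitDataSketch {

  /** What would otherwise be written to S3 as a per-file metadata object. */
  static final class PendingUpload {
    final String destinationKey;
    final String uploadId;
    final long fileLength;
    final List<String> partEtags = new ArrayList<>();
    PendingUpload(String destinationKey, String uploadId, long fileLength) {
      this.destinationKey = destinationKey;
      this.uploadId = uploadId;
      this.fileLength = fileLength;
    }
  }

  /** Keyed by destination path; read back at task commit instead of LIST + GET. */
  private final Map<String, PendingUpload> uploadsByPath = new ConcurrentHashMap<>();

  void track(String path, PendingUpload upload) {
    uploadsByPath.put(path, upload);
  }

  /** Task commit: aggregate everything without touching S3. */
  List<PendingUpload> drain() {
    List<PendingUpload> all = new ArrayList<>(uploadsByPath.values());
    uploadsByPath.clear();
    return all;
  }
}
{code}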



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18980) S3A credential provider remapping: make extensible

2024-03-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18980:

Description: 
A new option fs.s3a.aws.credentials.provider.mapping takes a list of key=value pairs for 
automatic mapping of v1 credential providers to v2 credential providers.


h2. Backporting

There's a followup PR to the main patch which *should* be applied, as it 
hardens the parser.

{code}
HADOOP-18980. Invalid inputs for getTrimmedStringCollectionSplitByEquals 
(ADDENDUM) (#6546)
{code}


  was:
s3afs will now remap the common com.amazonaws credential providers to 
equivalents in the v2 sdk or in hadoop-aws

We could do the same for third party credential providers by taking a key=value 
list in a configuration property and adding to the map. 


h2. Backporting

There's a followup PR to the main patch which *should* be applied, as it 
hardens the parser.

{code}
HADOOP-18980. Invalid inputs for getTrimmedStringCollectionSplitByEquals 
(ADDENDUM) (#6546)
{code}



> S3A credential provider remapping: make extensible
> --
>
> Key: HADOOP-18980
> URL: https://issues.apache.org/jira/browse/HADOOP-18980
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.5.0, 3.4.1
>
>
> A new option fs.s3a.aws.credentials.provider.mapping takes a key value pair 
> for automatic mapping of v1 credential providers to v2 credential providers.
> h2. Backporting
> There's a followup PR to the main patch which *should* be applied, as it 
> hardens the parser.
> {code}
> HADOOP-18980. Invalid inputs for getTrimmedStringCollectionSplitByEquals 
> (ADDENDUM) (#6546)
> {code}
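
As an illustration of how the option might be set (the class names here are just examples 
of a v1 provider and its v2 equivalent, not a recommendation):

{code}
import org.apache.hadoop.conf.Configuration;

final class CredentialMappingExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // map a v1 (com.amazonaws) provider name to its v2 equivalent;
    // the key=value list syntax is what the addendum's parser hardening covers
    conf.set("fs.s3a.aws.credentials.provider.mapping",
        "com.amazonaws.auth.EnvironmentVariableCredentialsProvider="
            + "software.amazon.awssdk.auth.credentials.EnvironmentVariableCredentialsProvider");
  }
}
{code}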



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE-2024-23944

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19116:
---

Assignee: PJ Fanning

> update to zookeeper client 3.8.4 due to  CVE-2024-23944
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: CVE
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE-2024-23944

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19116.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> update to zookeeper client 3.8.4 due to  CVE-2024-23944
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: CVE
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE-2024-23944

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19116:

Summary: update to zookeeper client 3.8.4 due to  CVE-2024-23944  (was: 
update to zookeeper client 3.8.4 due to CVE)

> update to zookeeper client 3.8.4 due to  CVE-2024-23944
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: CVE
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18869) ABFS: Fixing Behavior of a File System APIs on root path

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18869:

Description: 
The following HDFS APIs are failing when called on a root path.

|FS Call|Status|Error thrown to caller|
|create()|Failing|Operation failed: "The request URI is invalid.", 400, PUT, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-02076119-21ed-4ada-bcd0-14afaae54013/?resource=file=90,
 InvalidUri, "The request URI is invalid. 
RequestId:1d23f8c2-d01f-0059-61b6-c60c2400 
Time:2023-08-04T09:29:55.4813818Z"|
|createNonRecursive()|Failing|Runtime Exception: 
java.lang.IllegalArgumentException: null path (This is occurring because 
getParentPath is null and getFileStatus is called on null)|
|setXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, HEAD, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=90|
|getXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, HEAD, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=91|

h2. important: xattr support was removed in HADOOP-19089; include that change 
when cherrypicking this

  was:
Following HDFS Apis are failing when called on a root path.

{*}{*}{*}{*}{*}{*}
|FS Call|Status|Error thrown to caller|
|create()|Failing|Operation failed: "The request URI is invalid.", 400, PUT, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-02076119-21ed-4ada-bcd0-14afaae54013/?resource=file=90,
 InvalidUri, "The request URI is invalid. 
RequestId:1d23f8c2-d01f-0059-61b6-c60c2400 
Time:2023-08-04T09:29:55.4813818Z"|
|createNonRecursive()|Failing|Runtime Exception: 
java.lang.IllegalArgumentException: null path (This is occuring because 
getParentPath is null and getFileStatus is called on null)|
|setXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, HEAD, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=90|
|getXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, HEAD, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=91|


> ABFS: Fixing Behavior of a File System APIs on root path
> 
>
> Key: HADOOP-18869
> URL: https://issues.apache.org/jira/browse/HADOOP-18869
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.6
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The following HDFS APIs are failing when called on a root path.
> |FS Call|Status|Error thrown to caller|
> |create()|Failing|Operation failed: "The request URI is invalid.", 400, PUT, 
> https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-02076119-21ed-4ada-bcd0-14afaae54013/?resource=file=90,
>  InvalidUri, "The request URI is invalid. 
> RequestId:1d23f8c2-d01f-0059-61b6-c60c2400 
> Time:2023-08-04T09:29:55.4813818Z"|
> |createNonRecursive()|Failing|Runtime Exception: 
> java.lang.IllegalArgumentException: null path (This is occurring because 
> getParentPath is null and getFileStatus is called on null)|
> |setXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, 
> HEAD, 
> https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=90|
> |getXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, 
> HEAD, 
> https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=91|
> h2. important: xattr support was removed in HADOOP-19089; include that change 
> when cherrypicking this



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19089) [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path

2024-03-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19089.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path
> ---
>
> Key: HADOOP-19089
> URL: https://issues.apache.org/jira/browse/HADOOP-19089
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> A while back changes were made to support HDFS.setXAttr() and HDFS.getXAttr() 
> on root path for ABFS Driver.
> For these, filesystem level APIs were introduced and used to set/get metadata 
> of container.
> Refer to Jira: [HADOOP-18869] ABFS: Fixing Behavior of a File System APIs on 
> root path - ASF JIRA (apache.org)
> Ideally, the same set of APIs should be used, and root should be treated as a 
> path like any other path.
> This change is to avoid calling container APIs for these HDFS calls.
> As a result, these APIs will fail on the root path (as they did earlier) because 
> the service does not support get/set of user properties on the root path.
> This change will also update the documentation to reflect that these 
> operations are not supported on root path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19122) testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store

2024-03-22 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19122:
---

 Summary: testListPathWithValueGreaterThanServerMaximum assert 
failure on heavily loaded store
 Key: HADOOP-19122
 URL: https://issues.apache.org/jira/browse/HADOOP-19122
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Steve Loughran


On an Azure store which may be experiencing throttling, the listPath call can 
return fewer than the 5K limit; the assertion needs to be changed to allow for this.
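
A sketch of the kind of relaxed assertion this implies, written with AssertJ; the 5000 
bound is the server-side page limit from the test, and the method name is illustrative.

{code}
import static org.assertj.core.api.Assertions.assertThat;

import java.util.List;

final class ListPathAssertionSketch {

  /**
   * Under throttling the store may return fewer than listMaxResults entries,
   * so assert an upper bound rather than exact equality with 5000.
   */
  static void assertListSizeWithinServerLimit(List<?> entries, int serverMax) {
    assertThat(entries)
        .describedAs("listPath result on a possibly throttled store")
        .hasSizeLessThanOrEqualTo(serverMax)
        .isNotEmpty();
  }
}
{code}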



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19122) [ABFS] testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store

2024-03-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19122:

Component/s: test

> [ABFS] testListPathWithValueGreaterThanServerMaximum assert failure on 
> heavily loaded store
> ---
>
> Key: HADOOP-19122
> URL: https://issues.apache.org/jira/browse/HADOOP-19122
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Minor
>
> On an Azure store which may be experiencing throttling, the listPath call can 
> return fewer than the 5K limit; the assertion needs to be changed to allow for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19122) [ABFS] testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store

2024-03-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19122:

Summary: [ABFS] testListPathWithValueGreaterThanServerMaximum assert 
failure on heavily loaded store  (was: 
testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded 
store)

> [ABFS] testListPathWithValueGreaterThanServerMaximum assert failure on 
> heavily loaded store
> ---
>
> Key: HADOOP-19122
> URL: https://issues.apache.org/jira/browse/HADOOP-19122
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Minor
>
> On an Azure store which may be experiencing throttling, the listPath call can 
> return fewer than the 5K limit; the assertion needs to be changed to allow for this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19122) testListPathWithValueGreaterThanServerMaximum assert failure on heavily loaded store

2024-03-22 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829892#comment-17829892
 ] 

Steve Loughran commented on HADOOP-19122:
-


{code}
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 184.77 
s <<< FAILURE! - in org.apache.hadoop.fs.azurebfs.ITestAbfsClient
[ERROR] 
testListPathWithValueGreaterThanServerMaximum(org.apache.hadoop.fs.azurebfs.ITestAbfsClient)
  Time elapsed: 184.77 s  <<< FAILURE!
java.lang.AssertionError: 
[AbfsClient.listPath result will contain a maximum of 5000 items even if 
listMaxResults >= 5000 or directory contains more than 5000 items] 
Expected size:<5000> but was:<3381> in:
<[org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@38d6d518,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@190def4e,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@dafeaae,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@7046a87c,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@7eb6dd79,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@8373a7e,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@7a3f4b4c,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@3fff093f,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@776a56ac,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@46bf76f5,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@465a3045,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@1ab6f40,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@245196ef,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@19e4544e,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@e2b972f,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@542028c5,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@7cee3a40,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@479347b7,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@15532285,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@40e056ea,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4d51588c,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@10a2106e,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@1ce94ff5,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@5efc0d84,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@192ebbef,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@198fe9c4,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4fcf9be7,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@71613fb7,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@1d7c13b7,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@7bdb1815,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4878a42a,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@301ca615,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4d00ea0a,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@6eb8580f,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@63ecf806,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@3615d660,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@7c8f2a12,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@505b7dac,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@54738369,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4ccde951,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@49179560,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@42aa27f3,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@a8a9c04,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@29ae6b65,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@5c6747a5,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4f616665,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@11d5a931,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@4b7a1d2d,

org.apache.hadoop.fs.azurebfs.contracts.services.ListResultEntrySchema@2d462188,


[jira] [Updated] (HADOOP-19093) Improve rate limiting through ABFS in Manifest Committer

2024-03-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19093:

Affects Version/s: 3.4.0
   (was: 3.3.6)

> Improve rate limiting through ABFS in Manifest Committer
> 
>
> Key: HADOOP-19093
> URL: https://issues.apache.org/jira/browse/HADOOP-19093
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> I need a load test to verify that the rename resilience of the manifest 
> committer actually works as intended
> * test suite with name ILoadTest* prefix (as with s3)
> * parallel test running with many threads doing many renames
> * verify that rename recovery should be detected
> * and that all renames MUST NOT fail.
> maybe also: metrics for this in fs and doc update. 
> Possibly; LogExactlyOnce to warn of load issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19093) Improve rate limiting through ABFS in Manifest Committer

2024-03-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19093:

Summary: Improve rate limiting through ABFS in Manifest Committer  (was: 
add load tests for abfs rename resilience)

> Improve rate limiting through ABFS in Manifest Committer
> 
>
> Key: HADOOP-19093
> URL: https://issues.apache.org/jira/browse/HADOOP-19093
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> I need a load test to verify that the rename resilience of the manifest 
> committer actually works as intended
> * test suite with name ILoadTest* prefix (as with s3)
> * parallel test running with many threads doing many renames
> * verify that rename recovery should be detected
> * and that all renames MUST NOT fail.
> maybe also: metrics for this in fs and doc update. 
> Possibly; LogExactlyOnce to warn of load issues



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library

2024-03-21 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829701#comment-17829701
 ] 

Steve Loughran commented on HADOOP-19120:
-

if we move to java11 there's apparently a better HTTP client there too, which probably 
supports kerberos auth, if that's relevant.

If you look at the s3a code related to connection pooling you'll see some of 
the problems to be handled:
* explicitly discard connections which return errors
* handle TTL of connections in the pool
* the problem of *not enough connections in the pool*, a recurrent one
* identification of stale connections

You'll need to handle these. I've not looked at your PR, but we can link you to the 
problems so you know what to do.
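
For reference, a minimal sketch of the knobs Apache HttpClient 4.5 exposes for exactly 
these problems (pool sizing, TTL, stale-connection checks, idle eviction); it shows 
library configuration only, not the ABFS integration, and the numbers are placeholders.

{code}
import java.util.concurrent.TimeUnit;

import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

final class HttpClientPoolSketch {

  static CloseableHttpClient newPooledClient() {
    PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
    pool.setMaxTotal(200);                    // guard against "not enough connections"
    pool.setDefaultMaxPerRoute(100);
    pool.setValidateAfterInactivity(5_000);   // re-check possibly stale connections

    return HttpClients.custom()
        .setConnectionManager(pool)
        .setConnectionTimeToLive(30, TimeUnit.SECONDS)   // TTL of pooled connections
        .evictExpiredConnections()
        .evictIdleConnections(60, TimeUnit.SECONDS)
        .build();
  }
}
{code}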

> [ABFS]: ApacheHttpClient adaptation as network library
> --
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.5.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Apache HttpClient is more feature-rich and flexible, and gives the application 
> more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by 
> OpenJDK and has no performance problem. However, it limits the application's 
> control over networking, and there are very few APIs and hooks exposed that 
> the application can use to get metrics or to choose which connection 
> should be reused and when. ApacheHttpClient will give important hooks to fetch 
> important metrics and control networking parameters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19118) KeyShell fails with NPE when KMS throws Exception with null as message

2024-03-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828517#comment-17828517
 ] 

Steve Loughran commented on HADOOP-19118:
-

if e.getLocalizedMessage() is null it should fall back to e.toString()
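
A minimal sketch of that fallback (illustrative only, not the actual patch):

{code}
final class PrettifyExceptionSketch {

  /** Null-safe variant of the method quoted below; illustration only. */
  static String prettifyException(Exception e) {
    String message = e.getLocalizedMessage();
    if (message == null) {
      // e.g. a ConcurrentModificationException constructed with no message
      return e.toString();
    }
    return e.getClass().getSimpleName() + ": " + message.split("\n")[0];
  }

  public static void main(String[] args) {
    System.out.println(prettifyException(new java.util.ConcurrentModificationException()));
  }
}
{code}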

> KeyShell fails with NPE when KMS throws Exception with null as message
> --
>
> Key: HADOOP-19118
> URL: https://issues.apache.org/jira/browse/HADOOP-19118
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, crypto
>Affects Versions: 3.3.6
>Reporter: Dénes Bodó
>Priority: Major
>
> There is an issue in specific Ranger versions (where RANGER-3989 is not 
> fixed) which throws Exception in case of concurrent access to a HashMap with 
> Message {*}null{*}.
> {noformat}
> java.util.ConcurrentModificationException: null
> at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1503)
> at java.util.HashMap$EntryIterator.next(HashMap.java:1501) {noformat}
> This manifests in Hadoop's KeyShell as an Exception with message {*}null{*}.
> So when
> {code:java}
>   private String prettifyException(Exception e) {
>     return e.getClass().getSimpleName() + ": " +
>         e.getLocalizedMessage().split("\n")[0];
>   } {code}
> tries to print out the Exception, the user experiences an NPE
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.hadoop.crypto.key.KeyShell.prettifyException(KeyShell.java:541)
>   at 
> org.apache.hadoop.crypto.key.KeyShell.printException(KeyShell.java:536)
>   at org.apache.hadoop.tools.CommandShell.run(CommandShell.java:79)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
>   at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:553) 
> {noformat}
> This is unwanted behaviour because the user does not get any feedback 
> about what went wrong and where.
>  
> My suggestion is to add *null checking* into the affected *prettifyException* 
> method.
> I'll create the Github PR soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19050) Add S3 Access Grants Support in S3A

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19050.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Fixed in trunk; backport to 3.4 should go in later.


> Add S3 Access Grants Support in S3A
> ---
>
> Key: HADOOP-19050
> URL: https://issues.apache.org/jira/browse/HADOOP-19050
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Jason Han
>Assignee: Jason Han
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Add support for S3 Access Grants 
> (https://aws.amazon.com/s3/features/access-grants/) in S3A.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19119:

Priority: Minor  (was: Major)

> spotbugs complaining about possible NPE in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
> 
>
> Key: HADOOP-19119
> URL: https://issues.apache.org/jira/browse/HADOOP-19119
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: crypto
>Affects Versions: 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> PRs against hadoop-common are reporting spotbugs problems
> {code}
> Dodgy code Warnings
> Code  Warning
> NPPossible null pointer dereference in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return 
> value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
> In class org.apache.hadoop.crypto.key.kms.ValueQueue
> In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
> Local variable stored in JVM register ?
> Dereferenced at ValueQueue.java:[line 332]
> Known null at ValueQueue.java:[line 332]
> {code}
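
A sketch of the null-guard pattern that clears this category of warning, assuming the size 
comes from a map or cache lookup that can return null; the actual ValueQueue fix may differ.

{code}
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ValueQueueSizeSketch {

  private final Map<String, Queue<byte[]>> keyQueues = new ConcurrentHashMap<>();

  /**
   * Null-safe size lookup: if the key has no queue yet, report 0 instead of
   * dereferencing a possibly-null return value (the pattern spotbugs flags).
   */
  int getSize(String keyName) {
    Queue<byte[]> queue = keyQueues.get(keyName);
    return queue == null ? 0 : queue.size();
  }

  void add(String keyName, byte[] value) {
    keyQueues.computeIfAbsent(keyName, k -> new ConcurrentLinkedQueue<>()).add(value);
  }
}
{code}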



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19119.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> spotbugs complaining about possible NPE in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
> 
>
> Key: HADOOP-19119
> URL: https://issues.apache.org/jira/browse/HADOOP-19119
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: crypto
>Affects Versions: 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> PRs against hadoop-common are reporting spotbugs problems
> {code}
> Dodgy code Warnings
> Code  Warning
> NPPossible null pointer dereference in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return 
> value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
> In class org.apache.hadoop.crypto.key.kms.ValueQueue
> In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
> Local variable stored in JVM register ?
> Dereferenced at ValueQueue.java:[line 332]
> Known null at ValueQueue.java:[line 332]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19114) upgrade to commons-compress 1.26.1 due to cves

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19114:

Component/s: build
 CVE

> upgrade to commons-compress 1.26.1 due to cves
> --
>
> Key: HADOOP-19114
> URL: https://issues.apache.org/jira/browse/HADOOP-19114
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> 2 recent CVEs fixed - 
> https://mvnrepository.com/artifact/org.apache.commons/commons-compress



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19114) upgrade to commons-compress 1.26.1 due to cves

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19114:

Affects Version/s: 3.4.0

> upgrade to commons-compress 1.26.1 due to cves
> --
>
> Key: HADOOP-19114
> URL: https://issues.apache.org/jira/browse/HADOOP-19114
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> 2 recent CVEs fixed - 
> https://mvnrepository.com/artifact/org.apache.commons/commons-compress



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19115) upgrade to nimbus-jose-jwt 9.37.2 due to CVE

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19115:

Component/s: build
 CVE

> upgrade to nimbus-jose-jwt 9.37.2 due to CVE
> 
>
> Key: HADOOP-19115
> URL: https://issues.apache.org/jira/browse/HADOOP-19115
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0, 3.5.0
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/advisories/GHSA-gvpg-vgmx-xg6w



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19115) upgrade to nimbus-jose-jwt 9.37.2 due to CVE

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19115:

Affects Version/s: 3.4.0
   3.5.0

> upgrade to nimbus-jose-jwt 9.37.2 due to CVE
> 
>
> Key: HADOOP-19115
> URL: https://issues.apache.org/jira/browse/HADOOP-19115
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.5.0
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/advisories/GHSA-gvpg-vgmx-xg6w



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19116:

Affects Version/s: 3.3.6
   3.4.0

> update to zookeeper client 3.8.4 due to CVE
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE

2024-03-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828287#comment-17828287
 ] 

Steve Loughran commented on HADOOP-19116:
-

I've just created a new component "CVE" which we can use for CVE-related issues; it 
makes it easier to generate reports

> update to zookeeper client 3.8.4 due to CVE
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: CVE
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19116) update to zookeeper client 3.8.4 due to CVE

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19116:

Component/s: CVE

> update to zookeeper client 3.8.4 due to CVE
> ---
>
> Key: HADOOP-19116
> URL: https://issues.apache.org/jira/browse/HADOOP-19116
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: CVE
>Affects Versions: 3.4.0, 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/advisories/GHSA-r978-9m6m-6gm6



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19119:

Summary: spotbugs complaining about possible NPE in 
org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()  (was: spotbugs 
complaining about possible NPE in 
org.apache.hadoop.crypto.key.kms.ValueQueue.alueQueue.getSize())

> spotbugs complaining about possible NPE in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize()
> 
>
> Key: HADOOP-19119
> URL: https://issues.apache.org/jira/browse/HADOOP-19119
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: crypto
>Affects Versions: 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> PRs against hadoop-common are reporting spotbugs problems
> {code}
> Dodgy code Warnings
> Code  Warning
> NPPossible null pointer dereference in 
> org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return 
> value of called method
> Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
> In class org.apache.hadoop.crypto.key.kms.ValueQueue
> In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
> Local variable stored in JVM register ?
> Dereferenced at ValueQueue.java:[line 332]
> Known null at ValueQueue.java:[line 332]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19119) spotbugs complaining about possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.alueQueue.getSize()

2024-03-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19119:
---

 Summary: spotbugs complaining about possible NPE in 
org.apache.hadoop.crypto.key.kms.ValueQueue.alueQueue.getSize()
 Key: HADOOP-19119
 URL: https://issues.apache.org/jira/browse/HADOOP-19119
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: crypto
Affects Versions: 3.5.0
Reporter: Steve Loughran
Assignee: Steve Loughran


PRs against hadoop-common are reporting spotbugs problems

{code}
Dodgy code Warnings
CodeWarning
NP  Possible null pointer dereference in 
org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value 
of called method
Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
In class org.apache.hadoop.crypto.key.kms.ValueQueue
In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
Local variable stored in JVM register ?
Dereferenced at ValueQueue.java:[line 332]
Known null at ValueQueue.java:[line 332]

{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19100) Fix Spotbugs warnings in the build

2024-03-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828280#comment-17828280
 ] 

Steve Loughran commented on HADOOP-19100:
-

PRs against hadoop-common are reporting this; let me do a trivial fix, since this is 
just spotbugs missing the previous .get() call. A sketch of the pattern follows the warning below.


{code}
Dodgy code Warnings
CodeWarning
NP  Possible null pointer dereference in 
org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String) due to return value 
of called method
Bug type NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (click for details)
In class org.apache.hadoop.crypto.key.kms.ValueQueue
In method org.apache.hadoop.crypto.key.kms.ValueQueue.getSize(String)
Local variable stored in JVM register ?
Dereferenced at ValueQueue.java:[line 332]
Known null at ValueQueue.java:[line 332]

{code}
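
To illustrate (a minimal, hypothetical sketch, not the actual ValueQueue patch): spotbugs 
flags the return value of a map/cache {{get()}} being dereferenced on a path where it may 
be null; the usual remedy is to keep the reference and handle the null path explicitly. 
All names below are made up for illustration.

{code}
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; not the org.apache.hadoop.crypto.key.kms.ValueQueue code.
class ValueQueueSketch<E> {
  private final Map<String, Queue<E>> keyQueues = new ConcurrentHashMap<>();

  // Pattern spotbugs flags: the return value of get() is dereferenced even
  // though there is a path on which it can be null.
  int sizeFlagged(String keyName) {
    return keyQueues.get(keyName).size();
  }

  // Null-safe variant: keep the reference and handle the null path explicitly.
  int sizeFixed(String keyName) {
    Queue<E> q = keyQueues.get(keyName);
    return q == null ? 0 : q.size();
  }
}
{code}

The real change may equally make an earlier {{get()}} result visible to the analysis; the 
point is only that the dereference site must be provably non-null.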


> Fix Spotbugs warnings in the build
> --
>
> Key: HADOOP-19100
> URL: https://issues.apache.org/jira/browse/HADOOP-19100
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>
> We are getting spotbugs warnings in every PR.
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-common-project-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-httpfs-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-yarn-project_hadoop-yarn-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-rbf-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-hdfs-project-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/artifact/out/branch-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1517/artifact/out/branch-spotbugs-hadoop-yarn-project-warnings.html]
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1517/artifact/out/branch-spotbugs-root-warnings.html]
>  
> Source: 
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1532/console



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-03-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828279#comment-17828279
 ] 

Steve Loughran commented on HADOOP-19085:
-

One thing I'd like to ask: what are the compatibility reports so far?

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de facto standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19085:

Component/s: fs
 test

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, test
>Affects Versions: 3.4.0
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de facto standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-03-19 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828278#comment-17828278
 ] 

Steve Loughran commented on HADOOP-19085:
-

bq. The patch was just committed

I see that. This JIRA should be closed as fixed, and the new work split out as 
top-level issues or under a new uber-JIRA covering the work.

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de facto standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-03-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19085:

Fix Version/s: 3.5.0

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de facto standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-03-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19085:

Target Version/s:   (was: 3.3.6)

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de facto standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19085) Compatibility Benchmark over HCFS Implementations

2024-03-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19085:

Affects Version/s: 3.4.0

> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HADOOP-19085
> URL: https://issues.apache.org/jira/browse/HADOOP-19085
> Project: Hadoop Common
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: Han Liu
>Assignee: Han Liu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HADOOP-19085.001.patch, HDFS Compatibility Benchmark 
> Design.pdf
>
>
> {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in 
> the big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de facto standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evaluate the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functions of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19083) provide hadoop binary tarball without aws v2 sdk

2024-03-15 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827526#comment-17827526
 ] 

Steve Loughran commented on HADOOP-19083:
-

Wondering if the way to do this is actually in the release module, with a special 
target there to create a lean tarball the same way we do for arm64: take the x86 
one and modify it.

That way all the jar files will be consistent.

The -DskipShade option is useful in day-to-day builds as it saves many minutes; 
including the aws sdk in a build just triggers the copy of a single (large!) 
file.

> provide hadoop binary tarball without aws v2 sdk
> 
>
> Key: HADOOP-19083
> URL: https://issues.apache.org/jira/browse/HADOOP-19083
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>
> Have the hadoop binary .tar.gz exclude the aws v2 sdk by default. 
> This SDK brings the total size of the distribution to about 1 GB.
> Proposed
> * add a profile to include the aws sdk in the dist module
> * disable it by default
> Instead we document which version is needed. 
> The hadoop-aws and hadoop-cloud storage maven artifacts will declare their 
> dependencies, so apps building with those get to do the download.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19108) S3 Express: document use

2024-03-14 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17827063#comment-17827063
 ] 

Steve Loughran commented on HADOOP-19108:
-

Yeah, I've got some internal docs which I can use as a start.

> S3 Express: document use
> 
>
> Key: HADOOP-19108
> URL: https://issues.apache.org/jira/browse/HADOOP-19108
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Major
>
> The 3.4.0 release doesn't explicitly cover S3 Express.
> Its support is automatic:
> * library handles it
> * hadoop shell commands know that there may be "missing" dirs in treewalks 
> due to in-flight uploads
> * s3afs automatically switches to deleting pending uploads in delete(dir) 
> call.
> We just need to provide a summary of features, how to probe, etc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-13 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19066:

Description: 
FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
considers overriding endpoint and enabling fips as mutually exclusive, we fail 
fast if fs.s3a.endpoint is set with fips support (details on HADOOP-18975).

Now, we no longer override SDK endpoint for central endpoint since we enable 
cross region access (details on HADOOP-19044) but we would still fail fast if 
endpoint is central and fips is enabled.

Changes proposed:
 * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
configured.
 * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
cross region access (expected with central endpoint).
 * Document FIPS support with central endpoint on connecting.html.

h3. Note: there are two patches here on trunk; they've been coalesced into one 
on branch-3.4. 

  was:
FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
considers overriding endpoint and enabling fips as mutually exclusive, we fail 
fast if fs.s3a.endpoint is set with fips support (details on HADOOP-18975).

Now, we no longer override SDK endpoint for central endpoint since we enable 
cross region access (details on HADOOP-19044) but we would still fail fast if 
endpoint is central and fips is enabled.

Changes proposed:
 * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
configured.
 * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
cross region access (expected with central endpoint).
 * Document FIPS support with central endpoint on connecting.html.


> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override SDK endpoint for central endpoint since we enable 
> cross region access (details on HADOOP-19044) but we would still fail fast if 
> endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.
> h3. Note: there are two patches here on trunk; they've been coalesced into 
> one on branch-3.4. 
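
For reference, a minimal sketch of the configuration this change permits; only the 
{{fs.s3a.endpoint.fips}} property comes from the issue text above, the class and 
everything else is illustrative.

{code}
import org.apache.hadoop.conf.Configuration;

// Hedged sketch: enable FIPS endpoints while keeping the central (default) endpoint,
// which this change now allows.
public class FipsCentralEndpointExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.endpoint.fips", true);
    // fs.s3a.endpoint is deliberately left unset (central endpoint); per this change,
    // S3A only fails fast when FIPS is combined with a non-central endpoint override.
    System.out.println("fips enabled: " + conf.getBoolean("fs.s3a.endpoint.fips", false));
  }
}
{code}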



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-13 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19066.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override SDK endpoint for central endpoint since we enable 
> cross region access (details on HADOOP-19044) but we would still fail fast if 
> endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-19088) upgrade to jersey-json 1.22.0

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HADOOP-19088:
---

Assignee: PJ Fanning

> upgrade to jersey-json 1.22.0
> -
>
> Key: HADOOP-19088
> URL: https://issues.apache.org/jira/browse/HADOOP-19088
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.3.6
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Tidies up support for Jettison and Jackson versions used by Hadoop



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19088) upgrade to jersey-json 1.22.0

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19088:

Fix Version/s: 3.5.0

> upgrade to jersey-json 1.22.0
> -
>
> Key: HADOOP-19088
> URL: https://issues.apache.org/jira/browse/HADOOP-19088
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.3.6
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Tidies up support for Jettison and Jackson versions used by Hadoop



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825790#comment-17825790
 ] 

Steve Loughran commented on HADOOP-19102:
-

What is the exception when it's wrong, and is there a way to disable it? This can 
go into the JIRA text for people who encounter it on 3.4.0.

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  
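
A minimal sketch of the clamp described above (illustrative names only, not the actual 
AbfsInputStream code):

{code}
// Illustrative sketch only; not the actual abfs optimisedRead implementation.
public class FooterBufferClamp {
  public static void main(String[] args) {
    int readBufferSizeConfig = 4 * 1024 * 1024;    // e.g. 4 MB read buffer
    int footerBufferSizeConfig = 8 * 1024 * 1024;  // misconfigured larger footer buffer
    // The change: never let the footer buffer exceed the read buffer that is
    // actually allocated, so footer reads always fit.
    int footerBufferSize = Math.min(readBufferSizeConfig, footerBufferSizeConfig);
    byte[] buffer = new byte[readBufferSizeConfig];
    System.out.println("footer reads limited to " + footerBufferSize
        + " bytes into a " + buffer.length + "-byte buffer");
  }
}
{code}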



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19102:

Affects Version/s: 3.4.0

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19102:

Fix Version/s: (was: 3.4.0)
   (was: 3.5.0)

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825785#comment-17825785
 ] 

Steve Loughran commented on HADOOP-19102:
-

[~pranavsaxena] can I remind you to set the release you want this to land in as the 
target version, not the fix version. Fix version is only set for branches where the 
fix has been merged in; it is used for release note generation.

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825771#comment-17825771
 ] 

Steve Loughran commented on HADOOP-19066:
-

Afraid things break for me with a test bucket set up for S3 London; full stack 
trace below. I'm not going to revert, but we will need a followup... I won't 
cherrypick to branch-3.4 until then.

{code}
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion
[ERROR] Tests run: 18, Failures: 0, Errors: 1, Skipped: 1, Time elapsed: 56.26 
s <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion
[ERROR] 
testCentralEndpointAndNullRegionFipsWithCRUD(org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion)
  Time elapsed: 4.821 s  <<< ERROR!
java.net.UnknownHostException: getFileStatus on 
s3a://stevel-london/test/testCentralEndpointAndNullRegionFipsWithCRUD/srcdir: 
software.amazon.awssdk.core.exception.SdkClientException: Received an 
UnknownHostException when attempting to interact with a service. See cause for 
the exact endpoint that is failing to resolve. If this is happening on an 
endpoint that previously worked, there may be a network connectivity issue or 
your DNS cache could be storing endpoints for too long.:
software.amazon.awssdk.core.exception.SdkClientException: Received an 
UnknownHostException when attempting to interact with a service. See cause for 
the exact endpoint that is failing to resolve. If this is happening on an 
endpoint that previously worked, there may be a network connectivity issue or 
your DNS cache could be storing endpoints for too long.: 
stevel-london.s3-fips.eu-west-2.amazonaws.com
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.fs.s3a.impl.ErrorTranslation.wrapWithInnerIOE(ErrorTranslation.java:182)
at 
org.apache.hadoop.fs.s3a.impl.ErrorTranslation.maybeExtractIOException(ErrorTranslation.java:152)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:207)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:155)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4066)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3922)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem$MkdirOperationCallbacksImpl.probePathStatus(S3AFileSystem.java:3794)
at 
org.apache.hadoop.fs.s3a.impl.MkdirOperation.probePathStatusOrNull(MkdirOperation.java:173)
at 
org.apache.hadoop.fs.s3a.impl.MkdirOperation.getPathStatusExpectingDir(MkdirOperation.java:194)
at 
org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:108)
at 
org.apache.hadoop.fs.s3a.impl.MkdirOperation.execute(MkdirOperation.java:57)
at 
org.apache.hadoop.fs.s3a.impl.ExecutingStoreOperation.apply(ExecutingStoreOperation.java:76)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2707)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2726)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.mkdirs(S3AFileSystem.java:3766)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2494)
at 
org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion.assertOpsUsingNewFs(ITestS3AEndpointRegion.java:461)
at 
org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion.testCentralEndpointAndNullRegionFipsWithCRUD(ITestS3AEndpointRegion.java:454)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 

[jira] [Reopened] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-19066:
-

> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override SDK endpoint for central endpoint since we enable 
> cross region access (details on HADOOP-19044) but we would still fail fast if 
> endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19066) AWS SDK V2 - Enabling FIPS should be allowed with central endpoint

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19066:

Fix Version/s: 3.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> AWS SDK V2 - Enabling FIPS should be allowed with central endpoint
> --
>
> Key: HADOOP-19066
> URL: https://issues.apache.org/jira/browse/HADOOP-19066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> FIPS support can be enabled by setting "fs.s3a.endpoint.fips". Since the SDK 
> considers overriding endpoint and enabling fips as mutually exclusive, we 
> fail fast if fs.s3a.endpoint is set with fips support (details on 
> HADOOP-18975).
> Now, we no longer override SDK endpoint for central endpoint since we enable 
> cross region access (details on HADOOP-19044) but we would still fail fast if 
> endpoint is central and fips is enabled.
> Changes proposed:
>  * S3A to fail fast only if FIPS is enabled and non-central endpoint is 
> configured.
>  * Tests to ensure S3 bucket is accessible with default region us-east-2 with 
> cross region access (expected with central endpoint).
>  * Document FIPS support with central endpoint on connecting.html.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19108) S3 Express: document use

2024-03-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19108:
---

 Summary: S3 Express: document use
 Key: HADOOP-19108
 URL: https://issues.apache.org/jira/browse/HADOOP-19108
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


The 3.4.0 release doesn't explicitly cover S3 Express.

Its support is automatic:
* library handles it
* hadoop shell commands know that there may be "missing" dirs in treewalks due 
to in-flight uploads
* s3afs automatically switches to deleting pending uploads in delete(dir) call.

we just need to provide a summary of features, how to probe etc.
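One way to probe, as a hedged sketch (assuming the hadoop-common 
path-capability API and the capability string documented under HADOOP-18996; 
the class name is illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ProbeListingConsistency {
  public static void main(String[] args) throws Exception {
    // e.g. an s3a:// path backed by an S3 Express One Zone bucket
    Path path = new Path(args[0]);
    FileSystem fs = FileSystem.get(path.toUri(), new Configuration());
    // true means treewalks may encounter directories which "vanish"
    // (e.g. due to in-flight uploads) and should tolerate that
    boolean inconsistent = fs.hasPathCapability(path,
        "fs.capability.directory.listing.inconsistent");
    System.out.println("directory listings may be inconsistent: " + inconsistent);
  }
}
{code}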





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18996) S3A to provide full support for S3 Express One Zone

2024-03-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18996:

Description: 
HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express 
One Zone storage. 

Complete support needs to be added to address tests that fail with S3 Express 
One Zone, plus additional tests, documentation etc. 

* a hadoop-common path capability to indicate that treewalking may encounter 
missing dirs
* use this in treewalking code in the shell, mapreduce FileInputFormat etc. so 
they do not fail during treewalks
* an extra path capability for S3 Express too.
* tests for this
* anything else

A filesystem can now be probed for inconsistent directory listings through 
{{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}

If true, then treewalking code SHOULD NOT report a failure if, when walking 
into a subdirectory, a list/getFileStatus on that directory raises a 
FileNotFoundException.


  was:
HADOOP-18995 upgrades the SDK version which allows connecting to a s3 express 
one zone support. 

Complete support needs to be added to address tests that fail with s3 express 
one zone, additional tests, documentation etc. 

* hadoop-common path capability to indicate that treewalking may encounter 
missing dirs
* use this in treewalking code in shell, mapreduce FileInputFormat etc to not 
fail during treewalks
* extra path capability for s3express too.
* tests for this
* anything else

A filesystem can now be probed for inconsistent directoriy listings through 
{{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}

If true, then treewalking code SHOULD NOT report a failure if, when walking 
into a subdirectory, a list/getFileStatus on that directory raises a 
FileNotFoundExceptin.



> S3A to provide full support for S3 Express One Zone
> ---
>
> Key: HADOOP-18996
> URL: https://issues.apache.org/jira/browse/HADOOP-18996
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Ahmar Suhail
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7-aws
>
>
> HADOOP-18995 upgrades the SDK version, which allows connecting to S3 Express 
> One Zone storage. 
> Complete support needs to be added to address tests that fail with S3 Express 
> One Zone, plus additional tests, documentation etc. 
> * a hadoop-common path capability to indicate that treewalking may encounter 
> missing dirs
> * use this in treewalking code in the shell, mapreduce FileInputFormat etc. so 
> they do not fail during treewalks
> * an extra path capability for S3 Express too.
> * tests for this
> * anything else
> A filesystem can now be probed for inconsistent directory listings through 
> {{fs.hasPathCapability(path, "fs.capability.directory.listing.inconsistent")}}
> If true, then treewalking code SHOULD NOT report a failure if, when walking 
> into a subdirectory, a list/getFileStatus on that directory raises a 
> FileNotFoundException.
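A hedged sketch of how treewalking code could honour this (class and method 
names are illustrative, not existing Hadoop code):

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class TolerantTreewalk {

  /** Recursive walk which tolerates directories vanishing mid-walk
   *  when the store declares its listings may be inconsistent. */
  public static void walk(FileSystem fs, Path dir) throws IOException {
    boolean tolerant = fs.hasPathCapability(dir,
        "fs.capability.directory.listing.inconsistent");
    try {
      for (FileStatus st : fs.listStatus(dir)) {
        if (st.isDirectory()) {
          walk(fs, st.getPath());
        }
      }
    } catch (FileNotFoundException e) {
      if (!tolerant) {
        throw e;   // on a consistent store a missing path is still an error
      }
      // otherwise: skip the vanished directory and carry on
    }
  }
}
{code}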



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18854) add options to disable range merging of vectored io

2024-03-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825659#comment-17825659
 ] 

Steve Loughran commented on HADOOP-18854:
-

thanks. will close as done.

> add options to disable range merging of vectored io
> ---
>
> Key: HADOOP-18854
> URL: https://issues.apache.org/jira/browse/HADOOP-18854
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.5, 3.3.6
>Reporter: Steve Loughran
>Priority: Major
>
> I'm seeing test failures in my PARQUET-2171 PR because assertions about the 
> number of bytes read don't hold: small files are being read and the vector 
> range merging is pulling in the whole file.
> {code}
> [ERROR]   TestInputOutputFormat.testReadWriteWithCounter:338 bytestotal != 
> bytesread expected:<5510> but was:<11020>
> {code}
> I think for Parquet I will add an option to disable vector IO, but really the 
> filesystems which support it should allow merging to be disabled
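In the meantime, a hedged workaround sketch using the existing S3A 
vectored-read tuning keys (key names as documented for 3.3.5+; treating a 
value of 0 as "never bridge a gap" is an assumption about the merge 
heuristics, not a documented off switch):

{code}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// shrink the merge window so disjoint ranges are not coalesced
conf.set("fs.s3a.vectored.read.min.seek.size", "0");
conf.set("fs.s3a.vectored.read.max.merged.size", "0");
{code}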



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18855) VectorIO API tuning/stabilization

2024-03-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825295#comment-17825295
 ] 

Steve Loughran commented on HADOOP-18855:
-

(the Parquet API will pass down its allocator in preparation for this)

> VectorIO API tuning/stabilization
> -
>
> Key: HADOOP-18855
> URL: https://issues.apache.org/jira/browse/HADOOP-18855
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> What do we need to do to improve the vector IO experience, based on 
> integration and use?
> Obviously, we cannot change anything incompatibly, but we may find bugs to 
> fix and other possible enhancements.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18855) VectorIO API tuning/stabilization

2024-03-11 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825254#comment-17825254
 ] 

Steve Loughran commented on HADOOP-18855:
-

I'm thinking of adding a way to pass down the byte buffer release method 
alongside the allocator (overloaded methods...) so that on read failures S3A 
can

* only allocate buffers when submit() returns a range, if we decide that is 
the better approach
* release the buffers on failure, which doesn't happen today
* line us up for doing retries better, especially on combined reads, where we 
need to continue from where we left off without rereading from the start; it 
may be better here to just split into individual ranges for retrying.
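A hedged sketch of such an overload (names are illustrative, not a final API; 
the existing entry point is {{readVectored(ranges, allocate)}}):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;
import org.apache.hadoop.fs.FileRange;

public interface ReleaseAwareVectoredRead {
  /**
   * As readVectored(ranges, allocate), but with a release callback so that
   * a buffer allocated for a range can be handed back if that range's read
   * fails, rather than leaking it.
   */
  void readVectored(List<? extends FileRange> ranges,
      IntFunction<ByteBuffer> allocate,
      Consumer<ByteBuffer> release) throws IOException;
}
{code}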

> VectorIO API tuning/stabilization
> -
>
> Key: HADOOP-18855
> URL: https://issues.apache.org/jira/browse/HADOOP-18855
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> What do we need to do to improve the vector IO experience, based on 
> integration and use?
> Obviously, we cannot change anything incompatibly, but we may find bugs to 
> fix and other possible enhancements.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19105) S3A: Recover from Vector IO read failures

2024-03-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19105:
---

 Summary: S3A: Recover from Vector IO read failures
 Key: HADOOP-19105
 URL: https://issues.apache.org/jira/browse/HADOOP-19105
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
 Environment: s3a vector IO doesn't try to recover from read failures 
the way read() does.

Need to:
* abort the HTTP stream if considered necessary
* retry the active read which failed
* but not those which had already succeeded (recovery sketch below)


Reporter: Steve Loughran
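A hedged recovery sketch (hypothetical helper, not existing S3A code): on a 
failure, reread only the failed range with a plain positioned read, leaving 
ranges which already completed untouched.

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;

final class VectorReadRecovery {
  /** Reread a single failed range; completed ranges keep their buffers. */
  static ByteBuffer retryRange(FSDataInputStream in, FileRange range)
      throws IOException {
    byte[] data = new byte[range.getLength()];
    in.readFully(range.getOffset(), data, 0, data.length);
    return ByteBuffer.wrap(data);
  }
}
{code}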






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation

2024-03-08 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824782#comment-17824782
 ] 

Steve Loughran commented on HADOOP-19101:
-

* Hive Tez is also safe
* Hive LLAP is exposed as it reads into off-heap buffers

> Vectored Read into off-heap buffer broken in fallback implementation
> 
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
> position zero even when the range is at a different offset. As a result, you 
> can get incorrect data back.
> The fix for this is straightforward: we pass in a FileRange and use its 
> offset as the starting position.
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
> read vector IO into direct buffers through HDFS, ABFS or GCS. Note that we 
> have never seen this in production because the Parquet and ORC libraries both 
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to 
> read into off-heap DirectBuffers. This is a bit trickier than you would think 
> because an allocator is passed in. For PARQUET-2171 we will 
> * only invoke the API on streams which explicitly declare their support for 
> the API (so fall back within Parquet itself)
> * not invoke it when direct buffer allocation is in use.
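A hedged sketch of the corrected fallback (method shape is illustrative, not 
the patched VectoredReadUtils code): read from the range's own offset, not 
position zero, into a temporary array and then copy into the caller's direct 
buffer.

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.PositionedReadable;

final class DirectBufferFallback {
  static void readIntoDirectBuffer(PositionedReadable in, FileRange range,
      ByteBuffer buffer) throws IOException {
    byte[] tmp = new byte[range.getLength()];
    // start at the range offset -- the bug was starting at 0
    in.readFully(range.getOffset(), tmp, 0, tmp.length);
    buffer.put(tmp);
    buffer.flip();
  }
}
{code}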



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19043) S3A: Regression: ITestS3AOpenCost fails on prefetch test runs

2024-03-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19043.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> S3A: Regression: ITestS3AOpenCost fails on prefetch test runs
> -
>
> Key: HADOOP-19043
> URL: https://issues.apache.org/jira/browse/HADOOP-19043
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Getting test failures in the new ITestS3AOpenCost tests when run with 
> {{-Dprefetch}}.
> Thought I'd tested this, but clearly not:
> * class cast failures on asserts (fix: skip)
> * the number of bytes read differs in one test (fix: identify and address)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19104) S3A HeaderProcessing to process all metadata entries of HEAD response

2024-03-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19104:

Description: 
S3A HeaderProcessing builds up an incomplete list of headers, as its mapping 
of metadata to header entries omits headers including
x-amz-server-side-encryption-aws-kms-key-id

Proposed:
* review all headers which are stripped from "raw" responses and mapped into 
headers
* make sure the resulting headers match v1; it looks like the etags are 
different
* make sure x-amz-server-side-encryption-aws-kms-key-id comes back
* plus the new checksum values


v1 SDK

{code}

# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
header.Content-Length="524671"
header.Content-Type="binary/octet-stream"
header.ETag="3e39531220fbd3747d32cf93a79a7a0c"
header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
header.x-amz-server-side-encryption="AES256"

{code}

v2 SDK; note how the etag is now double quoted.

{code}

# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
header.Content-Length="524671"
header.Content-Type="binary/octet-stream"
header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
header.x-amz-server-side-encryption="AES256"

{code}


  was:
S3A HeaderProcessing builds up an incomplete list of headers as its mapping of 
md to header. entries omits headers including
x-amz-server-side-encryption-aws-kms-key-id

proposed
* review all headers which are stripped from "raw" responses and mapped into 
headers
* make sure result of headers matches v1; looks like etags are different
* make sure x-amz-server-side-encryption-aws-kms-key-id gets back
* plus new checksum values

{code}
v1 sdk

{code}

# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
header.Content-Length="524671"
header.Content-Type="binary/octet-stream"
header.ETag="3e39531220fbd3747d32cf93a79a7a0c"
header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
header.x-amz-server-side-encryption="AES256"

{code}

v2 SDK. note how etag is now double quoted.

{code}

# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
header.Content-Length="524671"
header.Content-Type="binary/octet-stream"
header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
header.x-amz-server-side-encryption="AES256"

{code}



> S3A HeaderProcessing to process all metadata entries of HEAD response
> -
>
> Key: HADOOP-19104
> URL: https://issues.apache.org/jira/browse/HADOOP-19104
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Major
>
> S3A HeaderProcessing builds up an incomplete list of headers, as its mapping 
> of metadata to header entries omits headers including
> x-amz-server-side-encryption-aws-kms-key-id
> Proposed:
> * review all headers which are stripped from "raw" responses and mapped into 
> headers
> * make sure the resulting headers match v1; it looks like the etags are 
> different
> * make sure x-amz-server-side-encryption-aws-kms-key-id comes back
> * plus the new checksum values
> v1 SDK
> {code}
> # file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
> header.Content-Length="524671"
> header.Content-Type="binary/octet-stream"
> header.ETag="3e39531220fbd3747d32cf93a79a7a0c"
> header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
> header.x-amz-server-side-encryption="AES256"
> {code}
> v2 SDK; note how the etag is now double quoted.
> {code}
> # file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
> header.Content-Length="524671"
> header.Content-Type="binary/octet-stream"
> header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
> header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
> header.x-amz-server-side-encryption="AES256"
> {code}
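A hedged sketch of how to dump the header-derived attributes for comparison 
(assumes HeaderProcessing surfaces them through the standard XAttr API, which 
is how the listings above read; the class name is illustrative):

{code}
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DumpObjectHeaders {
  public static void main(String[] args) throws Exception {
    // e.g. s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
    Path p = new Path(args[0]);
    FileSystem fs = FileSystem.get(p.toUri(), new Configuration());
    for (Map.Entry<String, byte[]> e : fs.getXAttrs(p).entrySet()) {
      System.out.println(e.getKey() + "=\""
          + new String(e.getValue(), StandardCharsets.UTF_8) + "\"");
    }
  }
}
{code}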



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org


