[jira] [Created] (HADOOP-18874) ABFS: Adding Server returned request id in Exception method thrown to caller.

2023-08-30 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18874:
--

 Summary: ABFS: Adding Server returned request id in Exception 
method thrown to caller.
 Key: HADOOP-18874
 URL: https://issues.apache.org/jira/browse/HADOOP-18874
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Anuj Modi


Each request made to the Azure server has a unique ActivityId (rid) which is 
returned in the response whether the request succeeds or fails.
When an HDFS call fails due to an error from the Azure service, an 
ABFSRestOperationException is thrown to the caller. This task is to add the 
server-returned activity id (rid) to the exception message, which can be used to 
investigate the failure on the service side.
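A minimal sketch of the enrichment described above; the class and method names are illustrative, not the actual ABFS driver code:

```java
class RequestIdEnrichment {
    // Appends the server-returned request/activity id to the original
    // error message so callers can quote it when investigating failures.
    static String withRequestId(String originalMessage, String rid) {
        if (rid == null || rid.isEmpty()) {
            return originalMessage; // server returned no id
        }
        return originalMessage + " RequestId: " + rid;
    }
}
```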



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport

2023-09-20 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18910:
--

 Summary: ABFS: Adding Support for MD5 Hash based integrity 
verification of the request content during transport 
 Key: HADOOP-18910
 URL: https://issues.apache.org/jira/browse/HADOOP-18910
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Anuj Modi
Assignee: Anuj Modi


Azure Storage supports Content-MD5 request headers in both the Read and Append APIs.
Read: [Path - Read - REST API (Azure Storage Services) | Microsoft 
Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
Append: [Path - Update - REST API (Azure Storage Services) | Microsoft 
Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]

This change makes the client-side changes to support them. In a Read request, we 
will send the appropriate header, in response to which the server will return the 
MD5 hash of the data it sends back. On the client we will tally this against the 
MD5 hash computed from the data received.

In an Append request, we will compute the MD5 hash of the data we are sending 
to the server and specify it in the appropriate header. The server, on finding that 
header, will tally it against the MD5 hash it computes on the data received. 

This whole checksum validation support is guarded behind a config, which is 
disabled by default: with "https", the integrity of data in transport is already 
preserved. This is introduced as an additional data integrity check, and it has a 
performance impact as well.

Users can decide whether to enable this by setting the following config to 
*"true"* or *"false"* respectively. *Config: "fs.azure.enable.checksum.validation"*
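The client-side digest computation described above can be sketched as follows; this is an illustrative stand-alone helper, not the actual ABFS driver code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

class ContentMd5Sketch {
    // Computes the Base64-encoded MD5 digest that would be sent in the
    // Content-MD5 request header on Append, and compared against the
    // server-returned Content-MD5 header on Read.
    static String contentMd5(byte[] data) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return Base64.getEncoder().encodeToString(md5.digest(data));
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(contentMd5(payload)); // digest of an append payload
    }
}
```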






[jira] [Created] (HADOOP-18869) ABFS: Fixing Behavior of a File System APIs on root path

2023-08-29 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18869:
--

 Summary: ABFS: Fixing Behavior of a File System APIs on root path
 Key: HADOOP-18869
 URL: https://issues.apache.org/jira/browse/HADOOP-18869
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.6
Reporter: Anuj Modi
Assignee: Anuj Modi


The following HDFS APIs fail when called on the root path.

|FS Call|Status|Error thrown to caller|
|create()|Failing|Operation failed: "The request URI is invalid.", 400, PUT, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-02076119-21ed-4ada-bcd0-14afaae54013/?resource=file=90,
 InvalidUri, "The request URI is invalid. 
RequestId:1d23f8c2-d01f-0059-61b6-c60c2400 
Time:2023-08-04T09:29:55.4813818Z"|
|createNonRecursive()|Failing|Runtime Exception: 
java.lang.IllegalArgumentException: null path (This occurs because 
getParentPath is null and getFileStatus is called on null)|
|setXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, HEAD, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=90|
|getXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, HEAD, 
https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=91|
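The createNonRecursive() failure above comes from calling getFileStatus() on a null parent. A minimal sketch of the guard the fix implies, with illustrative method names (not the actual ABFS code):

```java
class RootPathGuard {
    // Returns the parent of an absolute path, or null for the root path.
    static String parentOf(String path) {
        if ("/".equals(path)) {
            return null; // root has no parent
        }
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    // The parent-existence check must be skipped for root instead of
    // calling getFileStatus(null).
    static boolean needsParentCheck(String path) {
        return parentOf(path) != null;
    }
}
```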






[jira] [Created] (HADOOP-18690) Cherry-Picking Commits to Disable purging list of in progress reads in abfs inputstream close

2023-04-03 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18690:
--

 Summary: Cherry-Picking Commits to Disable purging list of in 
progress reads in abfs inputstream close
 Key: HADOOP-18690
 URL: https://issues.apache.org/jira/browse/HADOOP-18690
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Anuj Modi
Assignee: Anuj Modi
 Fix For: 3.3.5, 3.3.0


This task is to cherry-pick the fixes to all the branches where the data 
corruption fix has gone in.

This is similar to [HADOOP-18546] disable purging list of in progress reads in 
abfs stream closed - ASF JIRA (apache.org) for other release branches.

More details on the task and issue: [HADOOP-18521] ABFS ReadBufferManager 
buffer sharing across concurrent HTTP requests - ASF JIRA (apache.org)






[jira] [Created] (HADOOP-18759) [ABFS][Backoff-Optimization] Have a Linear retry policy for connection timeout failures

2023-06-02 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18759:
--

 Summary: [ABFS][Backoff-Optimization] Have a Linear retry policy 
for connection timeout failures
 Key: HADOOP-18759
 URL: https://issues.apache.org/jira/browse/HADOOP-18759
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.4
Reporter: Anuj Modi
Assignee: Anuj Modi


Today, when a request fails with a connection timeout, it falls back into the loop 
for exponential retry. Unlike Azure Storage errors, there are no guarantees of 
success on an exponentially retried request, nor recommendations for ideal retry 
policies, for Azure network or other general failures. Faster failure and retry 
might be more beneficial for such generic connection timeout failures. 

This PR introduces a new Linear Retry Policy which will currently be used only 
for connection timeout failures.
Two types of linear backoff calculations will be supported:
 # min-backoff starts at 500 ms and doubles with each attempted retry, capped 
at a 30 sec max
 # min-backoff starts at 500 ms and increments by 1 sec with each attempted 
retry, capped at a 30 sec max
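The two calculations above can be sketched as follows; constants and method names are illustrative, not the actual ABFS implementation:

```java
class ConnectionTimeoutBackoff {
    static final long MIN_BACKOFF_MS = 500;
    static final long MAX_BACKOFF_MS = 30_000;

    // Variant 1: backoff doubles with each attempted retry
    // (500, 1000, 2000, ... ms), capped at 30 sec.
    static long doublingBackoffMs(int retryCount) {
        long backoff = MIN_BACKOFF_MS << Math.min(retryCount, 30); // avoid shift overflow
        return Math.min(backoff, MAX_BACKOFF_MS);
    }

    // Variant 2: backoff grows by 1 sec with each attempted retry
    // (500, 1500, 2500, ... ms), capped at 30 sec.
    static long linearBackoffMs(int retryCount) {
        return Math.min(MIN_BACKOFF_MS + retryCount * 1000L, MAX_BACKOFF_MS);
    }
}
```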






[jira] [Created] (HADOOP-18841) [ABFS][Retry Policy] Using hadoop-common code to refactor Abfs Retry Policy Implementation

2023-08-07 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18841:
--

 Summary: [ABFS][Retry Policy] Using hadoop-common code to 
refactor Abfs Retry Policy Implementation
 Key: HADOOP-18841
 URL: https://issues.apache.org/jira/browse/HADOOP-18841
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.6, 3.3.4, 3.3.3, 3.3.5
Reporter: Anuj Modi
Assignee: Anuj Modi


AbfsRetryPolicy is an independent module in the hadoop-azure code. It is largely 
a reimplementation of a more generic and advanced retry policy already present 
in hadoop-common. AbfsRetryPolicy should either inherit from or directly use 
io.retry.RetryPolicy instead of reimplementing the same functionality.

AbfsRetryPolicy has only two variants: exponential and static. Both of these, 
along with many others, are already present in io.retry.RetryPolicy.

Issue identified in this PR: [Hadoop-18759: [ABFS][Backoff-Optimization] Have a 
Static retry policy for connection timeout. by anujmodi2021 · Pull Request 
#5881 · apache/hadoop (github.com)|https://github.com/apache/hadoop/pull/5881]






[jira] [Resolved] (HADOOP-18869) ABFS: Fixing Behavior of a File System APIs on root path

2023-11-07 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-18869.

Resolution: Resolved

> ABFS: Fixing Behavior of a File System APIs on root path
> 
>
> Key: HADOOP-18869
> URL: https://issues.apache.org/jira/browse/HADOOP-18869
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.6
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The following HDFS APIs fail when called on the root path.
> |FS Call|Status|Error thrown to caller|
> |create()|Failing|Operation failed: "The request URI is invalid.", 400, PUT, 
> https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-02076119-21ed-4ada-bcd0-14afaae54013/?resource=file=90,
>  InvalidUri, "The request URI is invalid. 
> RequestId:1d23f8c2-d01f-0059-61b6-c60c2400 
> Time:2023-08-04T09:29:55.4813818Z"|
> |createNonRecursive()|Failing|Runtime Exception: 
> java.lang.IllegalArgumentException: null path (This occurs because 
> getParentPath is null and getFileStatus is called on null)|
> |setXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, 
> HEAD, 
> https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=90|
> |getXAttr()|Failing|Operation failed: "The request URI is invalid.", 400, 
> HEAD, 
> https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-491399b3-c3d0-4568-9d4a-a26e0aa8f000/?upn=false=91|






[jira] [Created] (HADOOP-18971) ABFS: Enable Footer Read Optimizations with Appropriate Footer Read Buffer Size

2023-11-13 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-18971:
--

 Summary: ABFS: Enable Footer Read Optimizations with Appropriate 
Footer Read Buffer Size
 Key: HADOOP-18971
 URL: https://issues.apache.org/jira/browse/HADOOP-18971
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.6
Reporter: Anuj Modi


Footer read optimization was introduced to hadoop-azure in this Jira: 
https://issues.apache.org/jira/browse/HADOOP-17347
and was kept disabled by default.
This PR is to enable footer reads by default based on the results of the 
analysis described below.

In our scale workload analysis, it was found that workloads working with 
Parquet (or, for that matter, ORC etc.) issue a lot of footer reads. Footer reads 
here refer to the read operations done by a workload to get the metadata of the 
Parquet file, which is required to understand where the actual data resides in 
the file.
This whole process takes place in 3 steps:
 # The workload reads the last 8 bytes of the Parquet file to get the offset and 
size of the metadata, which is present just above these 8 bytes.
 # Using that offset, the workload reads the metadata to get the exact offset 
and length of the data it wants to read.
 # The workload performs the final read operation to get the data it wants to 
use for its purpose.

Here the first two steps are metadata reads that can be combined into a single 
footer read. When a workload tries to read within the last few bytes of the file 
(let's say this value is the footer size), the driver will intelligently read 
some extra bytes above the footer size to cater to the next read which is going 
to come.

Q. What is the footer size of a file?
A. 16 KB. Any read request trying to get data within the last 16 KB of the file 
will qualify for a whole footer read. This value is enough to cater to all types 
of files, including Parquet, ORC, etc.

Q. What is the buffer size to read when reading the footer?
A. Let's call this the footer read buffer size. Prior to this PR, the footer 
read buffer size was the same as the read buffer size (default 4 MB). It was 
found that for most workloads the required footer size was only 256 KB, i.e. for 
almost all Parquet files the metadata was found within the last 256 KB. Keeping 
this in mind, it does not make sense to read the whole 4 MB buffer length as 
part of a footer read. Moreover, reading more data than required incurs 
additional cost in terms of server and network latencies. Based on this and 
extensive experimentation, it was observed that a footer read buffer size of 
512 KB is ideal for almost all workloads running on Parquet, ORC, etc.

The following configuration was introduced to configure the footer read buffer 
size: {*}fs.azure.footer.read.request.size{*}, default 512 KB.
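The widening logic described above can be sketched as follows, using the defaults quoted (16 KB qualifying window, 512 KB footer read buffer); method names are illustrative, not the actual driver code:

```java
class FooterReadSketch {
    static final long FOOTER_SIZE = 16 * 1024;              // qualifying window
    static final long FOOTER_READ_BUFFER_SIZE = 512 * 1024; // fs.azure.footer.read.request.size default

    // Returns the start offset of the widened footer read, or -1 if the
    // request does not fall within the last 16 KB of the file.
    static long footerReadStart(long fileLength, long readOffset) {
        if (fileLength - readOffset > FOOTER_SIZE) {
            return -1; // ordinary read path
        }
        // Read up to the last 512 KB so the follow-up metadata read is
        // served from the prefetched buffer.
        return Math.max(0, fileLength - FOOTER_READ_BUFFER_SIZE);
    }
}
```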






[jira] [Resolved] (HADOOP-19110) ITestExponentialRetryPolicy failing in branch-3.4

2024-04-13 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-19110.

   Fix Version/s: 3.4.1
Target Version/s: 3.4.1
  Resolution: Fixed

> ITestExponentialRetryPolicy failing in branch-3.4
> -
>
> Key: HADOOP-19110
> URL: https://issues.apache.org/jira/browse/HADOOP-19110
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Anuj Modi
>Priority: Major
> Fix For: 3.4.1
>
>
> {code:java}
> [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 91.416 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy
> [ERROR] 
> testThrottlingIntercept(org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy)
>   Time elapsed: 0.622 s  <<< ERROR!
> Failure to initialize configuration for dummy.dfs.core.windows.net key 
> ="null": Invalid configuration value detected for fs.azure.account.key
>   at 
> org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:53)
>   at 
> org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:646)
>   at 
> org.apache.hadoop.fs.azurebfs.services.ITestAbfsClient.createTestClientFromCurrentContext(ITestAbfsClient.java:339)
>   at 
> org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy.testThrottlingIntercept(ITestExponentialRetryPolicy.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  {code}






[jira] [Resolved] (HADOOP-19129) ABFS: Fixing Test Script Bug and Some Known test Failures in ABFS Test Suite

2024-04-13 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-19129.

Fix Version/s: 3.4.1
 Hadoop Flags: Reviewed
 Release Note: https://github.com/apache/hadoop/pull/6676
   Resolution: Fixed

[HADOOP-19129: [ABFS] Test Fixes and Test Script Bug Fixes by anujmodi2021 · 
Pull Request #6676 · apache/hadoop 
(github.com)|https://github.com/apache/hadoop/pull/6676]

> ABFS: Fixing Test Script Bug and Some Known test Failures in ABFS Test Suite
> 
>
> Key: HADOOP-19129
> URL: https://issues.apache.org/jira/browse/HADOOP-19129
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> The test script used by ABFS to validate changes has the following two issues:
>  # When there are a lot of test failures, or when the error message of a 
> failing test becomes very large, the regex used today to filter test results 
> does not work as expected and fails to report all the failing tests.
> To resolve this, we have come up with a new regex that will only target 
> one-line test names for reporting them into the aggregated test results.
>  # While running the test suite for different combinations of auth type and 
> account type, we add the combination-specific configs first and then include 
> the account-specific configs in the core-site.xml file. This overrides the 
> combination-specific configs like auth type if the same config is present in 
> the account-specific config file. To avoid this, we will first include the 
> account-specific configs and then add the combination-specific configs.
> Due to the above bug in the test script, some test failures in ABFS were not 
> getting our attention. This PR also targets to resolve them. Following are the 
> tests fixed:
>  # ITestAzureBlobFileSystemAppend.testCloseOfDataBlockOnAppendComplete(): It 
> was failing only when append blobs were enabled. In the case of append blobs 
> we were not closing the active block on outputStream.close(), due to which 
> block.close() was not getting called and assertions around it were failing. 
> Fixed by updating the production code to close the active block on flush.
>  # ITestAzureBlobFileSystemAuthorization: Tests in this class work with an 
> existing remote filesystem instead of creating a new file system instance. 
> For this they require a file system configured in the account settings using 
> the following config: "fs.contract.test.fs.abfs". Tests were failing with an 
> NPE when this config was not present. Updated the code to skip these tests if 
> the required config is not present.
>  # ITestAbfsClient.testListPathWithValueGreaterThanServerMaximum(): Test was 
> failing intermittently, only for HNS-enabled accounts. The test wants to 
> assert that client.listPath() does not return more objects than what is 
> configured in maxListResults. The assertion should be that the number of 
> objects returned could be less than expected, as the server might end up 
> returning even fewer due to partition splits, along with a continuation token.
>  # ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsTrue(): 
> Fails when the "fs.azure.test.namespace.enabled" config is missing. Ignore 
> the test if the config is missing.
>  # ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsFalse(): 
> Fails when the "fs.azure.test.namespace.enabled" config is missing. Ignore 
> the test if the config is missing.
>  # ITestGetNameSpaceEnabled.testNonXNSAccount(): Fails when the 
> "fs.azure.test.namespace.enabled" config is missing. Ignore the test if the 
> config is missing.
>  # ITestAbfsStreamStatistics.testAbfsStreamOps: Fails when 
> "fs.azure.test.appendblob.enabled" is set to true. The test wanted to assert 
> that the number of read operations can be higher for append blobs than for 
> normal blobs because of automatic flush. It could be the same as for a normal 
> blob as well.
>  # ITestAzureBlobFileSystemCheckAccess.testCheckAccessForAccountWithoutNS: 
> Fails for FNS accounts only when the following config is present: 
> "fs.azure.account.hns.enabled". The failure is because the test wants to 
> assert that when the driver does not know whether the account is HNS enabled, 
> it makes a server call and fails. But the above config lets the driver know 
> the account type, skipping the HEAD call. Remove these configs from the 
> test-specific configurations and not from the account settings file.
>  # ITestAbfsTerasort.test_120_terasort: Fails with OAuth on HNS accounts. The 
> failure is because of an identity mismatch: OAuth uses the service principal 
> OID as the owner of the resources, whereas Shared Key uses local system 
> identities. 
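The "skip when the required config is absent" fixes in items 2 and 4-6 above boil down to a guard like the following; in the real suite this would use JUnit's Assume, sketched here without the JUnit dependency:

```java
import java.util.Map;

class SkipWhenUnconfigured {
    // Returns true only when the config the test depends on is present;
    // callers would skip (not fail) the test otherwise.
    static boolean shouldRun(Map<String, String> conf, String requiredKey) {
        String value = conf.get(requiredKey);
        return value != null && !value.isEmpty();
    }
}
```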

[jira] [Resolved] (HADOOP-19106) [ABFS] All tests of ITestAzureBlobFileSystemAuthorization fail with NPE

2024-04-13 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-19106.

   Fix Version/s: 3.4.1
Hadoop Flags: Reviewed
Release Note: https://github.com/apache/hadoop/pull/6676
Target Version/s: 3.4.1
  Resolution: Fixed

[HADOOP-19129: [ABFS] Test Fixes and Test Script Bug Fixes by anujmodi2021 · 
Pull Request #6676 · apache/hadoop 
(github.com)|https://github.com/apache/hadoop/pull/6676]

> [ABFS] All tests of ITestAzureBlobFileSystemAuthorization fail with NPE
> -
>
> Key: HADOOP-19106
> URL: https://issues.apache.org/jira/browse/HADOOP-19106
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Anuj Modi
>Priority: Major
> Fix For: 3.4.1
>
>
> When the below config is set to true, all of the tests fail; otherwise they 
> are skipped.
> <property>
>     <name>fs.azure.test.namespace.enabled</name>
>     <value>true</value>
> </property>
>  
> [*ERROR*] 
> testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization)
>   Time elapsed: 0.064 s  <<< ERROR!
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)






[jira] [Resolved] (HADOOP-18759) [ABFS][Backoff-Optimization] Have a Static retry policy for connection timeout failures

2024-05-16 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-18759.

   Fix Version/s: 3.4.1
  (was: 3.5.0)
Release Note: https://github.com/apache/hadoop/pull/5881
Target Version/s: 3.4.0  (was: 3.3.4)
  Resolution: Fixed

[Hadoop-18759: [ABFS][Backoff-Optimization] Have a Static retry policy for 
connection timeout. by anujmodi2021 · Pull Request #5881 · apache/hadoop 
(github.com)|https://github.com/apache/hadoop/pull/5881]

> [ABFS][Backoff-Optimization] Have a Static retry policy for connection 
> timeout failures
> ---
>
> Key: HADOOP-18759
> URL: https://issues.apache.org/jira/browse/HADOOP-18759
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.4
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
> Fix For: 3.4.1
>
>
> Today, when a request fails with a connection timeout, it falls back into the 
> loop for exponential retry. Unlike Azure Storage errors, there are no 
> guarantees of success on an exponentially retried request, nor 
> recommendations for ideal retry policies, for Azure network or other general 
> failures. Faster failure and retry might be more beneficial for such generic 
> connection timeout failures. 
> This PR introduces a new Static Retry Policy which will currently be used 
> only for connection timeout failures. It means all the requests failing with 
> connection timeout errors will be retried after a constant retry (sleep) 
> interval, independent of how many times that request has failed. The max 
> retry count check will still be in place.
> The following configurations will be introduced in the change:
>  # "fs.azure.static.retry.for.connection.timeout.enabled" - default: true. 
> true: static retry will be used for CT; false: exponential retry will be used.
>  # "fs.azure.static.retry.interval" - default: 1000 ms.
> This also introduces a new field in x-ms-client-request-id, only for the 
> requests that are being retried after a connection timeout failure. The new 
> field will tell which retry policy was used to get the sleep interval before 
> making this request.
> The header "x-ms-client-request-id" right now carries only the retryCount and 
> retryReason of this particular API call. For ex:  
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT.
> Moving ahead, for retryReason "CT" it will have the retry policy abbreviation 
> as well. For ex:  
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT_E.
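The suffix change described above can be sketched as follows; "E" for exponential is taken from the example, while any other abbreviation values are assumptions, and the method name is illustrative:

```java
class RetrySuffixSketch {
    // Builds the retryCount/retryReason tail of x-ms-client-request-id,
    // optionally appending the retry-policy abbreviation used for the
    // sleep interval (e.g. "E" as in the example above).
    static String retrySuffix(int retryCount, String retryReason, String policyAbbreviation) {
        StringBuilder suffix = new StringBuilder();
        suffix.append(retryCount).append('_').append(retryReason);
        if (policyAbbreviation != null) {
            suffix.append('_').append(policyAbbreviation);
        }
        return suffix.toString();
    }
}
```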






[jira] [Resolved] (HADOOP-18011) ABFS: Enable config control for default connection timeout

2024-05-16 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-18011.

Fix Version/s: 3.4.1
 Hadoop Flags: Reviewed
 Release Note: https://github.com/apache/hadoop/pull/5881
   Resolution: Fixed

PR checked in: [Hadoop-18759: [ABFS][Backoff-Optimization] Have a Static retry 
policy for connection timeout. by anujmodi2021 · Pull Request #5881 · 
apache/hadoop (github.com)|https://github.com/apache/hadoop/pull/5881]

> ABFS: Enable config control for default connection timeout 
> ---
>
> Key: HADOOP-18011
> URL: https://issues.apache.org/jira/browse/HADOOP-18011
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The ABFS driver has a default connection timeout and read timeout value of 30 
> secs. For jobs that are time sensitive, the preference would be quick failure 
> and shorter HTTP connection and read timeouts. 
> This Jira is created to enable config control over the default connection and 
> read timeouts. 
> New config names:
> fs.azure.http.connection.timeout
> fs.azure.http.read.timeout
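A sketch of how the two configs might be set in core-site.xml; the values below are illustrative examples (assumed to be in milliseconds), not recommendations:

```xml
<!-- Illustrative core-site.xml entries; values are examples only -->
<property>
  <name>fs.azure.http.connection.timeout</name>
  <value>5000</value>
</property>
<property>
  <name>fs.azure.http.read.timeout</name>
  <value>10000</value>
</property>
```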






[jira] [Created] (HADOOP-19129) ABFS: Fixing Test Script Bug and Some Known test Failures in ABFS Test Suite

2024-03-26 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19129:
--

 Summary: ABFS: Fixing Test Script Bug and Some Known test Failures 
in ABFS Test Suite
 Key: HADOOP-19129
 URL: https://issues.apache.org/jira/browse/HADOOP-19129
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0, 3.4.1
Reporter: Anuj Modi









[jira] [Created] (HADOOP-19096) [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic

2024-02-29 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19096:
--

 Summary: [ABFS] Enhancing Client-Side Throttling Metrics Updation 
Logic
 Key: HADOOP-19096
 URL: https://issues.apache.org/jira/browse/HADOOP-19096
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.1
Reporter: Anuj Modi
 Fix For: 3.4.1


ABFS has a client-side throttling mechanism which works on the metrics 
collected from past requests. If requests are failing due to throttling at the 
server, we update our metrics, and the client-side backoff is calculated based 
on those metrics.

This PR enhances the logic that decides which requests should be considered 
when computing the client-side backoff interval, as follows:

For each request made by the ABFS driver, we will determine whether it should 
contribute to client-side throttling based on the status code and result:
 # Status code in 2xx range: successful operations should contribute.
 # Status code in 3xx range: redirection operations should not contribute.
 # Status code in 4xx range: user errors should not contribute.
 # Status code is 503: throttling errors should contribute only if they are due 
to a client limits breach, as follows:
 ## 503, Ingress Over Account Limit: should contribute
 ## 503, Egress Over Account Limit: should contribute
 ## 503, TPS Over Account Limit: should contribute
 ## 503, other server throttling: should not contribute.
 # Status code in 5xx range other than 503: should not contribute.
 # IOException and UnknownHostException: should not contribute.
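The rules above can be sketched as a single predicate; the 503 sub-cases key off the server error strings quoted above, and the class and method names are illustrative:

```java
class ThrottlingContribution {
    // Decides whether a completed request should feed the client-side
    // throttling metrics, per the rules listed above.
    static boolean shouldContribute(int statusCode, String serverErrorMessage) {
        if (statusCode >= 200 && statusCode < 300) {
            return true; // successful operations contribute
        }
        if (statusCode == 503) {
            // only throttling caused by client limit breaches contributes
            return "Ingress Over Account Limit".equals(serverErrorMessage)
                || "Egress Over Account Limit".equals(serverErrorMessage)
                || "TPS Over Account Limit".equals(serverErrorMessage);
        }
        // 3xx, 4xx, and other 5xx do not contribute; network exceptions
        // are excluded before reaching this check
        return false;
    }
}
```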






[jira] [Created] (HADOOP-19089) Reverting Back Support of setXAttr() and getXAttr() on root path

2024-02-26 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19089:
--

 Summary: Reverting Back Support of setXAttr() and getXAttr() on 
root path
 Key: HADOOP-19089
 URL: https://issues.apache.org/jira/browse/HADOOP-19089
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Anuj Modi









[jira] [Resolved] (HADOOP-18874) ABFS: Adding Server returned request id in Exception method thrown to caller.

2024-02-26 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-18874.

Resolution: Resolved

> ABFS: Adding Server returned request id in Exception method thrown to caller.
> -
>
> Key: HADOOP-18874
> URL: https://issues.apache.org/jira/browse/HADOOP-18874
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Each request made to the Azure server has a unique ActivityId (rid), which is 
> returned in the response whether the request succeeds or fails.
> When an HDFS call fails due to an error from the Azure service, an 
> ABFSRestOperationException is thrown to the caller. This task is to add the 
> server-returned activity id (rid) to the exception message, which can be used 
> to investigate the failure on the service side.
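A minimal sketch of surfacing the rid in the exception message, so callers can quote it when raising the failure with the service team. The class and method names here are assumptions for illustration, not the actual ABFS code.

```java
// Sketch: append the server-returned ActivityId (rid) to an error message.
class AbfsErrorMessage {

    // Keeps the original message; appends the rid only when the server
    // actually returned one.
    static String withRequestId(String baseMessage, String serverRequestId) {
        if (serverRequestId == null || serverRequestId.isEmpty()) {
            return baseMessage;
        }
        return baseMessage + ", rId: " + serverRequestId;
    }
}
```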






[jira] [Resolved] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport

2024-02-26 Thread Anuj Modi (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Modi resolved HADOOP-18910.

Resolution: Done

> ABFS: Adding Support for MD5 Hash based integrity verification of the request 
> content during transport 
> ---
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and 
> Append APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft 
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft 
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change makes the client-side changes to support them. In a Read request, 
> the client sends the appropriate header, in response to which the server 
> returns the MD5 hash of the data it sends back. The client then tallies this 
> with the MD5 hash computed from the data received.
> In an Append request, the client computes the MD5 hash of the data it is 
> sending to the server and specifies it in the appropriate header. The server, 
> on finding that header, tallies it with the MD5 hash it computes on the data 
> received.
> This whole checksum-validation support is guarded behind a config, which is 
> disabled by default because HTTPS already preserves data integrity in 
> transport. It is introduced as an additional data-integrity check, and it 
> comes with a performance impact.
> Users can decide whether to enable it by setting the following config to 
> *"true"* or *"false"* respectively. *Config: 
> "fs.azure.enable.checksum.validation"*






[jira] [Created] (HADOOP-19187) ABFS: Making AbfsClient Abstract for supporting both DFS and Blob Endpoint

2024-05-27 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19187:
--

 Summary: ABFS: Making AbfsClient Abstract for supporting both DFS 
and Blob Endpoint
 Key: HADOOP-19187
 URL: https://issues.apache.org/jira/browse/HADOOP-19187
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
Assignee: Anuj Modi
 Fix For: 3.5.0, 3.4.1


Azure Services support two different sets of APIs.
Blob: 
[https://learn.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api]
 
DFS: 
[https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/operation-groups]
 

As per the plan in HADOOP-19179, this task enables the ABFS driver to work with 
both sets of APIs as required.

The scope of this task is to refactor AbfsClient so that ABFSStore can choose 
which client to interact with, based on the endpoint configured by the user.

The blob endpoint support will remain "Unsupported" until the whole code is 
checked in and well tested.
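The refactor described above can be sketched as an abstract client with one subclass per endpoint, chosen once from configuration. This is an illustrative sketch under assumed names: only AbfsClient corresponds to a real Hadoop class; the subclass and factory names, the example method, and the hostnames are invented for illustration.

```java
// Sketch: one abstract client, two endpoint-specific implementations.
abstract class AbfsClientSketch {
    // Each endpoint maps the same logical operation to a different REST API;
    // here we only illustrate the differing base URL.
    abstract String readPathUrl(String path);
}

class DfsClientSketch extends AbfsClientSketch {
    String readPathUrl(String path) {
        return "https://account.dfs.core.windows.net" + path;
    }
}

class BlobClientSketch extends AbfsClientSketch {
    String readPathUrl(String path) {
        return "https://account.blob.core.windows.net" + path;
    }
}

class ClientFactory {
    // The store selects a client once, based on the configured endpoint,
    // and interacts with it through the abstract type only.
    static AbfsClientSketch forEndpoint(String endpoint) {
        return "blob".equals(endpoint) ? new BlobClientSketch() : new DfsClientSketch();
    }
}
```

The point of the abstraction is that the store never branches on the endpoint after construction; all endpoint-specific behavior lives behind the abstract methods.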


