[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053890#comment-18053890
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 merged PR #8153:
URL: https://github.com/apache/hadoop/pull/8153




> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> 
>
> Key: HADOOP-19767
> URL: https://issues.apache.org/jira/browse/HADOOP-19767
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.2
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Since the onset of the ABFS Driver, there has been a single implementation of 
> AbfsInputStream. Different kinds of workloads require different heuristics to 
> give the best performance for that type of workload. For example: 
>  # Sequential Read Workloads like DFSIO and DistCp gain performance 
> improvement from prefetching 
>  # Random Read Workloads, on the other hand, do not need prefetches; enabling 
> prefetches for them is pure overhead and TPS heavy 
>  # Query Workloads involving Parquet/ORC files benefit from improvements like 
> Footer Read and Small File Reads
> To accommodate this, we need to determine the pattern and accordingly create 
> Input Streams implemented for that particular pattern.
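The workload taxonomy above can be reduced to a small policy mapper plus a prefetch decision. The following is a hypothetical sketch only — `ReadPolicyDemo`, `fromHint`, and `prefetchEnabled` are invented names for illustration, not the ABFS driver's actual API:

```java
import java.util.Locale;

// Hypothetical sketch: map a workload hint to a read policy and derive
// a prefetch decision from it. Not the actual ABFS driver API.
public class ReadPolicyDemo {
    public enum ReadPolicy { SEQUENTIAL, RANDOM, ADAPTIVE }

    // Unknown or absent hints fall back to ADAPTIVE, which keeps all
    // optimizations (prefetch, footer read, small-file read) available.
    public static ReadPolicy fromHint(String hint) {
        if (hint == null) {
            return ReadPolicy.ADAPTIVE;
        }
        switch (hint.trim().toLowerCase(Locale.ENGLISH)) {
            case "sequential": return ReadPolicy.SEQUENTIAL;
            case "random":     return ReadPolicy.RANDOM;
            default:           return ReadPolicy.ADAPTIVE;
        }
    }

    // Sequential (and adaptive) workloads benefit from prefetching;
    // random workloads skip it to avoid wasted requests.
    public static boolean prefetchEnabled(ReadPolicy policy) {
        return policy != ReadPolicy.RANDOM;
    }
}
```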



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053643#comment-18053643
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3785216455

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:-------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  27m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 34s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   0m 40s | 
[/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/8/artifact/out/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in trunk failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04. 
 |
   | -1 :x: |  javadoc  |   0m 33s | 
[/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/8/artifact/out/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in trunk failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  spotbugs  |   0m 57s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/8/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  15m 47s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  16m  4s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  the patch passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 14s |  |  
hadoop-tools/hadoop-azure: The patch generated 0 new + 3 unchanged - 1 fixed = 
3 total (was 4)  |
   | +1 :green_heart: |  mvnsite  |   0m 22s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 21s | 
[/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/8/artifact/out/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javadoc  |   0m 16s | 
[/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/8/artifact/out/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | +1 :green_heart: |  spotbugs  |   0m 44s |  |  hadoop-tools/hadoop-azure 
generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1)  |
   | +1 :green_heart: |  shadedclient  |  14m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 15s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 20s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  71m  7s |  |  |

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-13 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18051528#comment-18051528
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-374452

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:-------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   9m 17s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  21m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 26s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   0m 28s | 
[/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/7/artifact/out/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in trunk failed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04. 
 |
   | -1 :x: |  javadoc  |   0m 24s | 
[/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/7/artifact/out/branch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in trunk failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  spotbugs  |   0m 44s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/7/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 20s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  14m 34s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  the patch passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 12s |  |  
hadoop-tools/hadoop-azure: The patch generated 0 new + 3 unchanged - 1 fixed = 
3 total (was 4)  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 21s | 
[/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/7/artifact/out/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javadoc  |   0m 16s | 
[/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/7/artifact/out/patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  spotbugs  |   0m 45s | 
[/new-spotbugs-hadoop-tools_hadoop-azure.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/7/artifact/out/new-spotbugs-hadoop-tools_hadoop-azure.html)
 |  hadoop-tools/hadoop-azure generated 5 new + 0 unchanged - 1 fixed = 5 total 
(was 1)  |
   | +1 :green_heart: |  shadedclient  |  14m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 19s |  |  hado

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050801#comment-18050801
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3727596292

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|:-------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   9m  6s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  21m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 23s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  spotbugs  |   0m 48s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/6/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 19s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  14m 31s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  the patch passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 11s |  |  
hadoop-tools/hadoop-azure: The patch generated 0 new + 3 unchanged - 1 fixed = 
3 total (was 4)  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 18s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/6/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 generated 2 new + 1571 unchanged - 14 
fixed = 1573 total (was 1585)  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  
hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 generated 0 new + 1452 unchanged - 12 
fixed = 1452 total (was 1464)  |
   | -1 :x: |  spotbugs  |   0m 44s | 
[/new-spotbugs-hadoop-tools_hadoop-azure.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/6/artifact/out/new-spotbugs-hadoop-tools_hadoop-azure.html)
 |  hadoop-tools/hadoop-azure generated 5 new + 0 unchanged - 1 fixed = 5 total 
(was 1)  |
   | +1 :green_heart: |  shadedclient  |  14m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 16s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 21s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  69m 43s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-azure |
   |  |  Inconsistent synchronization of 
org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.bCursor; locked 87% of 
time. Unsynchronized access at AbfsInputStream.java:[line 970] |
   |  |  Inconsistent sy

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050780#comment-18050780
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

bhattmanish98 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674996643


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##
@@ -215,6 +216,12 @@ public final class ConfigurationKeys {
   public static final String FS_AZURE_READ_AHEAD_QUEUE_DEPTH = "fs.azure.readaheadqueue.depth";
   public static final String FS_AZURE_ALWAYS_READ_BUFFER_SIZE = "fs.azure.read.alwaysReadBufferSize";
   public static final String FS_AZURE_READ_AHEAD_BLOCK_SIZE = "fs.azure.read.readahead.blocksize";
+  /**
+   * Provides hint for the read workload pattern.
+   * Possible Values Exposed in {@link Options.OpenFileOptions}

Review Comment:
   You can do something like this:
   import org.apache.hadoop.fs.Options.OpenFileOptions;
   
   * Possible Values Exposed in {@link OpenFileOptions#FS_OPTION_OPENFILE_READ_POLICIES}
   







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050779#comment-18050779
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674989212


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -946,12 +950,30 @@ public AbfsInputStream openFileForRead(Path path,
 
   perfInfo.registerSuccess(true);
 
-  // Add statistics for InputStream
-  return new AbfsInputStream(getClient(), statistics, relativePath,
-  contentLength, populateAbfsInputStreamContext(
-  parameters.map(OpenFileParameters::getOptions),
-  contextEncryptionAdapter),
-  eTag, tracingContext);
+  AbfsReadPolicy inputPolicy = AbfsReadPolicy.getAbfsReadPolicy(getAbfsConfiguration().getAbfsReadPolicy());

Review Comment:
   Taken







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050777#comment-18050777
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674980368


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsReadPolicy.java:
##
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.util.Locale;
+
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_COLUMNAR;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ORC;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_PARQUET;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_RANDOM;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_WHOLE_FILE;
+
+/**
+ * Enum for ABFS Input Policies.
+ * Each policy maps to a particular implementation of {@link AbfsInputStream}
+ */
+public enum AbfsReadPolicy {
+
+  SEQUENTIAL(FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL),
+  RANDOM(FS_OPTION_OPENFILE_READ_POLICY_RANDOM),
+  ADAPTIVE(FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE);
+
+  private final String readPolicy;
+
+  AbfsReadPolicy(String readPolicy) {
+    this.readPolicy = readPolicy;
+  }
+
+  @Override
+  public String toString() {
+    return readPolicy;
+  }
+
+  /**
+   * Get the enum constant from the string name.
+   * @param name policy name as configured by user
+   * @return the corresponding AbfsReadPolicy to be used
+   */
+  public static AbfsReadPolicy getAbfsReadPolicy(String name) {
+    String trimmed = name.trim().toLowerCase(Locale.ENGLISH);

Review Comment:
   Taken
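One plausible completion of the lookup started in the hunk above is to normalise the configured name, scan the enum constants, and default to ADAPTIVE. The mapping strings and the `fromName` helper below are assumptions for illustration, not the merged HADOOP-19767 code:

```java
import java.util.Locale;

// Illustrative stand-in for AbfsReadPolicy: each constant carries its
// configuration string; lookup is case-insensitive with a safe default.
public enum PolicyLookup {
    SEQUENTIAL("sequential"),
    RANDOM("random"),
    ADAPTIVE("adaptive");

    private final String readPolicy;

    PolicyLookup(String readPolicy) {
        this.readPolicy = readPolicy;
    }

    @Override
    public String toString() {
        return readPolicy;
    }

    // Scan the constants for a match; unrecognised names fall back
    // to ADAPTIVE rather than throwing.
    public static PolicyLookup fromName(String name) {
        String trimmed = name.trim().toLowerCase(Locale.ENGLISH);
        for (PolicyLookup policy : values()) {
            if (policy.readPolicy.equals(trimmed)) {
                return policy;
            }
        }
        return ADAPTIVE;
    }
}
```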







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050775#comment-18050775
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674976123


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,117 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation used when the user does not specify any input policy.
+ * It switches between sequential and random read optimizations based on the 
detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsAdaptiveInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics
+   * @param path file path
+   * @param contentLength file content length
+   * @param abfsInputStreamContext input stream context
+   * @param eTag file eTag
+   * @param tracingContext tracing context to trace the read operations
+   */
+  public AbfsAdaptiveInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+    super(client, statistics, path, contentLength,
+        abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  protected int readOneBlock(final byte[] b, final int off, final int len) throws IOException {
+    if (len == 0) {
+      return 0;
+    }
+    if (!validate(b, off, len)) {
+      return -1;
+    }
+    // If buffer is empty, then fill the buffer.
+    if (getBCursor() == getLimit()) {
+      // If EOF, then return -1
+      if (getFCursor() >= getContentLength()) {
+        return -1;
+      }
+
+      long bytesRead = 0;
+      // reset buffer to initial state - i.e., throw away existing data

Review Comment:
   Taken
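The buffer-refill shape visible in the hunk above can be modelled in isolation. This simplified class is hypothetical (the byte-array "file" and all names are invented for illustration; the real stream reads from remote storage and tracks more state): it serves reads from an internal buffer and refills only when the buffer cursor reaches its limit.

```java
// Simplified, hypothetical model of the readOneBlock pattern: serve bytes
// from an in-memory buffer, refilling from the "file" (a byte array here)
// whenever the buffer is exhausted.
public class OneBlockReader {
    private final byte[] file;   // stands in for the remote blob
    private final byte[] buffer;
    private int bCursor;         // next unread offset within buffer
    private int limit;           // number of valid bytes in buffer
    private int fCursor;         // read position within the file

    public OneBlockReader(byte[] file, int bufferSize) {
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    public int readOneBlock(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (bCursor == limit) {            // buffer empty: fill it
            if (fCursor >= file.length) {  // EOF
                return -1;
            }
            bCursor = 0;                   // throw away existing data
            limit = Math.min(buffer.length, file.length - fCursor);
            System.arraycopy(file, fCursor, buffer, 0, limit);
            fCursor += limit;
        }
        int copied = Math.min(len, limit - bCursor);
        System.arraycopy(buffer, bCursor, b, off, copied);
        bCursor += copied;
        return copied;
    }
}
```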







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050776#comment-18050776
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674977536


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -801,6 +802,10 @@ byte[] getBuffer() {
 return buffer;
   }
 
+  protected void setBuffer(byte[] buffer) {

Review Comment:
   Taken







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050774#comment-18050774
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674975132


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##
@@ -215,6 +216,12 @@ public final class ConfigurationKeys {
   public static final String FS_AZURE_READ_AHEAD_QUEUE_DEPTH = 
"fs.azure.readaheadqueue.depth";
   public static final String FS_AZURE_ALWAYS_READ_BUFFER_SIZE = 
"fs.azure.read.alwaysReadBufferSize";
   public static final String FS_AZURE_READ_AHEAD_BLOCK_SIZE = 
"fs.azure.read.readahead.blocksize";
+  /**
+   * Provides hint for the read workload pattern.
+   * Possible Values Exposed in {@link Options.OpenFileOptions}

Review Comment:
   Tried, but that variable is not resolvable here.



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,117 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation, used when the user does not specify any input policy.
+ * It switches between sequential and random read optimizations based on the detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsAdaptiveInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics
+   * @param path file path
+   * @param contentLength file content length
+   * @param abfsInputStreamContext input stream context
+   * @param eTag file eTag
+   * @param tracingContext tracing context to trace the read operations
+   */
+  public AbfsAdaptiveInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+super(client, statistics, path, contentLength,
+abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  protected int readOneBlock(final byte[] b, final int off, final int len) 
throws IOException {
+if (len == 0) {
+  return 0;
+}
+if (!validate(b, off, len)) {
+  return -1;
+}
+//If buffer is empty, then fill the buffer.
+if (getBCursor() == getLimit()) {
+  //If EOF, then return -1

Review Comment:
   Taken
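The adaptive stream discussed in this snippet switches between sequential and random optimizations based on the detected read pattern. A minimal sketch of one plausible detection heuristic follows; the offset-continuity rule and all names here are assumptions for illustration, not the PR's actual implementation.

```java
// Hedged sketch: detect sequential vs. random reads by offset continuity.
// A read that starts exactly where the previous one ended is treated as
// sequential (prefetching stays useful); any other offset indicates a
// random access. This heuristic is an assumption, not the PR's code.
public class ReadPatternDetector {
  private long expectedNextOffset = 0;

  /** Record a read at the given offset/length; return true if it was sequential. */
  public boolean onRead(long offset, int len) {
    boolean sequential = (offset == expectedNextOffset);
    expectedNextOffset = offset + len;  // where a sequential read would resume
    return sequential;
  }
}
```

Such a detector would let a single stream start in sequential mode and fall back to random-read behaviour once the continuity assumption breaks.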






[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050773#comment-18050773
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

bhattmanish98 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671885515


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -946,12 +950,30 @@ public AbfsInputStream openFileForRead(Path path,
 
   perfInfo.registerSuccess(true);
 
-  // Add statistics for InputStream
-  return new AbfsInputStream(getClient(), statistics, relativePath,
-  contentLength, populateAbfsInputStreamContext(
-  parameters.map(OpenFileParameters::getOptions),
-  contextEncryptionAdapter),
-  eTag, tracingContext);
+  AbfsReadPolicy inputPolicy = 
AbfsReadPolicy.getAbfsReadPolicy(getAbfsConfiguration().getAbfsReadPolicy());

Review Comment:
   This method contains a lot of lines already. Instead of defining this switch 
case here, it would be better to define a new method so that, if a new read 
pattern is introduced tomorrow, we can just update that method.
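The refactor suggested in the review comment above can be sketched as a single factory method holding the per-policy switch. The policy and stream class names mirror those visible in the PR diff; everything else (returning the class name as a String, the enum shape) is a simplification for illustration, not the PR's real signature.

```java
// Hedged sketch of extracting the read-policy switch into one method,
// so adding a new read pattern only requires touching this factory.
// Returning a class name as a String is a simplification; the real
// method would construct the corresponding AbfsInputStream subclass.
public class AbfsReadStreamFactory {
  public enum AbfsReadPolicy { ADAPTIVE, SEQUENTIAL, RANDOM }

  public static String streamClassFor(AbfsReadPolicy policy) {
    switch (policy) {
      case RANDOM:
        return "AbfsRandomInputStream";    // prefetching disabled
      case SEQUENTIAL:
        return "AbfsInputStream";          // assumption: base sequential stream
      case ADAPTIVE:
      default:
        return "AbfsAdaptiveInputStream";  // default when no policy is given
    }
  }
}
```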








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050772#comment-18050772
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674967291


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##
@@ -21,6 +21,7 @@
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Options;

Review Comment:
   Taken








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050771#comment-18050771
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674966184


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,109 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+import static java.lang.Math.max;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation, used when the user does not specify any input policy.
+ * It switches between sequential and random read optimizations based on the detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  public AbfsAdaptiveInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+super(client, statistics, path, contentLength,
+abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * {@inheritDoc}
+   */
+  @Override
+  protected int readOneBlock(final byte[] b, final int off, final int len) 
throws IOException {
+if (len == 0) {
+  return 0;
+}
+if (!validate(b, off, len)) {
+  return -1;
+}
+//If buffer is empty, then fill the buffer.

Review Comment:
   Taken








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050768#comment-18050768
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674962747


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java:
##
@@ -980,7 +986,7 @@ public void testResetReadBufferManager() {
   getReadAheadQueue().clear();
   getInProgressList().clear();
   getCompletedReadList().clear();
-  getFreeList().clear();
+  this.freeList.clear();

Review Comment:
   Taken








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050767#comment-18050767
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674961229


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java:
##
@@ -100,7 +100,8 @@ public final class ReadBufferManagerV2 extends 
ReadBufferManager {
 
   private byte[][] bufferPool;
 
-  private final Stack removedBufferList = new Stack<>();
+  private final ConcurrentSkipListSet removedBufferList = new 
ConcurrentSkipListSet<>();
+  private ConcurrentSkipListSet freeList = new 
ConcurrentSkipListSet<>();

Review Comment:
   Added
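The diff above swaps a Stack for a ConcurrentSkipListSet to hold free buffer slots. A minimal sketch of why that structure suits a free list of buffer indices follows; the class and method names here are hypothetical, not the PR's.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hedged sketch (names hypothetical): a ConcurrentSkipListSet of buffer
// indices gives lock-free acquire/release from multiple reader threads,
// which a plain Stack cannot do safely without external synchronization.
public class FreeListSketch {
  private final ConcurrentSkipListSet<Integer> freeList = new ConcurrentSkipListSet<>();

  public FreeListSketch(int poolSize) {
    for (int i = 0; i < poolSize; i++) {
      freeList.add(i);  // all buffer slots start free
    }
  }

  /** Acquire the lowest free buffer index, or -1 if none is free. */
  public int acquire() {
    Integer idx = freeList.pollFirst();  // atomic remove-and-return
    return idx == null ? -1 : idx;
  }

  /** Return a buffer index to the pool. */
  public void release(int idx) {
    freeList.add(idx);
  }
}
```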








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050766#comment-18050766
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674959976


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -334,7 +350,7 @@ private boolean shouldReadLastBlock() {
* @return number of bytes read
* @throws IOException if there is an error
*/
-  protected abstract int readOneBlock(final byte[] b, final int off, final int 
len) throws IOException;
+  protected abstract int readOneBlock(byte[] b, int off, int len) throws 
IOException;

Review Comment:
   This is the abstract method's definition. The real implementation still has `final`.
   Checkstyle reported `final` as a redundant modifier here.








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050765#comment-18050765
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674958261


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -888,48 +953,131 @@ public String toString() {
 return sb.toString();
   }
 
+  /**
+   * Getter for bCursor.
+   * @return the bCursor
+   */
   @VisibleForTesting
   int getBCursor() {
 return this.bCursor;
   }
 
+  /**
+   * Setter for bCursor.
+   * @param bCursor the bCursor to set
+   */
+  protected void setBCursor(int bCursor) {
+this.bCursor = bCursor;
+  }
+
+  /**
+   * Getter for fCursor.
+   * @return the fCursor
+   */
   @VisibleForTesting
   long getFCursor() {
 return this.fCursor;
   }
 
+  /**
+   * Setter for fCursor.
+   * @param fCursor the fCursor to set
+   */
+  protected void setFCursor(long fCursor) {
+this.fCursor = fCursor;
+  }
+
+  /**
+   * Getter for fCursorAfterLastRead.
+   * @return the fCursorAfterLastRead
+   */
   @VisibleForTesting
   long getFCursorAfterLastRead() {
 return this.fCursorAfterLastRead;
   }
 
+  /**
+   * Setter for fCursorAfterLastRead.
+   * @param fCursorAfterLastRead the fCursorAfterLastRead to set
+   */
+  protected void setFCursorAfterLastRead(long fCursorAfterLastRead) {
+this.fCursorAfterLastRead = fCursorAfterLastRead;
+  }
+
+  /**
+   * Getter for limit.
+   * @return the limit
+   */
   @VisibleForTesting
-  long getLimit() {
+  int getLimit() {

Review Comment:
   limit is already an int. The caller would otherwise have needed unnecessary handling.








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050764#comment-18050764
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674956339


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java:
##
@@ -416,7 +418,7 @@ public final class FileSystemConfigurations {
 
   public static final boolean DEFAULT_FS_AZURE_ENABLE_CREATE_BLOB_IDEMPOTENCY 
= true;
 
-  public static final boolean 
DEFAULT_FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY = true;
+  public static final boolean 
DEFAULT_FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY = false;

Review Comment:
   Thanks for catching. Will revert








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050763#comment-18050763
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674955845


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -91,20 +92,21 @@ public abstract class AbfsInputStream extends FSInputStream 
implements CanUnbuff
*/
   private final boolean bufferedPreadDisabled;
   // User configured size of read ahead.
-  protected final int readAheadRange;
+  private final int readAheadRange;
+
+  private boolean firstRead = true;

Review Comment:
   Added








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050762#comment-18050762
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2674954172


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,117 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation, used when the user does not specify any input policy.
+ * It switches between sequential and random read optimizations based on the detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsAdaptiveInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics

Review Comment:
   Taken



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRandomInputStream.java:
##
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for random read patterns.
+ * This implementation disables prefetching of data blocks and instead only
+ * reads ahead for a small range beyond what is requested by the caller.
+ */
+public class AbfsRandomInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsRandomInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics
+   * @param path file path
+   * @param contentLength file content length
+   * @param abfsInputStreamContext input stream context
+   * @param eTag file eTag
+   * @param tracingContext tracing context to trace the read operations
+   */
+  public AbfsRandomInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+super(client, statistics, path, contentLength,
+abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * inheritDoc

Review Comment:
   Taken



##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -937,6 +952,92 @@ public void testPrefetchReadAddsPriorityHeaderWithDifferentConfigs()
 executePrefetchReadTest(tracingContext1, configuration1, false);
   }
 
+  /**
+   * Test to verify that the correct AbfsInputStream instance is created
+   * based on the read policy set in AbfsConfiguration.

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050609#comment-18050609
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3723523348

   > @anujmodi2021 I am trying to propose a single optimised implementation of 
an input stream across cloud implementations, as I think we all need this kind 
of logic. Ideally I want to get to a place where 80% of the logic is shared in 
a common layer, and then we only implement cloud specific clients to actually 
make the requests separately.
   > 
   > There is some consensus to move the shared logic into the parquet-java 
repo: https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6 , and 
some buy-in from the team at google. I'll be following up on this in the new 
year.
   > 
   > Would be great to get your thoughts and if your team would also like to 
collaborate on this.
   
   Thanks for the heads up @ahmarsuhail 
   This sounds like a good plan to me as well. We will surely keep a close eye 
on the updates on this thread and try to contribute to make things better in 
the best way possible.
   
   With this change we are not changing how ABFS handles parquet files, though. 
This just improves the infra and adds the capability for future improvements to 
be plugged in seamlessly. We will surely help address any gaps in ABFS to make 
things better for the common ground you are gearing up to improve.
   
   




> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> 
>
> Key: HADOOP-19767
> URL: https://issues.apache.org/jira/browse/HADOOP-19767
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.2
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Since the onset of ABFS Driver, there has been a single implementation of 
> AbfsInputStream. Different kinds of workloads require different heuristics to 
> give the best performance for that type of workload. For example: 
>  # Sequential Read Workloads like DFSIO and DistCP gain performance 
> improvement from prefetched 
>  # Random Read Workloads on other hand do not need Prefetches and enabling 
> prefetches for them is an overhead and TPS heavy 
>  # Query Workloads involving Parquet/ORC files benefit from improvements like 
> Footer Read and Small Files Reads
> To accomodate this we need to determine the pattern and accordingly create 
> Input Streams implemented for that particular pattern.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050597#comment-18050597
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671871133


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java:
##
@@ -100,7 +100,8 @@ public final class ReadBufferManagerV2 extends ReadBufferManager {
 
   private byte[][] bufferPool;
 
-  private final Stack<Integer> removedBufferList = new Stack<>();
+  private final ConcurrentSkipListSet<Integer> removedBufferList = new ConcurrentSkipListSet<>();
+  private ConcurrentSkipListSet<Integer> freeList = new ConcurrentSkipListSet<>();

Review Comment:
   Add comments for this change








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050595#comment-18050595
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

bhattmanish98 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671828821


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##
@@ -21,6 +21,7 @@
 import org.apache.hadoop.classification.InterfaceAudience;
 import org.apache.hadoop.classification.InterfaceStability;
 import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Options;

Review Comment:
   instead of importing entire Options class, we can just import 
OpenFileOptions class and directly mention OpenFileOptions class below in 
comments.
   import org.apache.hadoop.fs.Options.OpenFileOptions;
   



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##
@@ -946,12 +950,30 @@ public AbfsInputStream openFileForRead(Path path,
 
   perfInfo.registerSuccess(true);
 
-  // Add statistics for InputStream
-  return new AbfsInputStream(getClient(), statistics, relativePath,
-  contentLength, populateAbfsInputStreamContext(
-  parameters.map(OpenFileParameters::getOptions),
-  contextEncryptionAdapter),
-  eTag, tracingContext);
+  AbfsReadPolicy inputPolicy = AbfsReadPolicy.getAbfsReadPolicy(getAbfsConfiguration().getAbfsReadPolicy());

Review Comment:
   This method contains a lot of lines already. Instead of defining this 
if-else case here, it would be better to define a new method, so that if a new 
read pattern is introduced tomorrow, we can just update that method.
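   The refactor suggested above could look like the following sketch. The class 
and method names (`AbfsInputStreamFactory`, `createInputStream`) are 
illustrative assumptions, not the PR's actual code, and strings stand in for 
the real stream constructors so the sketch is self-contained:

```java
// Hypothetical sketch of pulling the policy dispatch out of
// openFileForRead() into one factory method, so a new read pattern
// only requires extending this switch. Names are assumptions;
// strings stand in for the real AbfsInputStream subclasses.
public class AbfsInputStreamFactory {
  enum AbfsReadPolicy { SEQUENTIAL, RANDOM, ADAPTIVE }

  static String createInputStream(AbfsReadPolicy policy) {
    switch (policy) {
      case SEQUENTIAL:
        return "AbfsPrefetchInputStream"; // prefetching stream
      case RANDOM:
        return "AbfsRandomInputStream";   // no prefetch queueing
      case ADAPTIVE:
      default:
        return "AbfsAdaptiveInputStream"; // pattern-detecting stream
    }
  }

  public static void main(String[] args) {
    System.out.println(createInputStream(AbfsReadPolicy.RANDOM));
  }
}
```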



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsReadPolicy.java:
##
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.util.Locale;
+
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_COLUMNAR;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ORC;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_PARQUET;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_RANDOM;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_WHOLE_FILE;
+
+/**
+ * Enum for ABFS Input Policies.
+ * Each policy maps to a particular implementation of {@link AbfsInputStream}
+ */
+public enum AbfsReadPolicy {
+
+  SEQUENTIAL(FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL),
+  RANDOM(FS_OPTION_OPENFILE_READ_POLICY_RANDOM),
+  ADAPTIVE(FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE);
+
+  private final String readPolicy;
+
+  AbfsReadPolicy(String readPolicy) {
+this.readPolicy = readPolicy;
+  }
+
+  @Override
+  public String toString() {
+return readPolicy;
+  }
+
+  /**
+   * Get the enum constant from the string name.
+   * @param name policy name as configured by user
+   * @return the corresponding AbsInputPolicy to be used
+   */
+  public static AbfsReadPolicy getAbfsReadPolicy(String name) {
+String trimmed = name.trim().toLowerCase(Locale.ENGLISH);

Review Comment:
   This variable can be renamed to something else. 
   Like readPolicyStr
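   For context, the name-to-enum lookup that `getAbfsReadPolicy` performs 
(cut off in the snippet above) typically follows the pattern sketched below. 
The fallback to `ADAPTIVE` for unrecognized names is an assumption for 
illustration, not confirmed by the snippet:

```java
// Minimal sketch of a case-insensitive enum lookup, using the
// variable name suggested in the review. The ADAPTIVE fallback for
// unknown names is an assumed default, not the PR's confirmed logic.
import java.util.Locale;

public class ReadPolicyLookup {
  enum Policy { SEQUENTIAL, RANDOM, ADAPTIVE }

  static Policy fromName(String name) {
    String readPolicyStr = name.trim().toLowerCase(Locale.ENGLISH);
    for (Policy p : Policy.values()) {
      if (p.name().toLowerCase(Locale.ENGLISH).equals(readPolicyStr)) {
        return p;
      }
    }
    return Policy.ADAPTIVE; // assumed default for unknown names
  }

  public static void main(String[] args) {
    System.out.println(fromName(" Random ")); // RANDOM
  }
}
```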



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,109 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not 

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050593#comment-18050593
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671873434


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java:
##
@@ -980,7 +986,7 @@ public void testResetReadBufferManager() {
   getReadAheadQueue().clear();
   getInProgressList().clear();
   getCompletedReadList().clear();
-  getFreeList().clear();
+  this.freeList.clear();

Review Comment:
   clearFreeList() method can be called here








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050592#comment-18050592
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671871133


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/ReadBufferManagerV2.java:
##
@@ -100,7 +100,8 @@ public final class ReadBufferManagerV2 extends ReadBufferManager {
 
   private byte[][] bufferPool;
 
-  private final Stack<Integer> removedBufferList = new Stack<>();
+  private final ConcurrentSkipListSet<Integer> removedBufferList = new ConcurrentSkipListSet<>();
+  private ConcurrentSkipListSet<Integer> freeList = new ConcurrentSkipListSet<>();

Review Comment:
   Can you please explain the need to move from Stack to ConcurrentSkipListSet 
specifically for V2 ?
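   One plausible motivation behind this question (an assumption; the thread 
does not state it): `ConcurrentSkipListSet` offers non-blocking concurrent 
access, sorted iteration, and deduplication of buffer indices, whereas `Stack` 
permits duplicates and relies on coarse `Vector`-style synchronization. A 
small sketch of the behavioral difference:

```java
// Demonstrates two properties a free-buffer list may want from
// ConcurrentSkipListSet that Stack lacks: deduplication of indices
// and handing out the lowest free index first.
import java.util.Stack;
import java.util.concurrent.ConcurrentSkipListSet;

public class FreeListDemo {
  public static void main(String[] args) {
    // Stack (synchronized, LIFO) happily stores duplicate buffer indices.
    Stack<Integer> stack = new Stack<>();
    stack.push(3);
    stack.push(1);
    stack.push(1);
    System.out.println(stack.size()); // 3 — duplicate index kept

    // ConcurrentSkipListSet deduplicates and is sorted, so pollFirst()
    // returns the lowest free index, with lock-free concurrent access.
    ConcurrentSkipListSet<Integer> freeList = new ConcurrentSkipListSet<>();
    freeList.add(3);
    freeList.add(1);
    freeList.add(1); // duplicate silently ignored
    System.out.println(freeList.pollFirst()); // 1
    System.out.println(freeList);             // [3]
  }
}
```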








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050590#comment-18050590
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671831182


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -334,7 +350,7 @@ private boolean shouldReadLastBlock() {
* @return number of bytes read
* @throws IOException if there is an error
*/
-  protected abstract int readOneBlock(final byte[] b, final int off, final int len) throws IOException;
+  protected abstract int readOneBlock(byte[] b, int off, int len) throws IOException;

Review Comment:
   shouldn't these parameters be still kept final ?








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050586#comment-18050586
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671824431


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -888,48 +953,131 @@ public String toString() {
 return sb.toString();
   }
 
+  /**
+   * Getter for bCursor.
+   * @return the bCursor
+   */
   @VisibleForTesting
   int getBCursor() {
 return this.bCursor;
   }
 
+  /**
+   * Setter for bCursor.
+   * @param bCursor the bCursor to set
+   */
+  protected void setBCursor(int bCursor) {
+this.bCursor = bCursor;
+  }
+
+  /**
+   * Getter for fCursor.
+   * @return the fCursor
+   */
   @VisibleForTesting
   long getFCursor() {
 return this.fCursor;
   }
 
+  /**
+   * Setter for fCursor.
+   * @param fCursor the fCursor to set
+   */
+  protected void setFCursor(long fCursor) {
+this.fCursor = fCursor;
+  }
+
+  /**
+   * Getter for fCursorAfterLastRead.
+   * @return the fCursorAfterLastRead
+   */
   @VisibleForTesting
   long getFCursorAfterLastRead() {
 return this.fCursorAfterLastRead;
   }
 
+  /**
+   * Setter for fCursorAfterLastRead.
+   * @param fCursorAfterLastRead the fCursorAfterLastRead to set
+   */
+  protected void setFCursorAfterLastRead(long fCursorAfterLastRead) {
+this.fCursorAfterLastRead = fCursorAfterLastRead;
+  }
+
+  /**
+   * Getter for limit.
+   * @return the limit
+   */
   @VisibleForTesting
-  long getLimit() {
+  int getLimit() {

Review Comment:
   why changed from long to int ?








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050581#comment-18050581
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

manika137 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671805850


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java:
##
@@ -416,7 +418,7 @@ public final class FileSystemConfigurations {
 
  public static final boolean DEFAULT_FS_AZURE_ENABLE_CREATE_BLOB_IDEMPOTENCY = true;

-  public static final boolean DEFAULT_FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY = true;
+  public static final boolean DEFAULT_FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY = false;

Review Comment:
   I don't think we need to disable it. We only add the request header if the 
read type is set to prefetch.








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050580#comment-18050580
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671796018


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##
@@ -91,20 +92,21 @@ public abstract class AbfsInputStream extends FSInputStream implements CanUnbuffer
*/
   private final boolean bufferedPreadDisabled;
   // User configured size of read ahead.
-  protected final int readAheadRange;
+  private final int readAheadRange;
+
+  private boolean firstRead = true;

Review Comment:
   A comment can be added here, as for the other params.








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050572#comment-18050572
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

manika137 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671707670


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -937,6 +952,92 @@ public void testPrefetchReadAddsPriorityHeaderWithDifferentConfigs()
 executePrefetchReadTest(tracingContext1, configuration1, false);
   }
 
+  /**
+   * Test to verify that the correct AbfsInputStream instance is created
+   * based on the read policy set in AbfsConfiguration.
+   */
+  @Test
+  public void testAbfsInputStreamInstance() throws Exception {
+    AzureBlobFileSystem fs = getFileSystem();
+    Path path = new Path("/testPath");
+    fs.create(path).close();
+
+    // Assert that Sequential Read Policy uses Prefetch Input Stream
+    getAbfsStore(fs).getAbfsConfiguration().setAbfsReadPolicy(FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL);
+    InputStream stream = fs.open(path).getWrappedStream();
+    assertThat(stream).isInstanceOf(AbfsPrefetchInputStream.class);
+    stream.close();
+
+    // Assert that Adaptive Read Policy uses Adaptive Input Stream
+    getAbfsStore(fs).getAbfsConfiguration().setAbfsReadPolicy(FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE);
+    stream = fs.open(path).getWrappedStream();
+    assertThat(stream).isInstanceOf(AbfsAdaptiveInputStream.class);
+    stream.close();
+
+    // Assert that Parquet Read Policy uses Random Input Stream
+    getAbfsStore(fs).getAbfsConfiguration().setAbfsReadPolicy(FS_OPTION_OPENFILE_READ_POLICY_PARQUET);
+    stream = fs.open(path).getWrappedStream();
+    assertThat(stream).isInstanceOf(AbfsRandomInputStream.class);
+    stream.close();
+
+    // Assert that Avro Read Policy uses Adaptive Input Stream
+    getAbfsStore(fs).getAbfsConfiguration().setAbfsReadPolicy(FS_OPTION_OPENFILE_READ_POLICY_AVRO);
+    stream = fs.open(path).getWrappedStream();
+    assertThat(stream).isInstanceOf(AbfsAdaptiveInputStream.class);
+    stream.close();
+  }
+
+  @Test
+  public void testRandomInputStreamDoesNotQueuePrefetches() throws Exception {

Review Comment:
   nit: javadoc








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050561#comment-18050561
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

manika137 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671664191


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRandomInputStream.java:
##
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for random read patterns.
+ * This implementation disables prefetching of data blocks and instead only
+ * reads ahead for a small range beyond what is requested by the caller.
+ */
+public class AbfsRandomInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsRandomInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics
+   * @param path file path
+   * @param contentLength file content length
+   * @param abfsInputStreamContext input stream context
+   * @param eTag file eTag
+   * @param tracingContext tracing context to trace the read operations
+   */
+  public AbfsRandomInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+super(client, statistics, path, contentLength,
+abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * inheritDoc

Review Comment:
   nit "{@inheritDoc}"





> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> 
>
> Key: HADOOP-19767
> URL: https://issues.apache.org/jira/browse/HADOOP-19767
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.2
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Since the onset of the ABFS Driver, there has been a single implementation of 
> AbfsInputStream. Different kinds of workloads require different heuristics to 
> give the best performance for that type of workload. For example: 
>  # Sequential Read Workloads like DFSIO and DistCp gain a performance 
> improvement from prefetching. 
>  # Random Read Workloads, on the other hand, do not need prefetches; enabling 
> prefetches for them is an overhead and TPS heavy. 
>  # Query Workloads involving Parquet/ORC files benefit from improvements like 
> Footer Read and Small File Reads.
> To accommodate this, we need to determine the pattern and accordingly create 
> Input Streams implemented for that particular pattern.
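The mapping sketched in the description above — openFile read-policy strings resolving to one of a few stream implementations — can be illustrated with a minimal self-contained sketch. All names here (`ReadPolicySketch`, `fromOpenFilePolicy`, the policy strings) are illustrative assumptions, not the actual ABFS API:

```java
// Hypothetical sketch: resolve an openFile read-policy string to a stream choice.
// Not the real ABFS implementation; names and groupings are assumptions.
public class ReadPolicySketch {

  enum ReadPolicy { SEQUENTIAL, RANDOM, ADAPTIVE }

  static ReadPolicy fromOpenFilePolicy(String policy) {
    switch (policy.toLowerCase(java.util.Locale.ROOT)) {
      case "sequential":
      case "whole-file":
        // Prefetch-friendly workloads such as DFSIO and DistCp.
        return ReadPolicy.SEQUENTIAL;
      case "random":
      case "parquet":
      case "orc":
      case "columnar":
        // Prefetching disabled; footer-read and small-file reads still help.
        return ReadPolicy.RANDOM;
      default:
        // Default: detect the read pattern at runtime.
        return ReadPolicy.ADAPTIVE;
    }
  }

  public static void main(String[] args) {
    System.out.println(fromOpenFilePolicy("parquet"));    // RANDOM
    System.out.println(fromOpenFilePolicy("sequential")); // SEQUENTIAL
    System.out.println(fromOpenFilePolicy("unknown"));    // ADAPTIVE
  }
}
```

A factory would then construct the matching AbfsInputStream subclass for each enum value.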



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050562#comment-18050562
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

manika137 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671664191


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRandomInputStream.java:
##
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for random read patterns.
+ * This implementation disables prefetching of data blocks instead only
+ * reads ahead for a small range beyond what is requested by the caller.
+ */
+public class AbfsRandomInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsRandomInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics
+   * @param path file path
+   * @param contentLength file content length
+   * @param abfsInputStreamContext input stream context
+   * @param eTag file eTag
+   * @param tracingContext tracing context to trace the read operations
+   */
+  public AbfsRandomInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+super(client, statistics, path, contentLength,
+abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * inheritDoc

Review Comment:
   nit: {@inheritDoc}








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050560#comment-18050560
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

manika137 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671664191


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRandomInputStream.java:
##
@@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for random read patterns.
+ * This implementation disables prefetching of data blocks instead only
+ * reads ahead for a small range beyond what is requested by the caller.
+ */
+public class AbfsRandomInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsRandomInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to record input stream statistics
+   * @param path file path
+   * @param contentLength file content length
+   * @param abfsInputStreamContext input stream context
+   * @param eTag file eTag
+   * @param tracingContext tracing context to trace the read operations
+   */
+  public AbfsRandomInputStream(
+  final AbfsClient client,
+  final FileSystem.Statistics statistics,
+  final String path,
+  final long contentLength,
+  final AbfsInputStreamContext abfsInputStreamContext,
+  final String eTag,
+  TracingContext tracingContext) {
+super(client, statistics, path, contentLength,
+abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  /**
+   * inheritDoc

Review Comment:
   nit: {@inheritDoc}








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050557#comment-18050557
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

manika137 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2671637289


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,117 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation used for cases where user does not 
specify any input policy.
+ * It switches between sequential and random read optimizations based on the 
detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  /**
+   * Constructs AbfsAdaptiveInputStream
+   * @param client AbfsClient to be used for read operations
+   * @param statistics to recordinput stream statistics

Review Comment:
   nit: space








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050311#comment-18050311
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3717726028

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  22m 15s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  spotbugs  |   0m 45s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/5/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  14m 34s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  the patch passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 10s |  |  
hadoop-tools/hadoop-azure: The patch generated 0 new + 3 unchanged - 1 fixed = 
3 total (was 4)  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 19s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/5/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 generated 5 new + 1575 unchanged - 10 
fixed = 1580 total (was 1585)  |
   | -1 :x: |  javadoc  |   0m 17s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/5/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 generated 2 new + 1457 unchanged - 7 
fixed = 1459 total (was 1464)  |
   | -1 :x: |  spotbugs  |   0m 45s | 
[/new-spotbugs-hadoop-tools_hadoop-azure.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/5/artifact/out/new-spotbugs-hadoop-tools_hadoop-azure.html)
 |  hadoop-tools/hadoop-azure generated 5 new + 1 unchanged - 0 fixed = 6 total 
(was 1)  |
   | +1 :green_heart: |  shadedclient  |  14m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 11s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 19s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  61m 44s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-tools/hadoop-azure |
   |  | 

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050299#comment-18050299
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2667338395


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -1027,11 +1129,14 @@ private void 
assertReadTypeInClientRequestId(AzureBlobFileSystem fs, int numOfRe
 ArgumentCaptor captor8 = 
ArgumentCaptor.forClass(ContextEncryptionAdapter.class);
 ArgumentCaptor captor9 = 
ArgumentCaptor.forClass(TracingContext.class);
 
+List paths = captor1.getAllValues();
+System.out.println(paths);

Review Comment:
   Thanks for catching. Taken








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050298#comment-18050298
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2667337729


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -57,17 +58,16 @@
 import org.apache.hadoop.fs.azurebfs.utils.TracingHeaderVersion;
 import org.apache.hadoop.fs.impl.OpenFileParameters;
 
+import static org.apache.hadoop.fs.Options.OpenFileOptions.*;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_AVRO;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
 import static org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.COLON;
 import static 
org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.EMPTY_STRING;
 import static 
org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.SPLIT_NO_LIMIT;
 import static 
org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY;
+import static 
org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB;
 import static 
org.apache.hadoop.fs.azurebfs.constants.HttpHeaderConfigurations.X_MS_REQUEST_PRIORITY;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.DIRECT_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.FOOTER_READ;
-import static 
org.apache.hadoop.fs.azurebfs.constants.ReadType.MISSEDCACHE_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.NORMAL_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.PREFETCH_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.SMALLFILE_READ;
+import static org.apache.hadoop.fs.azurebfs.constants.ReadType.*;

Review Comment:
   Taken



##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -57,17 +58,16 @@
 import org.apache.hadoop.fs.azurebfs.utils.TracingHeaderVersion;
 import org.apache.hadoop.fs.impl.OpenFileParameters;
 
+import static org.apache.hadoop.fs.Options.OpenFileOptions.*;

Review Comment:
   Taken



##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -881,6 +881,15 @@ public void testReadTypeInTracingContextHeader() throws 
Exception {
 doReturn(false).when(spiedConfig).optimizeFooterRead();
 testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, 
SMALLFILE_READ, 1, totalReadCalls);
 
+/*
+ * Test to verify Random Read Type.
+ * Settin Read Policy to Parquet ensures Random Read Type.

Review Comment:
   Taken








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050297#comment-18050297
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2667337177


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputPolicy.java:
##
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.util.Locale;
+
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_COLUMNAR;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ORC;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_PARQUET;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_RANDOM;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_WHOLE_FILE;
+
+/**
+ * Enum for ABFS Input Policies.
+ * Each policy maps to a particular implementation of {@link AbfsInputStream}
+ */
+public enum AbfsInputPolicy {

Review Comment:
   Changed to ReadPolicy.
   Anything is fine IMO



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputPolicy.java:
##
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.util.Locale;
+
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_COLUMNAR;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ORC;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_PARQUET;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_RANDOM;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_WHOLE_FILE;
+
+/**
+ * Enum for ABFS Input Policies.
+ * Each policy maps to a particular implementation of {@link AbfsInputStream}
+ */
+public enum AbfsInputPolicy {
+
+  SEQUENTIAL(FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL),
+  RANDOM(FS_OPTION_OPENFILE_READ_POLICY_RANDOM),
+  ADAPTIVE(FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE);
+
+  private final String policy;

Review Comment:
   Taken



##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRandomInputStream.java:
##
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file exce

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050296#comment-18050296
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2667336661


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,109 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+import static java.lang.Math.max;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation used for cases where user does not 
specify any input policy.
+ * It switches between sequential and random read optimizations based on the 
detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  public AbfsAdaptiveInputStream(

Review Comment:
   Added
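
The adaptive behaviour described in the AbfsAdaptiveInputStream javadoc above — switching between sequential and random optimizations based on the detected read pattern — could be sketched as follows. The class name, threshold value, and detection rule are hypothetical assumptions; the actual ABFS heuristics differ:

```java
// Hypothetical sketch of adaptive read-pattern detection: if reads land at
// non-contiguous offsets more than a threshold number of times, treat the
// stream as random and stop prefetching. Threshold is an assumed value.
public class AdaptivePatternSketch {

  private static final int RANDOM_THRESHOLD = 3; // assumed cutoff, not ABFS's real value

  private long expectedNextOffset = 0;
  private int nonContiguousReads = 0;
  private boolean randomMode = false;

  /** True once the stream has been classified as random. */
  public boolean isRandom() {
    return randomMode;
  }

  /** Record a read request and update the detected pattern. */
  public void recordRead(long offset, int length) {
    if (offset != expectedNextOffset) {
      nonContiguousReads++;
      if (nonContiguousReads >= RANDOM_THRESHOLD) {
        randomMode = true; // from here on, skip prefetching
      }
    }
    expectedNextOffset = offset + length;
  }
}
```

A real implementation would also need to stay thread-safe and reset state on explicit seeks, which this sketch omits.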





> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> 
>
> Key: HADOOP-19767
> URL: https://issues.apache.org/jira/browse/HADOOP-19767
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.2
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Since the onset of the ABFS Driver, there has been a single implementation of 
> AbfsInputStream. Different kinds of workloads require different heuristics to 
> give the best performance for that type of workload. For example: 
>  # Sequential Read Workloads like DFSIO and DistCp gain performance 
> improvement from prefetching 
>  # Random Read Workloads, on the other hand, do not need prefetches; enabling 
> prefetches for them is an overhead and TPS heavy 
>  # Query Workloads involving Parquet/ORC files benefit from improvements like 
> Footer Read and Small File Reads
> To accommodate this, we need to determine the pattern and accordingly create 
> Input Streams implemented for that particular pattern.
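
The three workload classes above can be sketched as a factory-style dispatch. The interface, classes, and method below are illustrative stand-ins, not the merged Hadoop code:

```java
// Hedged sketch of pattern-to-stream dispatch; all names are hypothetical.
public class StreamFactorySketch {

  interface InputStream { String describe(); }

  static class SequentialStream implements InputStream {
    public String describe() { return "prefetching enabled"; }  // DFSIO / DistCp
  }

  static class RandomStream implements InputStream {
    public String describe() { return "no prefetch, bounded readahead"; }
  }

  static class AdaptiveStream implements InputStream {
    public String describe() { return "switches based on detected pattern"; }
  }

  // Pick a stream implementation for the declared or detected read pattern.
  static InputStream openStream(String pattern) {
    switch (pattern) {
      case "sequential": return new SequentialStream();
      case "random":     return new RandomStream();
      default:           return new AdaptiveStream();  // safe default
    }
  }

  public static void main(String[] args) {
    System.out.println(openStream("random").describe());
    // prints "no prefetch, bounded readahead"
  }
}
```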



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050223#comment-18050223
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3716127736

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 4 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  23m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  spotbugs  |   0m 43s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/4/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  13m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | -0 :warning: |  patch  |  14m  6s |  |  Used diff version of patch file. 
Binary files and potentially other changes not applied. Please rebase and 
squash commits if necessary.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 20s |  |  the patch passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 11s | 
[/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/4/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt)
 |  hadoop-tools/hadoop-azure: The patch generated 19 new + 3 unchanged - 1 
fixed = 22 total (was 4)  |
   | +1 :green_heart: |  mvnsite  |   0m 24s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 20s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/4/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 generated 17 new + 1585 unchanged - 0 
fixed = 1602 total (was 1585)  |
   | -1 :x: |  javadoc  |   0m 16s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/4/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 generated 17 new + 1464 unchanged - 0 
fixed = 1481 total (was 1464)  |
   | -1 :x: |  spotbugs  |   0m 45s | 
[/new-spotbugs-hadoop-tools_hadoop-azure.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/4/artifact/out/new-spotbugs-hadoop-tools_hadoop-azure.html)
 |  hadoop-tools/hadoop-azure generated 1 new + 1 unchanged - 0 fixed = 2 total 
(was 1)  |
   | +1 :green_heart: |  shadedclient  |  14m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 11s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 23s |  |  The patch does

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050210#comment-18050210
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3715939825

   --
    AGGREGATED TEST RESULT 
   
   
   HNS-OAuth-DFS
   
   [ERROR] 
org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemListStatus.testListPath 




[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050111#comment-18050111
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3714568975

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  21m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 23s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  spotbugs  |   0m 45s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 26s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 15s | 
[/patch-mvninstall-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/patch-mvninstall-hadoop-tools_hadoop-azure.txt)
 |  hadoop-azure in the patch failed.  |
   | -1 :x: |  compile  |   0m 14s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 14s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 10s | 
[/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt)
 |  hadoop-tools/hadoop-azure: The patch generated 17 new + 2 unchanged - 1 
fixed = 19 total (was 3)  |
   | -1 :x: |  mvnsite  |   0m 17s | 
[/patch-mvnsite-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/patch-mvnsite-hadoop-tools_hadoop-azure.txt)
 |  hadoop-azure in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 20s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/3/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050106#comment-18050106
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664749342


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -1027,11 +1129,14 @@ private void 
assertReadTypeInClientRequestId(AzureBlobFileSystem fs, int numOfRe
 ArgumentCaptor captor8 = 
ArgumentCaptor.forClass(ContextEncryptionAdapter.class);
 ArgumentCaptor captor9 = 
ArgumentCaptor.forClass(TracingContext.class);
 
+List paths = captor1.getAllValues();
+System.out.println(paths);

Review Comment:
   remove the sysouts








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050103#comment-18050103
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664703108


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -881,6 +881,15 @@ public void testReadTypeInTracingContextHeader() throws 
Exception {
 doReturn(false).when(spiedConfig).optimizeFooterRead();
 testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, 
SMALLFILE_READ, 1, totalReadCalls);
 
+/*
+ * Test to verify Random Read Type.
+ * Settin Read Policy to Parquet ensures Random Read Type.

Review Comment:
   nit: setting








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050101#comment-18050101
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664689188


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -57,17 +58,16 @@
 import org.apache.hadoop.fs.azurebfs.utils.TracingHeaderVersion;
 import org.apache.hadoop.fs.impl.OpenFileParameters;
 
+import static org.apache.hadoop.fs.Options.OpenFileOptions.*;

Review Comment:
   "*" import should be reverted



##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -57,17 +58,16 @@
 import org.apache.hadoop.fs.azurebfs.utils.TracingHeaderVersion;
 import org.apache.hadoop.fs.impl.OpenFileParameters;
 
+import static org.apache.hadoop.fs.Options.OpenFileOptions.*;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_AVRO;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
 import static org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.COLON;
 import static 
org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.EMPTY_STRING;
 import static 
org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.SPLIT_NO_LIMIT;
 import static 
org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY;
+import static 
org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB;
 import static 
org.apache.hadoop.fs.azurebfs.constants.HttpHeaderConfigurations.X_MS_REQUEST_PRIORITY;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.DIRECT_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.FOOTER_READ;
-import static 
org.apache.hadoop.fs.azurebfs.constants.ReadType.MISSEDCACHE_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.NORMAL_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.PREFETCH_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.SMALLFILE_READ;
+import static org.apache.hadoop.fs.azurebfs.constants.ReadType.*;

Review Comment:
   same as above








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050100#comment-18050100
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664689188


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -57,17 +58,16 @@
 import org.apache.hadoop.fs.azurebfs.utils.TracingHeaderVersion;
 import org.apache.hadoop.fs.impl.OpenFileParameters;
 
+import static org.apache.hadoop.fs.Options.OpenFileOptions.*;

Review Comment:
   * import should be reverted








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050099#comment-18050099
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664688584


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsInputStream.java:
##
@@ -57,17 +58,16 @@
 import org.apache.hadoop.fs.azurebfs.utils.TracingHeaderVersion;
 import org.apache.hadoop.fs.impl.OpenFileParameters;
 
+import static org.apache.hadoop.fs.Options.OpenFileOptions.*;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_AVRO;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
 import static org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.COLON;
 import static 
org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.EMPTY_STRING;
 import static 
org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.SPLIT_NO_LIMIT;
 import static 
org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_ENABLE_PREFETCH_REQUEST_PRIORITY;
+import static 
org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB;
 import static 
org.apache.hadoop.fs.azurebfs.constants.HttpHeaderConfigurations.X_MS_REQUEST_PRIORITY;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.DIRECT_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.FOOTER_READ;
-import static 
org.apache.hadoop.fs.azurebfs.constants.ReadType.MISSEDCACHE_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.NORMAL_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.PREFETCH_READ;
-import static org.apache.hadoop.fs.azurebfs.constants.ReadType.SMALLFILE_READ;
+import static org.apache.hadoop.fs.azurebfs.constants.ReadType.*;

Review Comment:
* import should be reverted








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050098#comment-18050098
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664686966


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRandomInputStream.java:
##
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+/**
+ * Input stream implementation optimized for random read patterns.
+ * This implementation disables prefetching of data blocks and instead only
+ * reads ahead for a small range beyond what is requested by the caller.
+ */
+public class AbfsRandomInputStream extends AbfsInputStream {
+
+  public AbfsRandomInputStream(
+      final AbfsClient client,
+      final FileSystem.Statistics statistics,
+      final String path,
+      final long contentLength,
+      final AbfsInputStreamContext abfsInputStreamContext,
+      final String eTag,
+      TracingContext tracingContext) {
+    super(client, statistics, path, contentLength,
+        abfsInputStreamContext, eTag, tracingContext);
+  }
+
+  @Override

Review Comment:
   inherit doc 
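
The "inherit doc" nit refers to Javadoc's `{@inheritDoc}` tag, which copies the superclass method's documentation so the override only adds what differs. A minimal illustration with simplified stand-in classes (not the actual ABFS hierarchy):

```java
// Simplified stand-ins showing the {@inheritDoc} convention the review asks for.
public class InheritDocExample {

  abstract static class BaseStream {
    /** Reads up to {@code length} bytes starting at {@code offset}. */
    abstract int readRemote(long offset, int length);
  }

  static class RandomStream extends BaseStream {
    /**
     * {@inheritDoc}
     * Random variant: no prefetch, only a small bounded readahead.
     */
    @Override
    int readRemote(long offset, int length) {
      return Math.min(length, 4096);  // illustrative readahead cap (assumed value)
    }
  }

  public static void main(String[] args) {
    System.out.println(new RandomStream().readRemote(0, 1 << 20));  // prints 4096
  }
}
```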








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050097#comment-18050097
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664673459


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputPolicy.java:
##
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.util.Locale;
+
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_COLUMNAR;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ORC;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_PARQUET;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_RANDOM;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
+import static 
org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_WHOLE_FILE;
+
+/**
+ * Enum for ABFS Input Policies.
+ * Each policy maps to a particular implementation of {@link AbfsInputStream}.
+ */
+public enum AbfsInputPolicy {
+
+  SEQUENTIAL(FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL),
+  RANDOM(FS_OPTION_OPENFILE_READ_POLICY_RANDOM),
+  ADAPTIVE(FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE);
+
+  private final String policy;

Review Comment:
   same as above
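
Given the `Locale` import and the string-valued enum constants shown in the diff, the lookup presumably works roughly like the sketch below. The `getInputPolicy` method name and the ADAPTIVE fallback are assumptions, not confirmed details of the PR:

```java
import java.util.Locale;

// Hedged sketch of a case-insensitive string-to-enum lookup; method name
// and default fallback are assumed, not taken from the actual PR.
public class InputPolicyLookup {

  enum AbfsInputPolicy {
    SEQUENTIAL("sequential"),
    RANDOM("random"),
    ADAPTIVE("adaptive");

    private final String policy;

    AbfsInputPolicy(String policy) {
      this.policy = policy;
    }

    /** Case-insensitive lookup, falling back to ADAPTIVE for unknown names. */
    static AbfsInputPolicy getInputPolicy(String name) {
      String normalized = name.toLowerCase(Locale.ROOT);
      for (AbfsInputPolicy p : values()) {
        if (p.policy.equals(normalized)) {
          return p;
        }
      }
      return ADAPTIVE;
    }
  }

  public static void main(String[] args) {
    System.out.println(AbfsInputPolicy.getInputPolicy("Random"));  // prints RANDOM
    System.out.println(AbfsInputPolicy.getInputPolicy("vector"));  // prints ADAPTIVE
  }
}
```

Using `Locale.ROOT` avoids locale-sensitive case folding (e.g. the Turkish dotless-i problem), which is why the diff imports `java.util.Locale`.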





> ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns
> 
>
> Key: HADOOP-19767
> URL: https://issues.apache.org/jira/browse/HADOOP-19767
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.2
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
>
> Since the onset of the ABFS Driver, there has been a single implementation of 
> AbfsInputStream. Different kinds of workloads require different heuristics to 
> give the best performance for that type of workload. For example: 
>  # Sequential read workloads like DFSIO and DistCP gain a performance 
> improvement from prefetching 
>  # Random read workloads, on the other hand, do not need prefetches; enabling 
> prefetches for them is an overhead and TPS-heavy 
>  # Query workloads involving Parquet/ORC files benefit from improvements like 
> Footer Read and Small File Reads
> To accommodate this we need to detect the pattern and accordingly create 
> input streams implemented for that particular pattern.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050095#comment-18050095
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664672572


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputPolicy.java:
##
@@ -0,0 +1,78 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.util.Locale;
+
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_COLUMNAR;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_ORC;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_PARQUET;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_RANDOM;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_SEQUENTIAL;
+import static org.apache.hadoop.fs.Options.OpenFileOptions.FS_OPTION_OPENFILE_READ_POLICY_WHOLE_FILE;
+
+/**
+ * Enum for ABFS Input Policies.
+ * Each policy maps to a particular implementation of {@link AbfsInputStream}
+ */
+public enum AbfsInputPolicy {

Review Comment:
   Should be ReadPolicy or input stream policy instead of input policy?







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050091#comment-18050091
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664646115


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsAdaptiveInputStream.java:
##
@@ -0,0 +1,109 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs.services;
+
+import java.io.IOException;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.azurebfs.constants.ReadType;
+import org.apache.hadoop.fs.azurebfs.utils.TracingContext;
+
+import static java.lang.Math.max;
+
+/**
+ * Input stream implementation optimized for adaptive read patterns.
+ * This is the default implementation used for cases where user does not 
specify any input policy.
+ * It switches between sequential and random read optimizations based on the 
detected read pattern.
+ * It also keeps footer read and small file optimizations enabled.
+ */
+public class AbfsAdaptiveInputStream extends AbfsInputStream {
+
+  public AbfsAdaptiveInputStream(

Review Comment:
   nit: needs javadoc comment 
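The diff above doesn't show how the adaptive stream decides to switch between sequential and random optimizations. A minimal, self-contained illustration of the general idea; the seek-counting rule and the threshold here are assumptions for illustration, not the actual AbfsAdaptiveInputStream heuristic:

```java
// Hypothetical pattern detector: a read that does not continue from where the
// previous read ended counts as a "random seek"; once enough of those are
// seen, the stream is treated as random (and prefetching would be disabled).
public class ReadPatternDetector {

  public enum Pattern { SEQUENTIAL, RANDOM }

  private long expectedNextOffset = 0;  // offset a purely sequential reader would hit next
  private int randomSeeks = 0;          // reads that did not continue sequentially
  private final int randomSeekThreshold;

  public ReadPatternDetector(int randomSeekThreshold) {
    this.randomSeekThreshold = randomSeekThreshold;
  }

  /** Record a read of {@code length} bytes at {@code offset}; return the pattern inferred so far. */
  public Pattern onRead(long offset, int length) {
    if (offset != expectedNextOffset) {
      randomSeeks++;
    }
    expectedNextOffset = offset + length;
    return randomSeeks >= randomSeekThreshold ? Pattern.RANDOM : Pattern.SEQUENTIAL;
  }
}
```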







[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2026-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050090#comment-18050090
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anmolanmol1234 commented on code in PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#discussion_r2664641298


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FileSystemConfigurations.java:
##
@@ -108,6 +109,7 @@ public final class FileSystemConfigurations {
   public static final long MAX_AZURE_BLOCK_SIZE = 256 * 1024 * 1024L; // changing default abfs blocksize to 256MB
   public static final String AZURE_BLOCK_LOCATION_HOST_DEFAULT = "localhost";
   public static final int DEFAULT_AZURE_LIST_MAX_RESULTS = 5000;
+  public static final String DEFAULT_FS_AZURE_READ_POLICY = FS_OPTION_OPENFILE_READ_POLICY_ADAPTIVE;

Review Comment:
   Add comment pointing to the file where all these policies are defined








[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2025-12-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048543#comment-18048543
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3702070688

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  20m 52s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  spotbugs  |   0m 44s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 38s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 15s | 
[/patch-mvninstall-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/patch-mvninstall-hadoop-tools_hadoop-azure.txt)
 |  hadoop-azure in the patch failed.  |
   | -1 :x: |  compile  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 16s | 
[/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-azure in the patch failed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04.  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 11s | 
[/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt)
 |  hadoop-tools/hadoop-azure: The patch generated 18 new + 2 unchanged - 1 
fixed = 20 total (was 3)  |
   | -1 :x: |  mvnsite  |   0m 16s | 
[/patch-mvnsite-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/2/artifact/out/patch-mvnsite-hadoop-tools_hadoop-azure.txt)
 |  hadoop-azure in the patch failed.  |
   | -1 :x: |  javadoc  |   0m 20s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2025-12-31 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048521#comment-18048521
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

ahmarsuhail commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3701812349

   @anujmodi2021 I am trying to propose a single optimised implementation of an input stream across cloud implementations, as I think we all need this kind of logic. Ideally I want to get to a place where 80% of the logic is shared in a common layer, and then we only implement cloud-specific clients to actually make the requests separately.
   
   There is some consensus to move the shared logic into the parquet-java repo: https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6 , and some buy-in from the team at Google. I'll be following up on this in the new year.
   
   It would be great to get your thoughts, and to hear whether your team would also like to collaborate on this.





[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2025-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048507#comment-18048507
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

hadoop-yetus commented on PR #8153:
URL: https://github.com/apache/hadoop/pull/8153#issuecomment-3701615039

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   8m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  22m 15s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   0m 27s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 24s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  checkstyle  |   0m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 22s |  |  trunk passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | -1 :x: |  spotbugs  |   0m 44s | 
[/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/1/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure-warnings.html)
 |  hadoop-tools/hadoop-azure in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 36s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 19s |  |  the patch passed with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 10s | 
[/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/1/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt)
 |  hadoop-tools/hadoop-azure: The patch generated 28 new + 2 unchanged - 1 
fixed = 30 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   0m 21s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 16s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/1/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-21.0.7+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 generated 37 new + 1585 unchanged - 0 
fixed = 1622 total (was 1585)  |
   | -1 :x: |  javadoc  |   0m 15s | 
[/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/1/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-tools_hadoop-azure-jdkUbuntu-17.0.15+6-Ubuntu-0ubuntu120.04 with JDK 
Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 generated 37 new + 1464 unchanged - 0 
fixed = 1501 total (was 1464)  |
   | -1 :x: |  spotbugs  |   0m 42s | 
[/new-spotbugs-hadoop-tools_hadoop-azure.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8153/1/artifact/out/new-spotbugs-hadoop-tools_hadoop-azure.html)
 |  hadoop-tools/hadoop-azure generated 2 new + 1 unchanged - 0 fixed = 3 total 
(was 1)  |
   | +1 :green_heart: |  shadedclient  |  14m 12s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 11s |  |  hadoop-azure in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 22s |  |  The patch does not 
generate ASF License warnings.  |
   |  |  

[jira] [Commented] (HADOOP-19767) ABFS: [Read] Introduce Abfs Input Policy for detecting read patterns

2025-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048500#comment-18048500
 ] 

ASF GitHub Bot commented on HADOOP-19767:
-

anujmodi2021 opened a new pull request, #8153:
URL: https://github.com/apache/hadoop/pull/8153

   ### Description of PR
   Since the onset of the ABFS Driver, there has been a single implementation of AbfsInputStream. Different kinds of workloads require different heuristics to give the best performance for that type of workload. For example:
   
   Sequential read workloads like DFSIO and DistCP gain a performance improvement from prefetching.
   Random read workloads, on the other hand, do not need prefetches; enabling prefetches for them is an overhead and TPS-heavy.
   Query workloads involving Parquet/ORC files benefit from improvements like Footer Read and Small File Reads.
   
   To accommodate this we need to detect the pattern and accordingly create input streams implemented for that particular pattern.
   
   https://github.com/user-attachments/assets/5b7a3db9-ab04-43cf-b44e-5e7a6582205f
   
   Moving ahead, more relevant policies and specialized implementations of AbfsInputStream can be added.
   
   This PR only refactors the way we create input streams. No logical change is introduced: as of today, by default we will continue to use AbfsAdaptiveInputStream, which can cater to all kinds of workloads.
   
   ### How was this patch tested?
   New tests were added.
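   The dispatch the PR describes (a configured read policy selecting which stream subclass is constructed) can be sketched roughly as follows; the factory method and the simplified stream classes here are illustrative stand-ins, not the actual ABFS code:

```java
// Hypothetical factory sketch: the configured policy string chooses the
// stream implementation, with the adaptive stream as the default, mirroring
// the PR's stated default behaviour.
public class StreamFactorySketch {

  public interface InputStreamLike {
    String kind();
  }

  public static class AdaptiveStream implements InputStreamLike {
    public String kind() { return "adaptive"; }
  }

  public static class SequentialStream implements InputStreamLike {
    public String kind() { return "sequential"; }
  }

  public static class RandomStream implements InputStreamLike {
    public String kind() { return "random"; }
  }

  /** Map a policy name to a stream; null or unrecognized policies get the adaptive default. */
  public static InputStreamLike open(String policy) {
    switch (policy == null ? "adaptive" : policy) {
      case "sequential": return new SequentialStream();
      case "random":     return new RandomStream();
      default:           return new AdaptiveStream();
    }
  }
}
```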
   



