Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-26 Thread via GitHub


danny0405 merged PR #10976:
URL: https://github.com/apache/hudi/pull/10976


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-25 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2077346812

   
   ## CI report:
   
   * 7cd1f8fbf0b4ec1496866cd9d5dfd113bca0b725 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23488)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-25 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2077233748

   
   ## CI report:
   
   * fa5db11f49dbf6e5caa9713d8860d076f86ac8e6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23461)
 
   * 7cd1f8fbf0b4ec1496866cd9d5dfd113bca0b725 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23488)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-25 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2077213055

   
   ## CI report:
   
   * fa5db11f49dbf6e5caa9713d8860d076f86ac8e6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23461)
 
   * 7cd1f8fbf0b4ec1496866cd9d5dfd113bca0b725 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-25 Thread via GitHub


the-other-tim-brown commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1579440430


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -540,10 +542,10 @@ private Option 
getLastOrFirstPendingClusterInstant(boolean isLast
 
   @Override
   public boolean isPendingClusterInstant(String instantTime) {
-HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
-if (potentialTimeline.countInstants() == 0) {
+if (!getOrCreatePendingReplaceInstantSet().contains(instantTime)) {
   return false;
 }
+HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
 if (potentialTimeline.countInstants() > 1) {
   throw new IllegalStateException("Multiple instants with same timestamp: 
" + potentialTimeline);

Review Comment:
   Makes sense, I've moved that and the check to see if the replace commit is 
actually a clustering commit into the caching step



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1578657154


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -540,10 +542,10 @@ private Option 
getLastOrFirstPendingClusterInstant(boolean isLast
 
   @Override
   public boolean isPendingClusterInstant(String instantTime) {
-HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
-if (potentialTimeline.countInstants() == 0) {
+if (!getOrCreatePendingReplaceInstantSet().contains(instantTime)) {
   return false;
 }
+HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
 if (potentialTimeline.countInstants() > 1) {
   throw new IllegalStateException("Multiple instants with same timestamp: 
" + potentialTimeline);

Review Comment:
   When multiple same instant time instant are on the timeline: 
`potentialTimeline.countInstants() > 1`, this check is slow, let's improve it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


the-other-tim-brown commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1578655857


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -540,10 +542,10 @@ private Option 
getLastOrFirstPendingClusterInstant(boolean isLast
 
   @Override
   public boolean isPendingClusterInstant(String instantTime) {
-HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
-if (potentialTimeline.countInstants() == 0) {
+if (!getOrCreatePendingReplaceInstantSet().contains(instantTime)) {
   return false;
 }
+HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
 if (potentialTimeline.countInstants() > 1) {
   throw new IllegalStateException("Multiple instants with same timestamp: 
" + potentialTimeline);

Review Comment:
   What do you mean by duplicate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1578654464


##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java:
##
@@ -540,10 +542,10 @@ private Option 
getLastOrFirstPendingClusterInstant(boolean isLast
 
   @Override
   public boolean isPendingClusterInstant(String instantTime) {
-HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
-if (potentialTimeline.countInstants() == 0) {
+if (!getOrCreatePendingReplaceInstantSet().contains(instantTime)) {
   return false;
 }
+HoodieTimeline potentialTimeline = 
getCommitsTimeline().filterPendingReplaceTimeline().filter(i -> 
i.getTimestamp().equals(instantTime));
 if (potentialTimeline.countInstants() > 1) {
   throw new IllegalStateException("Multiple instants with same timestamp: 
" + potentialTimeline);

Review Comment:
   Maybe we can store the cache as `instant time -> duplicate instant time 
count`, so that we can always have a `O1` query for the check.
   
   And to be clear, the cache should only cache pending clustering instant, so 
`ClusteringUtils.isClusteringInstant` should be used in the filter of 
`getOrCreatePendingReplaceInstantSet`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2076027839

   
   ## CI report:
   
   * fa5db11f49dbf6e5caa9713d8860d076f86ac8e6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23461)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2075899432

   
   ## CI report:
   
   * 641e4e1885d174370cc7a4e438cc67a486a36b04 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23358)
 
   * fa5db11f49dbf6e5caa9713d8860d076f86ac8e6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23461)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2075890989

   
   ## CI report:
   
   * 641e4e1885d174370cc7a4e438cc67a486a36b04 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23358)
 
   * fa5db11f49dbf6e5caa9713d8860d076f86ac8e6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-24 Thread via GitHub


the-other-tim-brown commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2075838607

   @danny0405 let me know what you think of the update


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2065732532

   
   ## CI report:
   
   * 641e4e1885d174370cc7a4e438cc67a486a36b04 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23358)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-18 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1571765687


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   The cache is located in `HoodieDefaultTimeline`, both variable `instants` 
and `instantTimeSet` are lazy initialized cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2065675751

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23144)
 
   * 641e4e1885d174370cc7a4e438cc67a486a36b04 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23358)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2065671114

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23144)
 
   * 641e4e1885d174370cc7a4e438cc67a486a36b04 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-18 Thread via GitHub


the-other-tim-brown commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1571665073


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   @danny0405 I looked at the `ActiveTimeline` class and the `DefaultTimeline` 
class but don't see any instances where there is a cache of instants for a 
particular action type. Can you point me in the right direction?
   
   For this change, it is important that it is a `Set` in the end so we 
don't need to keep recreating this set on each iteration.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1556736368


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   > Can't multiple threads access the same timeline?
   
   It could, and we should introduce some synchronized code guard for the 
access of the cache, we already did that for some caches in the timeline.
   
   > What do you mean by "map cache"?
   
   My typo, it's the "Pair" cache here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


the-other-tim-brown commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1556695938


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   > > It seems like it may make sense long term to return the same instance 
whenever possible to benefit from this cache.
   > 
   > There should be no much difference because the map cache you use also has 
per-timeline granularity. The benefit to move to the timeline itself is for 
better maintainance.
   > 
   What do you mean by "map cache"?
   
   > And if we move the cache inside the timeline, there should not be thread 
access conflicts.
   
   Why is that? Can't multiple threads access the same timeline?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1556677633


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   > It seems like it may make sense long term to return the same instance 
whenever possible to benefit from this cache.
   
   There should be no much difference because the map cache you use also has 
per-timeline granularity. The benefit to move to the timeline itself is for 
better maintainance.
   
   And if we move the cache inside the timeline, there should not be thread 
access conflicts.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


the-other-tim-brown commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1555936753


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   Regarding threading, should we just make this whole method synchronized?
   
   Regarding caching, I'm open to whatever seems best. I noticed that the cache 
in HoodieDefaultTimeline is limited to a single instance and not a global 
cache. It seems like it may make sense long term to return the same instance 
whenever possible to benefit from this cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1555492722


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   And do you think we should move the cache into the timeline itself, we 
already introduced some caches there for speeding up, check the 
`HoodieDefaultTimeline` for the cache.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


danny0405 commented on code in PR #10976:
URL: https://github.com/apache/hudi/pull/10976#discussion_r1555490080


##
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##
@@ -140,6 +141,22 @@ protected void init(HoodieTableMetaClient metaClient, 
HoodieTimeline visibleActi
*/
   protected void refreshTimeline(HoodieTimeline visibleActiveTimeline) {
 this.visibleCommitsAndCompactionTimeline = 
visibleActiveTimeline.getWriteTimeline();
+this.timelineHashAndPendingReplaceInstants = null;
+  }
+
+  /**
+   * Get a list of pending replace instants. Caches the result for the active 
timeline.
+   * The cache is invalidated when {@link #refreshTimeline(HoodieTimeline)} is 
called.
+   *
+   * @return list of pending replace instant timestamps
+   */
+  private List getPendingReplaceInstants() {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();

Review Comment:
   Does this code work for multi-thread? Guess the read lock would ensure the 
intergity, but anyway the variable assignment should be atomic.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-08 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041950375

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23144)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041837370

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23144)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-07 Thread via GitHub


hudi-bot commented on PR #10976:
URL: https://github.com/apache/hudi/pull/10976#issuecomment-2041804155

   
   ## CI report:
   
   * db99bbcc7ede1bb1372a7996c25cfb54c1069a49 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [HUDI-7575] avoid repeated fetching of pending replace instants [hudi]

2024-04-07 Thread via GitHub


the-other-tim-brown opened a new pull request, #10976:
URL: https://github.com/apache/hudi/pull/10976

   ### Change Logs
   
   Avoids repeatedly creating a timeline with pending replace instants when 
creating the FSView.
   
   ### Impact
   
   - Lowers the overhead of creating the FSView in terms of objects created
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org