Re: [PR] [MINOR] Fixed naming of methods in HoodieMetadataConfig [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11076:
URL: https://github.com/apache/hudi/pull/11076#issuecomment-2072523610

   
   ## CI report:
   
   * f597f95d19d0f09176efcb358f3d1980efc7f946 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23425)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7653] Refactor HoodieFileIndex for more flexibility [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11074:
URL: https://github.com/apache/hudi/pull/11074#issuecomment-2072523463

   
   ## CI report:
   
   * c45d96645dd48aff96aa199693937c2d99c1ace0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23424)
   * e32f1f8615fbf4452673e79138dc23fe1309a45a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23426)
   
   



Re: [PR] [HUDI-7653] Refactor HoodieFileIndex for more flexibility [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11074:
URL: https://github.com/apache/hudi/pull/11074#issuecomment-2072493453

   
   ## CI report:
   
   * c45d96645dd48aff96aa199693937c2d99c1ace0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23424)
   * e32f1f8615fbf4452673e79138dc23fe1309a45a UNKNOWN
   
   



(hudi) branch master updated: [HUDI-7653] Refactor HoodieFileIndex for more flexibility (#11074)

2024-04-23 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new cb6eb6785fd [HUDI-7653] Refactor HoodieFileIndex for more flexibility (#11074)
cb6eb6785fd is described below

commit cb6eb6785fdeb88e66016a2b8c0c6e6fa184b309
Author: Vova Kolmakov 
AuthorDate: Tue Apr 23 23:09:08 2024 +0700

[HUDI-7653] Refactor HoodieFileIndex for more flexibility (#11074)

Created new abstract class `SparkBaseIndexSupport` with abstract methods
`getIndexName`, `isIndexAvailable`, `computeCandidateFileNames` and
`invalidateCaches` (to be overridden in descendants) and concrete methods
`getPrunedFileNames`, `getCandidateFiles` and `shouldReadInMemory`
(moved from HoodieFileIndex or XXXIndexSupport so that descendants can reuse them).

-

Co-authored-by: Sagar Sumit 
---
 .../org/apache/hudi/ColumnStatsIndexSupport.scala  |  68 ++-
 .../org/apache/hudi/FunctionalIndexSupport.scala   | 121 +--
 .../scala/org/apache/hudi/HoodieFileIndex.scala| 128 +
 .../org/apache/hudi/RecordLevelIndexSupport.scala  |  48 +---
 .../org/apache/hudi/SparkBaseIndexSupport.scala| 108 +
 5 files changed, 243 insertions(+), 230 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala
index dc15a3e8c8c..238962b964c 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala
@@ -23,11 +23,10 @@ import org.apache.hudi.ColumnStatsIndexSupport._
 import org.apache.hudi.HoodieCatalystUtils.{withPersistedData, 
withPersistedDataset}
 import org.apache.hudi.HoodieConversionUtils.toScalaOption
 import org.apache.hudi.avro.model._
-import org.apache.hudi.client.common.HoodieSparkEngineContext
 import org.apache.hudi.common.config.HoodieMetadataConfig
 import org.apache.hudi.common.data.HoodieData
 import org.apache.hudi.common.function.SerializableFunction
-import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.model.{FileSlice, HoodieRecord}
 import org.apache.hudi.common.table.HoodieTableMetaClient
 import org.apache.hudi.common.util.BinaryUtil.toBytes
 import org.apache.hudi.common.util.ValidationUtils.checkState
@@ -36,7 +35,6 @@ import org.apache.hudi.common.util.hash.ColumnIndexID
 import org.apache.hudi.data.HoodieJavaRDD
 import org.apache.hudi.metadata.{HoodieMetadataPayload, HoodieTableMetadata, 
HoodieTableMetadataUtil, MetadataPartitionType}
 import org.apache.hudi.util.JFunction
-import org.apache.spark.api.java.JavaSparkContext
 import 
org.apache.spark.sql.HoodieUnsafeUtils.{createDataFrameFromInternalRows, 
createDataFrameFromRDD, createDataFrameFromRows}
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
@@ -44,8 +42,10 @@ import org.apache.spark.sql.functions.col
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.{DataFrame, Row, SparkSession}
 import org.apache.spark.storage.StorageLevel
-
 import java.nio.ByteBuffer
+
+import org.apache.spark.sql.catalyst.expressions.Expression
+
 import scala.collection.JavaConverters._
 import scala.collection.immutable.TreeSet
 import scala.collection.mutable.ListBuffer
@@ -55,11 +55,8 @@ class ColumnStatsIndexSupport(spark: SparkSession,
   tableSchema: StructType,
   @transient metadataConfig: HoodieMetadataConfig,
   @transient metaClient: HoodieTableMetaClient,
-  allowCaching: Boolean = false) {
-
-  @transient private lazy val engineCtx = new HoodieSparkEngineContext(new 
JavaSparkContext(spark.sparkContext))
-  @transient private lazy val metadataTable: HoodieTableMetadata =
-HoodieTableMetadata.create(engineCtx, metadataConfig, 
metaClient.getBasePathV2.toString)
+  allowCaching: Boolean = false)
+  extends SparkBaseIndexSupport(spark, metadataConfig, metaClient) {
 
   @transient private lazy val cachedColumnStatsIndexViews: 
ParHashMap[Seq[String], DataFrame] = ParHashMap()
 
@@ -79,6 +76,40 @@ class ColumnStatsIndexSupport(spark: SparkSession,
 }
   }
 
+  override def getIndexName: String = ColumnStatsIndexSupport.INDEX_NAME
+
+  override def computeCandidateFileNames(fileIndex: HoodieFileIndex,
+ queryFilters: Seq[Expression],
+ queryReferencedColumns: Seq[String],
+ prunedPartitionsAndFileSlices: 

[jira] [Updated] (HUDI-7653) Refactor HoodieFileIndex for more flexibility

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7653:
--
Labels: hudi-1.0.0-beta2 pull-request-available  (was: 
pull-request-available)

> Refactor HoodieFileIndex for more flexibility
> -
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is 
> easy to extend with new types of indices and it works with Spark <3.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7653) Refactor HoodieFileIndex for more flexibility

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7653:
--
Status: Patch Available  (was: In Progress)

> Refactor HoodieFileIndex for more flexibility
> -
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is 
> easy to extend with new types of indices and it works with Spark <3.1





[jira] [Updated] (HUDI-7653) Refactor HoodieFileIndex for more flexibility

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7653:
--
Fix Version/s: 1.0.0

> Refactor HoodieFileIndex for more flexibility
> -
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is 
> easy to extend with new types of indices and it works with Spark <3.1





[jira] [Closed] (HUDI-7653) Refactor HoodieFileIndex for more flexibility

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-7653.
-
Resolution: Done

> Refactor HoodieFileIndex for more flexibility
> -
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Vova Kolmakov
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is 
> easy to extend with new types of indices and it works with Spark <3.1





Re: [PR] [HUDI-7653] Refactor HoodieFileIndex for more flexibility [hudi]

2024-04-23 Thread via GitHub


codope merged PR #11074:
URL: https://github.com/apache/hudi/pull/11074





[PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-04-23 Thread via GitHub


codope opened a new pull request, #11077:
URL: https://github.com/apache/hudi/pull/11077

   ### Change Logs
   
   This PR introduces a new class hierarchy for handling merge keys in a more 
flexible and decoupled manner. It adds the `HoodieMergeKey` interface, along 
with two implementations: `HoodieSimpleMergeKey` and `HoodieCompositeMergeKey`. 
This design allows us to extend key-based merge strategies easily.
   
   **Motivation**
   
   The need for introducing a new merge key handling mechanism was driven by 
the requirement to support different types of keys (simple and complex) without 
overloading the existing HoodieKey class, which is central to the write path. 
By segregating merge key handling into its own hierarchy, we avoid potential 
conflicts and keep modifications localised, improving the maintainability of 
the code.
   
   **Changes**
   
   1. `HoodieMergeKey`: New API that handles simple keys and composite keys consistently. It includes methods for retrieving the key and the partition path.
   2. `HoodieSimpleMergeKey`: Wraps `HoodieKey` and implements the 
`HoodieMergeKey` interface for simple scenarios where the key is a string.
   3. `HoodieCompositeMergeKey`: Implements the  `HoodieMergeKey` interface but 
allows for complex types as keys, enhancing flexibility for scenarios where a 
simple string key is not sufficient.
   4. `HoodieMergeKeyBasedRecordMerger`: A new implementation of 
`HoodieRecordMerger` based on `HoodieMergeKey`. If the merge keys are of type 
`HoodieCompositeMergeKey`, then it returns the older and newer records. 
Otherwise, it calls the merge method from the parent class.
   5. `HoodieMergedLogRecordScanner`: Changes to merge based on 
`HoodieMergeKey`.
   6. Unit tests for the new merger.
   
   These changes do not affect existing functionality that does not rely on merge keys. They introduce additional classes that are used explicitly by new functionality involving various key types in merge operations, so the risk to existing processes is minimal.
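
   The hierarchy described in the list above could look roughly like the following Java sketch. All names in it (`MergeKeySketch`, `MergeKey`, `SimpleMergeKey`, `CompositeMergeKey`, `merge`) are hypothetical stand-ins rather than the classes added by this PR. Records are modelled as plain strings, and the "newer record wins" fallback is only an assumption standing in for whatever the parent merger does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only: none of these types are the actual Hudi classes.
final class MergeKeySketch {

  /** Hypothetical merge-key contract: a key plus the partition path it belongs to. */
  interface MergeKey {
    Object getRecordKey();
    String getPartitionPath();
  }

  /** Simple scenario: the key is a single string. */
  static final class SimpleMergeKey implements MergeKey {
    private final String recordKey;
    private final String partitionPath;

    SimpleMergeKey(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }

    @Override public Object getRecordKey() { return recordKey; }
    @Override public String getPartitionPath() { return partitionPath; }
  }

  /** Composite scenario: the key is made of several parts of arbitrary type. */
  static final class CompositeMergeKey implements MergeKey {
    private final List<Object> keyParts;
    private final String partitionPath;

    CompositeMergeKey(List<Object> keyParts, String partitionPath) {
      this.keyParts = Collections.unmodifiableList(keyParts);
      this.partitionPath = partitionPath;
    }

    @Override public Object getRecordKey() { return keyParts; }
    @Override public String getPartitionPath() { return partitionPath; }
  }

  /**
   * Merge rule as described in the PR text: for composite keys both the older
   * and the newer record are returned; otherwise the default merge applies.
   * ASSUMPTION: the default merge is modelled here as "newer record wins".
   */
  static List<String> merge(MergeKey key, String older, String newer) {
    if (key instanceof CompositeMergeKey) {
      return Arrays.asList(older, newer);
    }
    return Collections.singletonList(newer);
  }

  public static void main(String[] args) {
    MergeKey simple = new SimpleMergeKey("id-1", "2024/04/23");
    MergeKey composite =
        new CompositeMergeKey(Arrays.<Object>asList("city-a", "id-1"), "2024/04/23");
    System.out.println(merge(simple, "v1", "v2"));
    System.out.println(merge(composite, "v1", "v2"));
  }
}
```

   Keeping the key contract in its own small interface is what lets the merger branch on the key type without touching the class used on the write path.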
   
   ### Impact
   
   Enhances the flexibility and robustness of our key-based merge strategies. It helps keep our codebase scalable and maintainable, allowing easy extensions and modifications in the future.
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7652:
-
Labels: hudi-1.0.0-beta2 pull-request-available  (was: hudi-1.0.0-beta2)

> Add new MergeKey API to support simple and composite keys
> -
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323





Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2072997601

   
   ## CI report:
   
   * 19a23e39e15d2818d28956959dc00f09bc51 UNKNOWN
   
   



Re: [PR] [HUDI-7146] [RFC-77] RFC for secondary index [hudi]

2024-04-23 Thread via GitHub


codope commented on code in PR #10814:
URL: https://github.com/apache/hudi/pull/10814#discussion_r1576637313


##
rfc/rfc-77/rfc-77.md:
##
@@ -0,0 +1,323 @@
+
+
+# RFC-77: Secondary Indexes
+
+## Proposers
+
+- @bhat-vinay
+- @codope
+
+## Approvers
+ - @vinothchandar
+ - @nsivabalan
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7146
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+In this RFC, we propose implementing Secondary Indexes (SI), a new capability 
in Hudi's metadata table (MDT) based indexing 
+system.  SI are indexes defined on user specified columns of the table. 
Similar to record level indexes,
+SI will improve query performance when the query predicate contains secondary 
keys. The number of files
+that a query needs to scan can be pruned down using secondary indexes.
+
+## Background
+
+Hudi supports different indexes through its MDT. These indexes help to improve 
query performance by
+pruning down the set of files that need to be scanned to build the result set 
(of the query). 
+
+One of the supported index in Hudi is the Record Level Index (RLI). RLI acts 
as a unique-key index and can be used to 
+locate a FileGroup of a record based on its RecordKey. A query having an EQUAL 
or IN predicate on the RecordKey will 
+have a performance boost as the RLI can accurately give a subset of FileGroups 
that contain the rows matching the 
+predicate.
+
+Many workloads have queries with predicates that are not based on RecordKey. 
Such queries cannot use RLI for data
+skipping. Traditional databases have a notion of building indexes (called 
Secondary Index or SI) on user specified 
+columns to aid such queries. This RFC proposes implementing SI in Hudi. Users 
can build SI on columns which are 
+frequently used as filtering columns (i.e columns on which query predicate is 
based on). As with any other index, 
+building and maintaining SI adds overhead on the write path. Users should 
choose wisely based 
+on their workload. Tools can be built to provide guidance on the usefulness of 
indexing a specific column, but it is 
+not in the scope of this RFC.
+
+## Design and Implementation
+This section discusses briefly the goals, design, implementation details of 
supporting SI in Hudi. At a high level,
+the design principle and goals are as follows:
+1. User specifies SI to be built on a given column of a table. A given SI can 
be built on only one column of the table
+(i.e composite keys are not allowed). Any number of SI can be built on a Hudi 
table. The indexes to be built are 
+specified using regular SQL statements.
+2. Metadata of a SI will be tracked through the index metadata file under 
`/.hoodie/.index` (this path can be configurable).
+3. Each SI will be a partition inside Hudi MDT. Index data will not be 
materialized with the base table's data files.
+4. Logical plan of a query will be used to efficiently filter FileGroups based 
on the query predicate and the available
+indexes.
+
+### SQL
+SI can be created using the regular `CREATE INDEX` SQL statement.
+```
+-- PROPOSED SYNTAX WITH `secondary_index` as the index type --
+CREATE INDEX [IF NOT EXISTS] index_name ON [TABLE] table_name [USING 
secondary_index](index_column)
+-- Examples --
+CREATE INDEX idx_city on hudi_table USING secondary_index(city)
+CREATE INDEX idx_last_name on hudi_table (last_name)
+
+-- NO CHANGE IN DROP INDEX --
+DROP INDEX idx_city;
+```
+
+`index_name` - Required and validated by parser. `index_name` will be used to 
derive the name of the physical partition
+in MDT by prefixing `secondary_index_`. If the `index_name` is `idx_city`, 
then the MDT partition will be 
+`secondary_index_idx_city`
+
+The index_type will be `secondary_index`. This will be used to distinguish SI 
from other Functional Indexes.
+
+### Secondary Index Metadata
+Secondary index metadata will be managed the same way as Functional Index 
metadata. Since SI will not have any function
+to be applied on each row, the `function_name` will be NULL.
+
+### Index in Metadata Table (MDT)
+Each SI will be stored as a physical partition in the MDT. The partition name 
is derived from the `index_name` by 
+prefixing `secondary_index_`. Each entry in the SI partition will be a mapping 
of the form 
+`secondary_key -> record_key`. `secondary_key` will form the "record key" for 
the record of the SI partition. Note that
+an important design consideration here is that users may choose to build SI on 
a non-unique column of the table.
+
+ Index Initialisation
+Initial build of the secondary index will scan all file slices (of the base 
table) to extract 
+`secondary-key -> record-key` tuple and write it into the secondary index 
partition in the metadata table. 
+This is similar to how RLI is initialised.
+
+ Index Maintenance
+The index needs to be updated on inserts, updates and deletes to the base 
table. Considering that secondary-keys in 
+the base table could be 

Re: [PR] [HUDI-7146] [RFC-77] RFC for secondary index [hudi]

2024-04-23 Thread via GitHub


codope commented on code in PR #10814:
URL: https://github.com/apache/hudi/pull/10814#discussion_r1576637885


##
rfc/rfc-77/rfc-77.md:
##

Re: [PR] [HUDI-7235] Fix checkpoint bug for S3/GCS Incremental Source [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #10336:
URL: https://github.com/apache/hudi/pull/10336#issuecomment-2071510989

   
   ## CI report:
   
   * de49a9da9db751d6fd6e0eaa1a750f8726a55018 UNKNOWN
   * 1b754dffcc5dc2f82c62de06ed9d037ac201d194 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23411)
   
   



Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11062:
URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071512201

   
   ## CI report:
   
   * f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN
   * a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412)
   * 2fe41f70ab0d295fe9b4a3b3e94387385a21e7d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23417)
   
   



Re: [PR] [HUDI-6386] Enable testArchivalWithMultiWriters back as they are passing [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #9085:
URL: https://github.com/apache/hudi/pull/9085#issuecomment-2071614427

   
   ## CI report:
   
   * c818a7209bc320f3248f6ef5ea28fbf7358ccb8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23415)
   
   



Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11062:
URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071494457

   
   ## CI report:
   
   * f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN
   * a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412)
   
   



[jira] [Updated] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7639:
-
Labels: pull-request-available  (was: )

> Refactor HoodieFileIndex so that different indexes can be used via optimizer 
> rules
> --
>
> Key: HUDI-7639
> URL: https://issues.apache.org/jira/browse/HUDI-7639
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Currently, `HoodieFileIndex` is responsible for partition pruning as well as 
> file skipping. All indexes are used in the 
> [lookupCandidateFilesInMetadataTable|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L333]
>  method through if-else branches. This is not only hard to maintain as we add 
> more indexes, but also induces a static hierarchy. Instead, we need more 
> flexibility so that we can alter the logical plan based on the availability of 
> indexes. For partition pruning in Spark, we already have the 
> [HoodiePruneFileSourcePartitions|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala#L40]
>  rule, but it is injected during the operator optimization batch and it does 
> not modify the result of the LogicalPlan. To be fully extensible, we should 
> be able to rewrite the LogicalPlan. We should be able to inject rules after 
> partition pruning, after the operator optimization batch, and before any CBO 
> rules that depend on stats. Spark provides the 
> [injectPreCBORules|https://github.com/apache/spark/blob/6232085227ee2cc4e831996a1ac84c27868a1595/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala#L304]
>  API to do so; however, it is only available from Spark 3.1.0 onwards.
> The goal of this ticket is to refactor the index hierarchy and create new rules 
> such that Spark versions < 3.1.0 still go via the old path, while later 
> versions can modify the plan using an appropriate index and inject it as a 
> pre-CBO rule.
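
The version split described in the ticket can be illustrated with a small, self-contained sketch. `SparkVersionGate` and both of its methods are hypothetical names invented for this illustration; they are not part of Hudi or Spark, and the parsing assumes a plain `major.minor[.patch]` version string.

```java
// Hypothetical helper, not part of Hudi or Spark: picks a code path from a Spark version string.
final class SparkVersionGate {
    private SparkVersionGate() {
    }

    // True for 3.1.0 and later, the first release the ticket says offers pre-CBO rule injection.
    static boolean supportsPreCboRules(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 3 || (major == 3 && minor >= 1);
    }

    // Older versions keep the existing path; newer ones would register the index rule before CBO.
    static String choosePath(String sparkVersion) {
        return supportsPreCboRules(sparkVersion)
            ? "inject index rule as pre-CBO rule"
            : "keep old HoodieFileIndex path";
    }
}
```

Under these assumptions, `SparkVersionGate.choosePath("3.0.3")` selects the old path and `SparkVersionGate.choosePath("3.2.1")` selects the pre-CBO path.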



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7639] Refactor HoodieFileIndex so that different indexes can be used via optimizer rules [hudi]

2024-04-23 Thread via GitHub


wombatu-kun opened a new pull request, #11074:
URL: https://github.com/apache/hudi/pull/11074

   ### Change Logs
   
   Task: https://issues.apache.org/jira/browse/HUDI-7639  
   
   Created a new abstract class `SparkBaseIndexSupport` with abstract methods 
`getIndexName`, `isIndexAvailable`, `computeCandidateFileNames` and 
`invalidateCaches` (to be overridden in descendants) and concrete methods 
`getPrunedFileNames`, `getCandidateFiles` and `shouldReadInMemory` (moved from 
HoodieFileIndex or XXXIndexSupport for reuse in descendants).
   
   Made the `ColumnStatsIndexSupport`, `FunctionalIndexSupport` and 
`RecordLevelIndexSupport` classes extend `SparkBaseIndexSupport`. 
Each implementation of `computeCandidateFileNames` was taken from the 
corresponding if-else branch of 
`HoodieFileIndex.lookupCandidateFilesInMetadataTable()`. 
The implementations of `getIndexName` and `isIndexAvailable` are trivial. A real 
implementation of `invalidateCaches` exists only for `ColumnStatsIndexSupport`.
   
   Replaced the 3 individual XXXIndexSupport fields with one list of 3 
SparkBaseIndexSupport items. The order of items is important: to preserve the 
original behavior, the order of indices must be RecordLevel, Functional, 
ColStats.
   
   `HoodieFileIndex.lookupCandidateFilesInMetadataTable()` is simplified to 
just looping through the list, checking each index's availability and, if it 
is available, computing the pruned file names via the corresponding 
XXXIndexSupport class.
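
   The shape of this refactoring can be sketched in plain Java (the real classes are Scala and are not reproduced here). `BaseIndexSupportSketch`, `StubIndexSupport` and `FileIndexSketch` are hypothetical stand-ins, and the sketch assumes that the first available index in the list wins, as the original if-else chain suggests.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the abstract base class described above.
abstract class BaseIndexSupportSketch {
    abstract String getIndexName();

    abstract boolean isIndexAvailable();

    abstract Set<String> computeCandidateFileNames(Set<String> allFiles);
}

// Stub index that keeps only the files it was told match the query.
final class StubIndexSupport extends BaseIndexSupportSketch {
    private final String name;
    private final boolean available;
    private final Set<String> matchingFiles;

    StubIndexSupport(String name, boolean available, Set<String> matchingFiles) {
        this.name = name;
        this.available = available;
        this.matchingFiles = matchingFiles;
    }

    @Override
    String getIndexName() {
        return name;
    }

    @Override
    boolean isIndexAvailable() {
        return available;
    }

    @Override
    Set<String> computeCandidateFileNames(Set<String> allFiles) {
        Set<String> candidates = new LinkedHashSet<>(allFiles);
        candidates.retainAll(matchingFiles);
        return candidates;
    }
}

final class FileIndexSketch {
    private FileIndexSketch() {
    }

    // Walks the ordered list and lets the first available index prune the files.
    // Assumption: with no index available, nothing is pruned.
    static Set<String> lookupCandidateFiles(List<BaseIndexSupportSketch> indexSupports,
                                            Set<String> allFiles) {
        for (BaseIndexSupportSketch support : indexSupports) {
            if (support.isIndexAvailable()) {
                return support.computeCandidateFileNames(allFiles);
            }
        }
        return allFiles;
    }
}
```

   Because the loop returns on the first available entry, the order of the list decides which index is consulted, which is why the ordering has to match the old branch order.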
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11062:
URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071503255

   
   ## CI report:
   
   * f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN
   * a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412)
 
   * 2fe41f70ab0d295fe9b4a3b3e94387385a21e7d4 UNKNOWN
   
   



[jira] [Updated] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules

2024-04-23 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-7639:

Status: In Progress  (was: Open)

> Refactor HoodieFileIndex so that different indexes can be used via optimizer 
> rules
> --
>
> Key: HUDI-7639
> URL: https://issues.apache.org/jira/browse/HUDI-7639
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Vova Kolmakov
>Priority: Major
> Fix For: 1.0.0
>
>





Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11062:
URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071617519

   
   ## CI report:
   
   * f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN
   * a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412)
 
   * 2fe41f70ab0d295fe9b4a3b3e94387385a21e7d4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23417)
 
   * d99ea433d02161130f8ee6d0028319e009253a12 UNKNOWN
   
   



Re: [PR] [HUDI-7656] Disable a flaky test [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11078:
URL: https://github.com/apache/hudi/pull/11078#issuecomment-2073406841

   
   ## CI report:
   
   * ff7ab8d5c15cd2311b0cf0bd6eaa2f5061fc8db3 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23430)
 
   
   



Re: [PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11079:
URL: https://github.com/apache/hudi/pull/11079#issuecomment-2073406892

   
   ## CI report:
   
   * bd68c36702ebde586b9f57bf1d36c3751b91e61a UNKNOWN
   
   



Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2073101141

   
   ## CI report:
   
   * 19a23e39e15d2818d28956959dc00f09bc51 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23428)
 
   
   



Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-2073287758

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN
   * d9f583043f1a5ffd532d613b2ce95aa7a8fddc47 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23213)
 
   * b6faa0ddf78a193ed8cdb1ce8eb14ae49016a105 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23429)
 
   
   



Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11077:
URL: https://github.com/apache/hudi/pull/11077#issuecomment-2073013071

   
   ## CI report:
   
   * 19a23e39e15d2818d28956959dc00f09bc51 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23428)
 
   
   



[jira] [Created] (HUDI-7655) Support configuration for clean to fail execution if there is at least one file is marked as a failed delete

2024-04-23 Thread Krishen Bhan (Jira)
Krishen Bhan created HUDI-7655:
--

 Summary: Support configuration for clean to fail execution if 
there is at least one file is marked as a failed delete
 Key: HUDI-7655
 URL: https://issues.apache.org/jira/browse/HUDI-7655
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Krishen Bhan


When a HUDI clean plan is executed, any targeted file that was not confirmed as 
deleted (or non-existing) will be marked as a "failed delete". Although these 
failed deletes will be added to `.clean` metadata, if incremental clean is used 
then these files might never be picked up again by a future clean plan, 
unless a "full-scan" clean ends up being scheduled. In addition to leaving 
more files unnecessarily taking up storage space for longer, this can lead to 
the following dataset consistency issue for COW datasets:
 # Insert at C1 creates file group f1 in partition
 # Replacecommit at RC2 creates file group f2 in partition, and replaces f1
 # Any reader of partition that calls HUDI API (with or without using MDT) will 
recognize that f1 should be ignored, as it has been replaced. This is because 
the RC2 instant file is in the active timeline
 # Some completed instants later, an incremental clean is scheduled. It moves 
the "earliest commit to retain" to a time after instant time RC2, so it 
targets f1 for deletion. But during execution of the plan, it fails to delete 
f1.
 # An archive job is eventually triggered, and archives C1. Note that f1 is 
still in partition

At this point, any job/query that reads the aforementioned partition directly 
through DFS file system calls (without directly using MDT FILES partition) 
will consider both f1 and f2 as valid file groups, since RC2 is no longer in 
the active timeline. This is a data consistency issue, and will only be 
resolved if a "full-scan" clean is triggered and deletes f1.
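
The scenario can be reproduced with a toy model. This is only a sketch: `DirectListingSketch` is a hypothetical class, file groups are plain strings, and the "active timeline" is reduced to a map from replacecommit instant to the file groups it replaced.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of a reader that lists a partition directly and filters by the active timeline.
final class DirectListingSketch {
    private DirectListingSketch() {
    }

    // A file group is visible unless some replacecommit still in the active timeline replaced it.
    static Set<String> visibleFileGroups(Set<String> fileGroupsOnStorage,
                                         Map<String, Set<String>> activeReplaceCommits) {
        Set<String> visible = new TreeSet<>(fileGroupsOnStorage);
        for (Set<String> replaced : activeReplaceCommits.values()) {
            visible.removeAll(replaced);
        }
        return visible;
    }
}
```

While RC2 is in the map, only f2 is visible. Once RC2 has left the active timeline and f1 is still on storage, the same call returns both f1 and f2.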

This specific scenario can be avoided if the user can configure HUDI clean to 
fail execution of a clean plan unless all files are confirmed as deleted (or 
not existing in DFS already), "blocking" the clean. The next clean attempt will 
re-execute this existing plan, since clean plans cannot be "rolled back". 
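
A minimal sketch of the proposed behavior follows. `CleanExecutionSketch` and the `failOnFailedDelete` flag are hypothetical names for illustration; the ticket does not specify a config key or an implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: block completion of a clean when any targeted file was not deleted.
final class CleanExecutionSketch {
    private CleanExecutionSketch() {
    }

    // Collects the files whose delete was not confirmed (value false).
    static List<String> failedDeletes(Map<String, Boolean> deleteResults) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : deleteResults.entrySet()) {
            if (!entry.getValue()) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    // With the flag on, a single failed delete aborts the clean so the plan is retried later.
    static void completeClean(Map<String, Boolean> deleteResults, boolean failOnFailedDelete) {
        List<String> failed = failedDeletes(deleteResults);
        if (failOnFailedDelete && !failed.isEmpty()) {
            throw new IllegalStateException(
                "Clean left " + failed.size() + " file(s) undeleted: " + failed);
        }
    }
}
```

With the flag off, the sketch keeps today's behavior as described above: failed deletes are only recorded and the clean completes.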





[jira] [Created] (HUDI-7656) Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates

2024-04-23 Thread Lin Liu (Jira)
Lin Liu created HUDI-7656:
-

 Summary: Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates
 Key: HUDI-7656
 URL: https://issues.apache.org/jira/browse/HUDI-7656
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Lin Liu
Assignee: Lin Liu


This test is flaky.





[PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]

2024-04-23 Thread via GitHub


linliu-code opened a new pull request, #11079:
URL: https://github.com/apache/hudi/pull/11079

   ### Change Logs
   
   Disable test: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
   
   ### Impact
   
   Less test coverage temporarily.
   
   ### Risk level (write none, low medium or high below)
   
   Low.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7657) disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7657:
-
Labels: pull-request-available  (was: )

> disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
> -
>
> Key: HUDI-7657
> URL: https://issues.apache.org/jira/browse/HUDI-7657
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
>






Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-2073275371

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN
   * d9f583043f1a5ffd532d613b2ce95aa7a8fddc47 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23213)
 
   * b6faa0ddf78a193ed8cdb1ce8eb14ae49016a105 UNKNOWN
   
   



[PR] [HUDI-7656] Disable a flaky test [hudi]

2024-04-23 Thread via GitHub


linliu-code opened a new pull request, #11078:
URL: https://github.com/apache/hudi/pull/11078

   ### Change Logs
   
   Disable test: TestCOWDataSource.testCopyOnWriteConcurrentUpdates
   
   ### Impact
   
   Less coverage temporarily.
   
   ### Risk level (write none, low medium or high below)
   
   Low.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7656) Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7656:
-
Labels: pull-request-available  (was: )

> Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates
> --
>
> Key: HUDI-7656
> URL: https://issues.apache.org/jira/browse/HUDI-7656
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Lin Liu
>Assignee: Lin Liu
>Priority: Major
>  Labels: pull-request-available
>
> This test is flaky.





[jira] [Created] (HUDI-7657) disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys

2024-04-23 Thread Lin Liu (Jira)
Lin Liu created HUDI-7657:
-

 Summary: disable flaky: 
TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
 Key: HUDI-7657
 URL: https://issues.apache.org/jira/browse/HUDI-7657
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Lin Liu
Assignee: Lin Liu








Re: [PR] [HUDI-7656] Disable a flaky test [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11078:
URL: https://github.com/apache/hudi/pull/11078#issuecomment-2073394659

   
   ## CI report:
   
   * ff7ab8d5c15cd2311b0cf0bd6eaa2f5061fc8db3 UNKNOWN
   
   



[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7658:
-
Labels: pull-request-available  (was: )

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
>
> The time taken is only printed in log statements on success, but it is useful 
> to see it in the log on failure as well
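
A sketch of what this could look like, assuming nothing about the actual stream sync code: `TimedSyncSketch` is a hypothetical helper that returns the message it would log, and the clock is passed in so the behavior is easy to check.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: reports elapsed time for both the success and the failure outcome.
final class TimedSyncSketch {
    private TimedSyncSketch() {
    }

    // Runs the sync and returns the log line; the elapsed time is computed in both branches.
    static String runTimed(String name, Runnable sync, LongSupplier clockMillis) {
        long start = clockMillis.getAsLong();
        try {
            sync.run();
            long elapsed = clockMillis.getAsLong() - start;
            return String.format("SUCCESS: %s took %d ms", name, elapsed);
        } catch (RuntimeException e) {
            long elapsed = clockMillis.getAsLong() - start;
            return String.format("FAILURE: %s took %d ms: %s", name, elapsed, e.getMessage());
        }
    }
}
```

In real code the returned string would go to the logger, and `System::currentTimeMillis` could serve as the clock.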





[PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


jonvex opened a new pull request, #11080:
URL: https://github.com/apache/hudi/pull/11080

   ### Change Logs
   
   log the time taken when meta sync fails
   
   ### Impact
   
   more consistent logging between success and failure
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7659) Update 0.14.0 release docs to call out that row writer w/ clustering is enabled by default

2024-04-23 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7659:
-

 Summary: Update 0.14.0 release docs to call out that row writer w/ 
clustering is enabled by default
 Key: HUDI-7659
 URL: https://issues.apache.org/jira/browse/HUDI-7659
 Project: Apache Hudi
  Issue Type: Improvement
  Components: docs
Reporter: sivabalan narayanan


Update 0.14.0 release docs to call out that row writer w/ clustering is enabled 
by default

 





[PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


yihua opened a new pull request, #11081:
URL: https://github.com/apache/hudi/pull/11081

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7651) Add util methods for creating meta client

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7651:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Add util methods for creating meta client
> -
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073709014

   
   ## CI report:
   
   * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23433)
 
   * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
   
   



Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11073:
URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073708946

   
   ## CI report:
   
   * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN
   * c59ca7c5f11aad7435129f97904d8a2a6d958b03 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23423)
 
   * acdbe5f086b556febb77425596685670229451e7 UNKNOWN
   
   



Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11067:
URL: https://github.com/apache/hudi/pull/11067#discussion_r1577074792


##
hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java:
##
@@ -70,6 +92,19 @@ public static Set getAllPartitionPaths() {
 .collect(Collectors.toSet());
   }
 
+  /**
+   * Returns the list of metadata partition types enabled based on the metadata config and table config.
+   */
+  public static List getEnabledPartitions(HoodieMetadataConfig metadataConfig, HoodieTableMetaClient metaClient) {
+    List enabledTypes = new ArrayList<>(4);

Review Comment:
   Not sure whether we need a specific initial list length param.
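
   For context on this point, a small sketch: the initial capacity passed to `ArrayList` is only a sizing hint and does not change the list's contents or behavior. `InitialCapacitySketch` is a hypothetical class; the element names are taken from the enum constants in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Shows that a pre-sized ArrayList and a default one hold the same elements.
final class InitialCapacitySketch {
    private InitialCapacitySketch() {
    }

    static List<String> collect(boolean presize) {
        // The capacity argument is a hint; the list still grows past it when needed.
        List<String> enabledTypes = presize ? new ArrayList<String>(4) : new ArrayList<String>();
        String[] names = {"FILES", "COLUMN_STATS", "BLOOM_FILTERS", "RECORD_INDEX", "FUNCTIONAL_INDEX"};
        for (String name : names) {
            enabledTypes.add(name);
        }
        return enabledTypes;
    }
}
```

   Both variants return equal lists of five elements, so dropping the explicit `4` would be a readability choice rather than a functional one.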






Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11067:
URL: https://github.com/apache/hudi/pull/11067#discussion_r1577074559


##
hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java:
##
@@ -18,30 +18,52 @@
 
 package org.apache.hudi.metadata;
 
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
 import java.util.Set;
+import java.util.function.BiPredicate;
+import java.util.function.Predicate;
 import java.util.stream.Collectors;
 
 /**
  * Partition types for metadata table.
  */
 public enum MetadataPartitionType {
-  FILES(HoodieTableMetadataUtil.PARTITION_NAME_FILES, "files-"),
-  COLUMN_STATS(HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS, "col-stats-"),
-  BLOOM_FILTERS(HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS, "bloom-filters-"),
-  RECORD_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, "record-index-"),
-  FUNCTIONAL_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_FUNCTIONAL_INDEX_PREFIX, "func-index-");
+  FILES(HoodieTableMetadataUtil.PARTITION_NAME_FILES, "files-",
+  HoodieMetadataConfig::enabled,
+  (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  COLUMN_STATS(HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS, "col-stats-",
+  HoodieMetadataConfig::isColumnStatsIndexEnabled,
+  (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  BLOOM_FILTERS(HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS, "bloom-filters-",
+  HoodieMetadataConfig::isBloomFilterIndexEnabled,
+  (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  RECORD_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, "record-index-",
+  HoodieMetadataConfig::isRecordIndexEnabled,
+  (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  FUNCTIONAL_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_FUNCTIONAL_INDEX_PREFIX, "func-index-",
+  metadataConfig -> false, // no config for functional index, it is created using sql
+  (metaClient, partitionType) -> metaClient.getFunctionalIndexMetadata().isPresent());
 
   // Partition path in metadata table.
   private final String partitionPath;
   // FileId prefix used for all file groups in this partition.
   private final String fileIdPrefix;
+  private final Predicate<HoodieMetadataConfig> isMetadataPartitionEnabled;

Review Comment:
   Can we add some comments to these two variables?
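
   To make the request concrete, here is a minimal sketch of the pattern in the diff above with each field documented. `SketchConfig`, `SketchTableState`, and `SketchPartitionType` are hypothetical stand-ins, not the real Hudi classes, and the "config OR table state" rule in `getEnabledPartitions` is an assumption based on how the tests in this PR read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical stand-in for HoodieMetadataConfig: only the two flags used below.
final class SketchConfig {
  final boolean filesEnabled;
  final boolean columnStatsEnabled;

  SketchConfig(boolean filesEnabled, boolean columnStatsEnabled) {
    this.filesEnabled = filesEnabled;
    this.columnStatsEnabled = columnStatsEnabled;
  }
}

// Hypothetical stand-in for the table state reachable through a meta client.
final class SketchTableState {
  private final List<SketchPartitionType> available;

  SketchTableState(SketchPartitionType... available) {
    this.available = Arrays.asList(available);
  }

  boolean isAvailable(SketchPartitionType type) {
    return available.contains(type);
  }
}

enum SketchPartitionType {
  FILES("files", "files-", cfg -> cfg.filesEnabled, SketchTableState::isAvailable),
  COLUMN_STATS("column_stats", "col-stats-", cfg -> cfg.columnStatsEnabled, SketchTableState::isAvailable);

  // Partition path in the metadata table.
  private final String partitionPath;
  // FileId prefix used for all file groups in this partition.
  private final String fileIdPrefix;
  // Tests whether the writer configuration asks for this partition to be built.
  private final Predicate<SketchConfig> isEnabledByConfig;
  // Tests whether this partition is already initialized in the table.
  private final BiPredicate<SketchTableState, SketchPartitionType> isAvailableInTable;

  SketchPartitionType(String partitionPath, String fileIdPrefix,
                      Predicate<SketchConfig> isEnabledByConfig,
                      BiPredicate<SketchTableState, SketchPartitionType> isAvailableInTable) {
    this.partitionPath = partitionPath;
    this.fileIdPrefix = fileIdPrefix;
    this.isEnabledByConfig = isEnabledByConfig;
    this.isAvailableInTable = isAvailableInTable;
  }

  String getPartitionPath() {
    return partitionPath;
  }

  String getFileIdPrefix() {
    return fileIdPrefix;
  }

  // Assumed rule: a partition is enabled if the config requests it
  // or the table already has it.
  static List<SketchPartitionType> getEnabledPartitions(SketchConfig cfg, SketchTableState state) {
    List<SketchPartitionType> enabled = new ArrayList<>();
    for (SketchPartitionType type : values()) {
      if (type.isEnabledByConfig.test(cfg) || type.isAvailableInTable.test(state, type)) {
        enabled.add(type);
      }
    }
    return enabled;
  }
}
```

   Keeping both predicates on the enum constant means callers no longer need a per-partition `if` chain; the short comments on the two fields are what distinguishes "requested by config" from "already present in the table".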






Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11073:
URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073811600

   
   ## CI report:
   
   * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN
   * acdbe5f086b556febb77425596685670229451e7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23436)
 
   * dda40c2705709bfa6df2556c490f4f84b0c04b51 UNKNOWN
   
   



Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073811653

   
   ## CI report:
   
   * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
   * be718668e54ed3235ec45dd2147cd514048b1945 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23434)
 
   * 694488f2df3181678a49d136170e2fd9729b45b4 UNKNOWN
   
   



Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]

2024-04-23 Thread via GitHub


jonvex commented on code in PR #11067:
URL: https://github.com/apache/hudi/pull/11067#discussion_r1577049959


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -167,21 +167,15 @@ protected HoodieBackedTableMetadataWriter(Configuration 
hadoopConf,
 this.engineContext = engineContext;
 this.hadoopConf = new SerializableConfiguration(hadoopConf);
 this.metrics = Option.empty();
-this.enabledPartitionTypes = new ArrayList<>(4);

Review Comment:
   Oh wow, we just hardcoded that!






Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11081:
URL: https://github.com/apache/hudi/pull/11081#discussion_r1577097199


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -584,6 +585,39 @@ public static HoodieTableMetaClient 
initTableAndGetMetaClient(Configuration hado
 return metaClient;
   }
 
+  /**
+   * @param conf file system configuration.
+   * @param basePath base path of the Hudi table.
+   * @return a new {@link HoodieTableMetaClient} instance.
+   */
+  public static HoodieTableMetaClient build(Configuration conf,
+String basePath) {

Review Comment:
   I don't think it is necessary to add three new builder methods. The 
original builder is more flexible to extend, and there is not much to gain from 
switching to these new methods, which also add a maintenance burden.
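
   To illustrate the flexibility argument, here is a minimal builder sketch. `SketchMetaClient` and its fields are hypothetical stand-ins, not the real `HoodieTableMetaClient`; the point is only that a new option costs one setter on a builder, whereas static `build(...)` helpers need a new overload for each combination of arguments.

```java
import java.util.Properties;

// Hypothetical stand-in for a meta client; not the real Hudi class.
final class SketchMetaClient {
  final Properties conf;
  final String basePath;
  final boolean loadActiveTimelineOnLoad;

  private SketchMetaClient(Builder b) {
    this.conf = b.conf;
    this.basePath = b.basePath;
    this.loadActiveTimelineOnLoad = b.loadActiveTimelineOnLoad;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private Properties conf = new Properties();
    private String basePath;
    private boolean loadActiveTimelineOnLoad = false;

    Builder setConf(Properties conf) {
      this.conf = conf;
      return this;
    }

    Builder setBasePath(String basePath) {
      this.basePath = basePath;
      return this;
    }

    // Adding an option later only requires one more setter like this one;
    // existing call sites keep compiling unchanged.
    Builder setLoadActiveTimelineOnLoad(boolean loadActiveTimelineOnLoad) {
      this.loadActiveTimelineOnLoad = loadActiveTimelineOnLoad;
      return this;
    }

    SketchMetaClient build() {
      if (basePath == null) {
        throw new IllegalStateException("basePath is required");
      }
      return new SketchMetaClient(this);
    }
  }
}
```

   A call such as `SketchMetaClient.builder().setBasePath("/tmp/table").build()` reads about as compactly as a two-argument static helper, which is the trade-off being weighed here.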






Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073828849

   
   ## CI report:
   
   * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
   * be718668e54ed3235ec45dd2147cd514048b1945 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23434)
 
   * 694488f2df3181678a49d136170e2fd9729b45b4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23438)
 
   
   



[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7652:
--
Reviewers: Danny Chen, Ethan Guo

> Add new MergeKey API to support simple and composite keys
> -
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7632] Remove FileSystem usage in HoodieLogFormatWriter [hudi]

2024-04-23 Thread via GitHub


wombatu-kun opened a new pull request, #11082:
URL: https://github.com/apache/hudi/pull/11082

   ### Change Logs
   
   Removed FileSystem usage in HoodieLogFormatWriter by adding the methods 
`getDefaultBufferSize()`, `getDefaultReplication()`, and 
`create(StoragePath path, boolean overwrite, Integer bufferSize, Short 
replication, Long sizeThreshold)` to the HoodieStorage API (with corresponding 
implementations in HoodieHadoopStorage) and using them in HoodieLogFormatWriter 
instead of `fs`. Also fixed logging.
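
   As a rough illustration of the idea (not the code in this PR), the sketch below shows a writer that depends only on a storage abstraction. The three method names follow the change log above; everything else is an assumption made for the example: the `SketchStorage`, `InMemoryStorage`, and `SketchLogWriter` names, the `String` path in place of `StoragePath`, the `OutputStream` return type, and the rule that `null` arguments fall back to defaults.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified storage abstraction (path and return types are assumptions).
interface SketchStorage {
  int getDefaultBufferSize();

  short getDefaultReplication();

  OutputStream create(String path, boolean overwrite, Integer bufferSize,
                      Short replication, Long sizeThreshold) throws IOException;
}

// In-memory implementation standing in for a Hadoop-backed one.
final class InMemoryStorage implements SketchStorage {
  private final Map<String, ByteArrayOutputStream> files = new HashMap<>();

  @Override
  public int getDefaultBufferSize() {
    return 4096;
  }

  @Override
  public short getDefaultReplication() {
    return 1;
  }

  @Override
  public OutputStream create(String path, boolean overwrite, Integer bufferSize,
                             Short replication, Long sizeThreshold) throws IOException {
    if (!overwrite && files.containsKey(path)) {
      throw new IOException("File already exists: " + path);
    }
    // Assumed convention: a null buffer size falls back to the storage default.
    // Replication and size threshold have no meaning in memory and are ignored.
    int effectiveBufferSize = bufferSize != null ? bufferSize : getDefaultBufferSize();
    ByteArrayOutputStream out = new ByteArrayOutputStream(effectiveBufferSize);
    files.put(path, out);
    return out;
  }

  byte[] read(String path) {
    return files.get(path).toByteArray();
  }
}

// Writer that talks to the abstraction only, never to a concrete file system.
final class SketchLogWriter {
  private final SketchStorage storage;

  SketchLogWriter(SketchStorage storage) {
    this.storage = storage;
  }

  void writeBlock(String path, byte[] block) throws IOException {
    try (OutputStream out = storage.create(path, false,
        storage.getDefaultBufferSize(), storage.getDefaultReplication(), null)) {
      out.write(block);
    }
  }
}
```

   With this shape, a test can exercise the writer against `InMemoryStorage` without any Hadoop dependency on the classpath.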
   
   ### Impact
   
   none
   
   ### Risk level (write none, low, medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default values of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7632) Remove FileSystem usage in HoodieLogFormatWriter

2024-04-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7632:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Remove FileSystem usage in HoodieLogFormatWriter
> 
>
> Key: HUDI-7632
> URL: https://issues.apache.org/jira/browse/HUDI-7632
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1569173014





Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11080:
URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073591022

   
   ## CI report:
   
   * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432)
 
   
   



Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073659591

   
   ## CI report:
   
   * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 UNKNOWN
   
   



Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11080:
URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073659533

   
   ## CI report:
   
   * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432)
 
   
   



Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073701612

   
   ## CI report:
   
   * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23433)
 
   
   



[jira] [Updated] (HUDI-7651) Add util methods for creating meta client

2024-04-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7651:

Story Points: 4  (was: 1)

> Add util methods for creating meta client
> -
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7651) Add util methods for creating meta client

2024-04-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7651:

Status: Patch Available  (was: In Progress)

> Add util methods for creating meta client
> -
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7651) Add util methods for creating meta client

2024-04-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7651:

Sprint: Sprint 2024-03-25

> Add util methods for creating meta client
> -
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7588) Replace hadoop Configuration with StorageConfiguration in hudi-common module

2024-04-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7588:

Status: Patch Available  (was: In Progress)

> Replace hadoop Configuration with StorageConfiguration in hudi-common module
> 
>
> Key: HUDI-7588
> URL: https://issues.apache.org/jira/browse/HUDI-7588
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






[jira] [Updated] (HUDI-7651) Add util methods for creating meta client

2024-04-23 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-7651:

Epic Link: HUDI-6243

> Add util methods for creating meta client
> -
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11067:
URL: https://github.com/apache/hudi/pull/11067#discussion_r1577076390


##
hudi-common/src/test/java/org/apache/hudi/metadata/TestMetadataPartitionType.java:
##
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.model.HoodieFunctionalIndexMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.Option;
+
+import org.junit.jupiter.api.Test;
+import org.mockito.Mockito;
+
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Tests for {@link MetadataPartitionType}.
+ */
+public class TestMetadataPartitionType {
+
+  @Test
+  public void testPartitionEnabledByConfigOnly() {
+HoodieTableMetaClient metaClient = 
Mockito.mock(HoodieTableMetaClient.class);
+HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+// Simulate the configuration enabling FILES but the meta client not 
having it available (yet to initialize files partition)
+Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+
Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(false);
+
Mockito.when(metaClient.getFunctionalIndexMetadata()).thenReturn(Option.empty());
+HoodieMetadataConfig metadataConfig = 
HoodieMetadataConfig.newBuilder().enable(true).build();
+
+List<MetadataPartitionType> enabledPartitions = 
MetadataPartitionType.getEnabledPartitions(metadataConfig, metaClient);
+
+// Verify FILES is enabled due to config
+assertEquals(1, enabledPartitions.size(), "Only one partition should be 
enabled");
+assertTrue(enabledPartitions.contains(MetadataPartitionType.FILES), "FILES 
should be enabled by config");
+  }
+
+  @Test
+  public void testPartitionAvailableByMetaClientOnly() {
+HoodieTableMetaClient metaClient = 
Mockito.mock(HoodieTableMetaClient.class);
+HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+// Simulate the meta client having RECORD_INDEX available but config not 
enabling it
+Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+
Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(true);

Review Comment:
   So the metadata config specified by the write config does not override the 
config from the table config metadata, is that the case? If so, how can a user 
disable this index type once they have enabled it?
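
   To spell out the concern, the sketch below contrasts two possible rules. `PartitionEnablementSketch` and both method names are hypothetical. The first method is what the tests in this diff appear to assume; the second is only one conceivable alternative, not something this PR proposes.

```java
import java.util.Optional;

final class PartitionEnablementSketch {
  // Rule the tests appear to assume: enabled if the write config requests the
  // partition OR the table config already lists it as available.
  static boolean enabledByEither(boolean enabledInWriteConfig, boolean availableInTable) {
    return enabledInWriteConfig || availableInTable;
  }

  // Hypothetical alternative: an explicitly set write config value wins, and the
  // table state is consulted only when the user did not set the option at all.
  static boolean enabledWithOverride(Optional<Boolean> writeConfigValue, boolean availableInTable) {
    return writeConfigValue.orElse(availableInTable);
  }
}
```

   Under the first rule, `enabledByEither(false, true)` is still `true`, so setting the write config to `false` cannot turn off an index that already exists in the table. Under the second rule, `enabledWithOverride(Optional.of(false), true)` is `false`.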






(hudi) branch master updated: [MINOR] Fixe naming of methods in HoodieMetadataConfig (#11076)

2024-04-23 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new c17f50dbbfc [MINOR] Fixe naming of methods in HoodieMetadataConfig 
(#11076)
c17f50dbbfc is described below

commit c17f50dbbfcf4b32ca0790837c672e4fd2d54e85
Author: Vova Kolmakov 
AuthorDate: Wed Apr 24 08:05:39 2024 +0700

[MINOR] Fixe naming of methods in HoodieMetadataConfig (#11076)
---
 .../java/org/apache/hudi/config/HoodieWriteConfig.java |  2 +-
 .../hudi/table/action/index/RunIndexActionExecutor.java|  2 +-
 .../apache/hudi/testutils/HoodieJavaClientTestHarness.java |  2 +-
 .../hudi/testutils/HoodieSparkClientTestHarness.java   |  2 +-
 .../apache/hudi/common/config/HoodieMetadataConfig.java| 14 +-
 .../java/org/apache/hudi/metadata/BaseTableMetadata.java   |  4 ++--
 .../apache/hudi/metadata/HoodieBackedTableMetadata.java|  2 +-
 .../java/org/apache/hudi/metadata/HoodieTableMetadata.java |  2 +-
 .../org/apache/hudi/metadata/HoodieTableMetadataUtil.java  |  2 +-
 .../src/main/java/org/apache/hudi/source/FileIndex.java|  2 +-
 .../scala/org/apache/hudi/ColumnStatsIndexSupport.scala|  2 +-
 .../scala/org/apache/hudi/FunctionalIndexSupport.scala |  2 +-
 .../src/main/scala/org/apache/hudi/HoodieFileIndex.scala   |  2 +-
 .../scala/org/apache/hudi/RecordLevelIndexSupport.scala|  2 +-
 14 files changed, 19 insertions(+), 23 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
index 8c53b06d879..755074997cb 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
@@ -2508,7 +2508,7 @@ public class HoodieWriteConfig extends HoodieConfig {
   }
 
   public boolean isRecordIndexEnabled() {
-return metadataConfig.enableRecordIndex();
+return metadataConfig.isRecordIndexEnabled();
   }
 
   public int getRecordIndexMinFileGroupCount() {
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
index 09a9b153db1..1da3c0c4be2 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java
@@ -99,7 +99,7 @@ public class RunIndexActionExecutor extends 
BaseActionExecutor table, String instantTime) {
 super(context, config, table, instantTime);
 this.txnManager = new TransactionManager(config, 
table.getMetaClient().getStorage());
-if (config.getMetadataConfig().enableMetrics()) {
+if (config.getMetadataConfig().isMetricsEnabled()) {
   this.metrics = Option.of(new 
HoodieMetadataMetrics(config.getMetricsConfig()));
 } else {
   this.metrics = Option.empty();
diff --git 
a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java
 
b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java
index 96ac7444eca..74cc19ea875 100644
--- 
a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java
+++ 
b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java
@@ -251,7 +251,7 @@ public abstract class HoodieJavaClientTestHarness extends 
HoodieWriterClientTest
   }
 
   public void syncTableMetadata(HoodieWriteConfig writeConfig) {
-if (!writeConfig.getMetadataConfig().enabled()) {
+if (!writeConfig.getMetadataConfig().isEnabled()) {
   return;
 }
 // Open up the metadata table again, for syncing
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java
 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java
index 2c97e960779..284c08f7309 100644
--- 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java
+++ 
b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java
@@ -531,7 +531,7 @@ public abstract class HoodieSparkClientTestHarness extends 
HoodieWriterClientTes
   }
 
   public void syncTableMetadata(HoodieWriteConfig writeConfig) {
-if (!writeConfig.getMetadataConfig().enabled()) {
+if (!writeConfig.getMetadataConfig().isEnabled()) {
   return;
 }
 // Open up the metadata table again, for syncing
diff --git 

Re: [PR] [MINOR] Fixed naming of methods in HoodieMetadataConfig [hudi]

2024-04-23 Thread via GitHub


danny0405 merged PR #11076:
URL: https://github.com/apache/hudi/pull/11076





Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11077:
URL: https://github.com/apache/hudi/pull/11077#discussion_r1577086360


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMergeKey.java:
##
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.model;
+
+import java.io.Serializable;
+
+/**
+ * Defines a standard for all merge keys to ensure consistent handling 
including simple keys and composite keys.
+ * It includes methods for retrieving the key and partition path.
+ */
+public interface HoodieMergeKey extends Serializable {

Review Comment:
   I am not sure why the record key needs to be bound to the partition path: 
under a global index, a key is located in only one partition.






Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073803021

   
   ## CI report:
   
   * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
   * be718668e54ed3235ec45dd2147cd514048b1945 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23434)
 
   
   



Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11070:
URL: https://github.com/apache/hudi/pull/11070#issuecomment-2073802830

   
   ## CI report:
   
   * c250cc04340a016a04878a7647d4b27a608e7374 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23403)
 
   * 1ce1316840852fa8e21363100f6ce695a5ecf0a7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23435)
 
   
   



Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11073:
URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073802939

   
   ## CI report:
   
   * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN
   * acdbe5f086b556febb77425596685670229451e7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23436)
 
   
   



[jira] [Updated] (HUDI-7651) Add util methods for creating meta client

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7651:
--
Reviewers: Sagar Sumit

> Add util methods for creating meta client
> -
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11080:
URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073885067

   
   ## CI report:
   
   * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432)
 
   * f5e6a9914ed766ac6650513a04d12c3d3cea4407 UNKNOWN
   
   



Re: [PR] [HUDI-7656] Disable a flaky test [hudi]

2024-04-23 Thread via GitHub


linliu-code closed pull request #11078: [HUDI-7656] Disable a flaky test
URL: https://github.com/apache/hudi/pull/11078





Re: [PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]

2024-04-23 Thread via GitHub


linliu-code closed pull request #11079: [HUDI-7657] Disable a flaky test in 
deltastreamer
URL: https://github.com/apache/hudi/pull/11079





Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11080:
URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073583735

   
   ## CI report:
   
   * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


yihua commented on code in PR #11080:
URL: https://github.com/apache/hudi/pull/11080#discussion_r1576980964


##
hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java:
##
@@ -1026,27 +1026,32 @@ public void runMetaSync() {
   Map failedMetaSyncs = new HashMap<>();
   for (String impl : syncClientToolClasses) {
 Timer.Context syncContext = metrics.getMetaSyncTimerContext();
-boolean success = false;
+HoodieMetaSyncException metaSyncException = null;

Review Comment:
   nit: use `Option`
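
The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's code: it uses `java.util.Optional` as a stand-in for Hudi's own `Option` so the example stays self-contained, and `runSync`, `syncOnce`, and the tool names are hypothetical. The point is only that an empty value replaces the `null` initializer.

```java
import java.util.Optional;

public class MetaSyncOptionSketch {

  // Hypothetical stand-in for a meta sync call; fails for names containing "Failing".
  static void runSync(String impl) {
    if (impl.contains("Failing")) {
      throw new IllegalStateException("sync failed for " + impl);
    }
  }

  // Returns the failure, if any, instead of tracking a nullable exception variable.
  static Optional<RuntimeException> syncOnce(String impl) {
    Optional<RuntimeException> failure = Optional.empty();
    try {
      runSync(impl);
    } catch (RuntimeException e) {
      failure = Optional.of(e);
    }
    return failure;
  }

  public static void main(String[] args) {
    System.out.println(syncOnce("HiveSyncTool").isPresent());
    System.out.println(syncOnce("FailingSyncTool").isPresent());
  }
}
```

Callers then branch on `isPresent()` rather than on a null check.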






Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073722586

   
   ## CI report:
   
   * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23433)
 
   * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
   * be718668e54ed3235ec45dd2147cd514048b1945 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Comment Edited] (HUDI-7596) Enable Jacoco code coverage report across multiple modules

2024-04-23 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838411#comment-17838411
 ] 

Danny Chen edited comment on HUDI-7596 at 4/24/24 12:38 AM:


The link

jacoco official maven plugin doc: [https://www.jacoco.org/jacoco/trunk/doc/]
jacoco multi module: 
[https://www.baeldung.com/maven-jacoco-multi-module-project]
jacoco and Azure CI: 
[https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-code-coverage-results-v1?view=azure-pipelines]
jacoco and Azure YouTube: [https://www.youtube.com/watch?v=nflwvk2cJ2o]

report generator: 
https://marketplace.visualstudio.com/items?itemName=Palmmedia.reportgenerator
PublishTestResults@2 - Publish Test Results v2 task: 
[https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-test-results-v2?view=azure-pipelines=trx%2Ctrxattachments%2Cyaml]


was (Author: danny0405):
The link

jacoco official maven plugin doc: [https://www.jacoco.org/jacoco/trunk/doc/]
jacoco multi module: 
[https://www.baeldung.com/maven-jacoco-multi-module-project]
jacoco and Azure CI: 
[https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-code-coverage-results-v1?view=azure-pipelines]
jacoco and Azure YouTube: [https://www.youtube.com/watch?v=nflwvk2cJ2o]

PublishTestResults@2 - Publish Test Results v2 task: 
[https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-test-results-v2?view=azure-pipelines=trx%2Ctrxattachments%2Cyaml]

> Enable Jacoco code coverage report across multiple modules
> --
>
> Key: HUDI-7596
> URL: https://issues.apache.org/jira/browse/HUDI-7596
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available, starter
> Fix For: 0.15.0, 1.0.0
>
>






Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11070:
URL: https://github.com/apache/hudi/pull/11070#issuecomment-2073722347

   
   ## CI report:
   
   * c250cc04340a016a04878a7647d4b27a608e7374 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23403)
 
   * 1ce1316840852fa8e21363100f6ce695a5ecf0a7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11067:
URL: https://github.com/apache/hudi/pull/11067#discussion_r1577078330


##
hudi-common/src/test/java/org/apache/hudi/metadata/TestMetadataPartitionType.java:
##
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.model.HoodieFunctionalIndexMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.Option;
+
+import org.junit.jupiter.api.Test;
+import org.mockito.Mockito;
+
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Tests for {@link MetadataPartitionType}.
+ */
+public class TestMetadataPartitionType {
+
+  @Test
+  public void testPartitionEnabledByConfigOnly() {
+HoodieTableMetaClient metaClient = 
Mockito.mock(HoodieTableMetaClient.class);
+HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+// Simulate the configuration enabling FILES but the meta client not 
having it available (yet to initialize files partition)
+Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+
Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(false);
+
Mockito.when(metaClient.getFunctionalIndexMetadata()).thenReturn(Option.empty());
+HoodieMetadataConfig metadataConfig = 
HoodieMetadataConfig.newBuilder().enable(true).build();
+
+List enabledPartitions = 
MetadataPartitionType.getEnabledPartitions(metadataConfig, metaClient);
+
+// Verify FILES is enabled due to config
+assertEquals(1, enabledPartitions.size(), "Only one partition should be 
enabled");
+assertTrue(enabledPartitions.contains(MetadataPartitionType.FILES), "FILES 
should be enabled by config");
+  }
+
+  @Test
+  public void testPartitionAvailableByMetaClientOnly() {
+HoodieTableMetaClient metaClient = 
Mockito.mock(HoodieTableMetaClient.class);
+HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+// Simulate the meta client having RECORD_INDEX available but config not 
enabling it
+Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+
Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(true);

Review Comment:
   It looks like once the user enables the metadata table with an initial index 
type set up, they can never change it again unless they disable the whole 
metadata table functionality. That might need to be improved.






Re: [I] A bug when RocksDBDAO executes the prefixDelete function to delete the last entry [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on issue #11075:
URL: https://github.com/apache/hudi/issues/11075#issuecomment-2073782944

   Is this a bug from a real production use case, or just found during code review?





Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on PR #11073:
URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073797993

   @hudi-bot run azure





Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


danny0405 commented on code in PR #11081:
URL: https://github.com/apache/hudi/pull/11081#discussion_r1577095513


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java:
##
@@ -157,7 +157,8 @@ public Option execute() {
 
   // reconcile with metadata table timeline
   String metadataBasePath = 
getMetadataTableBasePath(table.getMetaClient().getBasePathV2().toString());
-  HoodieTableMetaClient metadataMetaClient = 
HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+  HoodieTableMetaClient metadataMetaClient =
+  HoodieTableMetaClient.build(hadoopConf, metadataBasePath);

Review Comment:
   Is this change necessary?






Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11070:
URL: https://github.com/apache/hudi/pull/11070#issuecomment-2073879258

   
   ## CI report:
   
   * 1ce1316840852fa8e21363100f6ce695a5ecf0a7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23435)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11073:
URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073879293

   
   ## CI report:
   
   * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN
   * acdbe5f086b556febb77425596685670229451e7 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23436)
 
   * dda40c2705709bfa6df2556c490f4f84b0c04b51 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23439)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11081:
URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073879336

   
   ## CI report:
   
   * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
   * 694488f2df3181678a49d136170e2fd9729b45b4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23438)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11080:
URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073891560

   
   ## CI report:
   
   * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432)
 
   * f5e6a9914ed766ac6650513a04d12c3d3cea4407 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23440)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #10806:
URL: https://github.com/apache/hudi/pull/10806#issuecomment-2073481386

   
   ## CI report:
   
   * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
   * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN
   * b6faa0ddf78a193ed8cdb1ce8eb14ae49016a105 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23429)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]

2024-04-23 Thread via GitHub


hudi-bot commented on PR #11079:
URL: https://github.com/apache/hudi/pull/11079#issuecomment-2073482219

   
   ## CI report:
   
   * bd68c36702ebde586b9f57bf1d36c3751b91e61a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23431)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-04-23 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-7658:
--
Status: Patch Available  (was: In Progress)

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>
> Time is only printed in log statements on success, but it is useful to see 
> the log on failure as well





[jira] [Created] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-04-23 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7658:
-

 Summary: Log time taken when meta sync fails in stream sync
 Key: HUDI-7658
 URL: https://issues.apache.org/jira/browse/HUDI-7658
 Project: Apache Hudi
  Issue Type: Improvement
  Components: deltastreamer
Reporter: Jonathan Vexler


Time is only printed in log statements on success, but it is useful to see the 
log on failure as well
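
The request above can be sketched roughly as below. This is an illustrative assumption about the shape of the change, not the actual `StreamSync` code: `timedSync`, `elapsedMs`, the tool names, and the log wording are all hypothetical. It shows the elapsed time being computed on both the success and the failure path.

```java
public class MetaSyncTimingSketch {

  // Runs one sync attempt and returns the log line it would emit (hypothetical format).
  static String timedSync(String impl, Runnable sync) {
    long startNanos = System.nanoTime();
    try {
      sync.run();
      return "SUCCESS on sync for " + impl + " in " + elapsedMs(startNanos) + " ms";
    } catch (RuntimeException e) {
      // The elapsed time is reported on failure too, which is the point of the issue.
      return "FAILURE on sync for " + impl + " in " + elapsedMs(startNanos) + " ms: " + e.getMessage();
    }
  }

  static long elapsedMs(long startNanos) {
    return (System.nanoTime() - startNanos) / 1_000_000L;
  }

  public static void main(String[] args) {
    System.out.println(timedSync("OkTool", () -> { }));
    System.out.println(timedSync("BadTool", () -> { throw new IllegalStateException("boom"); }));
  }
}
```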





[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-04-23 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler updated HUDI-7658:
--
Status: In Progress  (was: Open)

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>
> Time is only printed in log statements on success, but it is useful to see 
> the log on failure as well





[jira] [Assigned] (HUDI-7658) Log time taken when meta sync fails in stream sync

2024-04-23 Thread Jonathan Vexler (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Vexler reassigned HUDI-7658:
-

Assignee: Jonathan Vexler

> Log time taken when meta sync fails in stream sync
> --
>
> Key: HUDI-7658
> URL: https://issues.apache.org/jira/browse/HUDI-7658
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: deltastreamer
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>
> Time is only printed in log statements on success, but it is useful to see 
> the log on failure as well





(hudi) branch master updated: [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables (#11066)

2024-04-23 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ce0c2671a0a [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for 
MOR tables (#11066)
ce0c2671a0a is described below

commit ce0c2671a0a5e010173e0e6caf9c21ca2f175a30
Author: Марк Бухнер <66881554+alowa...@users.noreply.github.com>
AuthorDate: Wed Apr 24 08:06:25 2024 +0700

[HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables 
(#11066)
---
 .../hudi/source/stats/ColumnStatsIndices.java  |  2 +-
 .../table/format/mor/MergeOnReadInputFormat.java   |  8 ++---
 .../apache/hudi/util/AvroToRowDataConverters.java  | 42 +-
 .../apache/hudi/table/ITTestHoodieDataSource.java  | 31 
 4 files changed, 46 insertions(+), 37 deletions(-)

diff --git 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
index 05931876603..7032f299368 100644
--- 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
+++ 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
@@ -272,7 +272,7 @@ public class ColumnStatsIndices {
   LogicalType logicalType,
   Map 
converters) {
 AvroToRowDataConverters.AvroToRowDataConverter converter =
-converters.computeIfAbsent(logicalType, k -> 
AvroToRowDataConverters.createConverter(logicalType));
+converters.computeIfAbsent(logicalType, k -> 
AvroToRowDataConverters.createConverter(logicalType, true));
 return converter.convert(rawVal);
   }
 
diff --git 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
index 29bb0a06d8c..3690fc911d8 100644
--- 
a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
+++ 
b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
@@ -351,7 +351,7 @@ public class MergeOnReadInputFormat
 final Schema requiredSchema = new 
Schema.Parser().parse(tableState.getRequiredAvroSchema());
 final GenericRecordBuilder recordBuilder = new 
GenericRecordBuilder(requiredSchema);
 final AvroToRowDataConverters.AvroToRowDataConverter 
avroToRowDataConverter =
-
AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType());
+
AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType(), 
conf.getBoolean(FlinkOptions.READ_UTC_TIMEZONE));
 final HoodieMergedLogRecordScanner scanner = FormatUtils.logScanner(split, 
tableSchema, internalSchemaManager.getQuerySchema(), conf, hadoopConf);
 final Iterator logRecordsKeyIterator = 
scanner.getRecords().keySet().iterator();
 final int[] pkOffset = tableState.getPkOffsetsInRequired();
@@ -431,7 +431,7 @@ public class MergeOnReadInputFormat
 final Schema requiredSchema = new 
Schema.Parser().parse(tableState.getRequiredAvroSchema());
 final GenericRecordBuilder recordBuilder = new 
GenericRecordBuilder(requiredSchema);
 final AvroToRowDataConverters.AvroToRowDataConverter 
avroToRowDataConverter =
-
AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType());
+
AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType(), 
conf.getBoolean(FlinkOptions.READ_UTC_TIMEZONE));
 final FormatUtils.BoundedMemoryRecords records = new 
FormatUtils.BoundedMemoryRecords(split, tableSchema, 
internalSchemaManager.getQuerySchema(), hadoopConf, conf);
 final Iterator> recordsIterator = 
records.getRecordsIterator();
 
@@ -478,7 +478,7 @@ public class MergeOnReadInputFormat
   protected ClosableIterator 
getFullLogFileIterator(MergeOnReadInputSplit split) {
 final Schema tableSchema = new 
Schema.Parser().parse(tableState.getAvroSchema());
 final AvroToRowDataConverters.AvroToRowDataConverter 
avroToRowDataConverter =
-AvroToRowDataConverters.createRowConverter(tableState.getRowType());
+AvroToRowDataConverters.createRowConverter(tableState.getRowType(), 
conf.getBoolean(FlinkOptions.READ_UTC_TIMEZONE));
 final HoodieMergedLogRecordScanner scanner = FormatUtils.logScanner(split, 
tableSchema, InternalSchema.getEmptyInternalSchema(), conf, hadoopConf);
 final Iterator logRecordsKeyIterator = 
scanner.getRecords().keySet().iterator();
 
@@ -736,7 +736,7 @@ public class MergeOnReadInputFormat
   this.operationPos = operationPos;
   this.avroProjection = avroProjection;
   this.rowDataToAvroConverter = 

Re: [PR] [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables [hudi]

2024-04-23 Thread via GitHub


danny0405 merged PR #11066:
URL: https://github.com/apache/hudi/pull/11066





[jira] [Closed] (HUDI-7647) READ_UTC_TIMEZONE doesn't affect log files for MOR tables

2024-04-23 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7647.

Resolution: Fixed

Fixed via master branch: ce0c2671a0a5e010173e0e6caf9c21ca2f175a30

> READ_UTC_TIMEZONE doesn't affect log files for MOR tables
> -
>
> Key: HUDI-7647
> URL: https://issues.apache.org/jira/browse/HUDI-7647
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Mark Bukhner
>Priority: Major
>  Labels: flink, pull-request-available
> Fix For: 1.0.0
>
>
> Write COPY_ON_WRITE table:
> {code:java}
> tableEnv.executeSql("CREATE TABLE test_2(\n"
> + "  uuid VARCHAR(40),\n"
> + "  name VARCHAR(10),\n"
> + "  age INT,\n"
> + "  ts TIMESTAMP(3),\n"
> + "  `partition` VARCHAR(20)\n"
> + ")\n"
> + "PARTITIONED BY (`partition`)\n"
> + "WITH (\n"
> + "  'connector' = 'hudi',\n"
> + "  'path' = '...',\n"
> + "  'table.type' = 'COPY_ON_WRITE',\n"
> + "  'write.utc-timezone' = 'true',\n"
> + "  'index.type' = 'INMEMORY'\n"
> + ");").await(); 
> tableEnv.executeSql("insert into test_2 \n" 
> + "values ('ab', 'cccx', 12, TIMESTAMP '1972-01-01 00:00:01', 'xx'),\n"
> + " ('ab', 'cccx', 12, TIMESTAMP '1970-01-01 00:00:01', 
> 'xx');").await();{code}
> Reading the COW table with READ_UTC_TIMEZONE then returns:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true' 
> +I[ab, cccx, 12, 1972-01-01T07:00:01, xx] // if READ_UTC_TIMEZONE = 'false' 
> {code}
> But creating and writing the table with 'table.type' = 'MERGE_ON_READ' 
> instead returns:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true'
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'false'
> {code}
> There is no difference between READ_UTC_TIMEZONE = true and false when 
> reading log files (MOR table), but a 7h difference when reading the COW table.
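
The 7h gap reported above is what a UTC-versus-local conversion of the same stored instant would produce. The sketch below is only an illustration with `java.time`, not the Hudi converter code: `toLocalDateTime` is a hypothetical helper, and the fixed UTC+7 offset is an assumption inferred from the 7h difference in the report.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class UtcReadSketch {

  // Interprets a stored epoch-millis value either in UTC or in the given local zone.
  static LocalDateTime toLocalDateTime(long epochMillis, boolean utcTimezone, ZoneId localZone) {
    ZoneId zone = utcTimezone ? ZoneOffset.UTC : localZone;
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
  }

  public static void main(String[] args) {
    // The timestamp from the report, written as a UTC instant.
    long epochMillis = LocalDateTime.of(1972, 1, 1, 0, 0, 1)
        .toInstant(ZoneOffset.UTC)
        .toEpochMilli();
    ZoneId local = ZoneOffset.ofHours(7); // assumed reader time zone (UTC+7)

    System.out.println(toLocalDateTime(epochMillis, true, local));  // 1972-01-01T00:00:01
    System.out.println(toLocalDateTime(epochMillis, false, local)); // 1972-01-01T07:00:01
  }
}
```

If a reader path ignores the flag and always converts in UTC, both settings print the first value, which matches the MOR log-file behavior described in the issue.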





[jira] [Assigned] (HUDI-7647) READ_UTC_TIMEZONE doesn't affect log files for MOR tables

2024-04-23 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-7647:


Assignee: Danny Chen

> READ_UTC_TIMEZONE doesn't affect log files for MOR tables
> -
>
> Key: HUDI-7647
> URL: https://issues.apache.org/jira/browse/HUDI-7647
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Mark Bukhner
>Assignee: Danny Chen
>Priority: Major
>  Labels: flink, pull-request-available
> Fix For: 1.0.0
>
>
> Write COPY_ON_WRITE table:
> {code:java}
> tableEnv.executeSql("CREATE TABLE test_2(\n"
> + "  uuid VARCHAR(40),\n"
> + "  name VARCHAR(10),\n"
> + "  age INT,\n"
> + "  ts TIMESTAMP(3),\n"
> + "  `partition` VARCHAR(20)\n"
> + ")\n"
> + "PARTITIONED BY (`partition`)\n"
> + "WITH (\n"
> + "  'connector' = 'hudi',\n"
> + "  'path' = '...',\n"
> + "  'table.type' = 'COPY_ON_WRITE',\n"
> + "  'write.utc-timezone' = 'true',\n"
> + "  'index.type' = 'INMEMORY'\n"
> + ");").await(); 
> tableEnv.executeSql("insert into test_2 \n" 
> + "values ('ab', 'cccx', 12, TIMESTAMP '1972-01-01 00:00:01', 'xx'),\n"
> + " ('ab', 'cccx', 12, TIMESTAMP '1970-01-01 00:00:01', 
> 'xx');").await();{code}
> Reading the COW table with READ_UTC_TIMEZONE then returns:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true' 
> +I[ab, cccx, 12, 1972-01-01T07:00:01, xx] // if READ_UTC_TIMEZONE = 'false' 
> {code}
> But creating and writing the table with 'table.type' = 'MERGE_ON_READ' 
> instead returns:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true'
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'false'
> {code}
> There is no difference between READ_UTC_TIMEZONE = true and false when 
> reading log files (MOR table), but a 7h difference when reading the COW table.





[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7652:
--
Status: In Progress  (was: Open)

> Add new MergeKey API to support simple and composite keys
> -
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323





[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys

2024-04-23 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7652:
--
Status: Patch Available  (was: In Progress)

> Add new MergeKey API to support simple and composite keys
> -
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323





Re: [I] A bug when RocksDBDAO executes the prefixDelete function to delete the last entry [hudi]

2024-04-23 Thread via GitHub


MicroGery commented on issue #11075:
URL: https://github.com/apache/hudi/issues/11075#issuecomment-2073932313

   > Is this a bug from a real production use case, or just found during code review?
   
   Just found during code review.




