Re: [PR] [MINOR] Fixed naming of methods in HoodieMetadataConfig [hudi]
hudi-bot commented on PR #11076: URL: https://github.com/apache/hudi/pull/11076#issuecomment-2072523610

## CI report:

* f597f95d19d0f09176efcb358f3d1980efc7f946 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23425)

### Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7653] Refactor HoodieFileIndex for more flexibility [hudi]
hudi-bot commented on PR #11074: URL: https://github.com/apache/hudi/pull/11074#issuecomment-2072523463

## CI report:

* c45d96645dd48aff96aa199693937c2d99c1ace0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23424)
* e32f1f8615fbf4452673e79138dc23fe1309a45a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23426)
Re: [PR] [HUDI-7653] Refactor HoodieFileIndex for more flexibility [hudi]
hudi-bot commented on PR #11074: URL: https://github.com/apache/hudi/pull/11074#issuecomment-2072493453

## CI report:

* c45d96645dd48aff96aa199693937c2d99c1ace0 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23424)
* e32f1f8615fbf4452673e79138dc23fe1309a45a UNKNOWN
(hudi) branch master updated: [HUDI-7653] Refactor HoodieFileIndex for more flexibility (#11074)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new cb6eb6785fd [HUDI-7653] Refactor HoodieFileIndex for more flexibility (#11074) cb6eb6785fd is described below commit cb6eb6785fdeb88e66016a2b8c0c6e6fa184b309 Author: Vova Kolmakov AuthorDate: Tue Apr 23 23:09:08 2024 +0700 [HUDI-7653] Refactor HoodieFileIndex for more flexibility (#11074) Created new abstract class `SparkBaseIndexSupport` with abstract methods `getIndexName`, `isIndexAvailable`, `computeCandidateFileNames` and `invalidateCaches` (to override it in descendants) and concrete methods `getPrunedFileNames`, `getCandidateFiles` and `shouldReadInMemory` (moved from HoodieFileIndex or XXXIndexSupport to reuse it in descendants). - Co-authored-by: Sagar Sumit --- .../org/apache/hudi/ColumnStatsIndexSupport.scala | 68 ++- .../org/apache/hudi/FunctionalIndexSupport.scala | 121 +-- .../scala/org/apache/hudi/HoodieFileIndex.scala| 128 + .../org/apache/hudi/RecordLevelIndexSupport.scala | 48 +--- .../org/apache/hudi/SparkBaseIndexSupport.scala| 108 + 5 files changed, 243 insertions(+), 230 deletions(-) diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala index dc15a3e8c8c..238962b964c 100644 --- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala @@ -23,11 +23,10 @@ import org.apache.hudi.ColumnStatsIndexSupport._ import org.apache.hudi.HoodieCatalystUtils.{withPersistedData, withPersistedDataset} import org.apache.hudi.HoodieConversionUtils.toScalaOption import org.apache.hudi.avro.model._ -import 
org.apache.hudi.client.common.HoodieSparkEngineContext import org.apache.hudi.common.config.HoodieMetadataConfig import org.apache.hudi.common.data.HoodieData import org.apache.hudi.common.function.SerializableFunction -import org.apache.hudi.common.model.HoodieRecord +import org.apache.hudi.common.model.{FileSlice, HoodieRecord} import org.apache.hudi.common.table.HoodieTableMetaClient import org.apache.hudi.common.util.BinaryUtil.toBytes import org.apache.hudi.common.util.ValidationUtils.checkState @@ -36,7 +35,6 @@ import org.apache.hudi.common.util.hash.ColumnIndexID import org.apache.hudi.data.HoodieJavaRDD import org.apache.hudi.metadata.{HoodieMetadataPayload, HoodieTableMetadata, HoodieTableMetadataUtil, MetadataPartitionType} import org.apache.hudi.util.JFunction -import org.apache.spark.api.java.JavaSparkContext import org.apache.spark.sql.HoodieUnsafeUtils.{createDataFrameFromInternalRows, createDataFrameFromRDD, createDataFrameFromRows} import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.util.DateTimeUtils @@ -44,8 +42,10 @@ import org.apache.spark.sql.functions.col import org.apache.spark.sql.types._ import org.apache.spark.sql.{DataFrame, Row, SparkSession} import org.apache.spark.storage.StorageLevel - import java.nio.ByteBuffer + +import org.apache.spark.sql.catalyst.expressions.Expression + import scala.collection.JavaConverters._ import scala.collection.immutable.TreeSet import scala.collection.mutable.ListBuffer @@ -55,11 +55,8 @@ class ColumnStatsIndexSupport(spark: SparkSession, tableSchema: StructType, @transient metadataConfig: HoodieMetadataConfig, @transient metaClient: HoodieTableMetaClient, - allowCaching: Boolean = false) { - - @transient private lazy val engineCtx = new HoodieSparkEngineContext(new JavaSparkContext(spark.sparkContext)) - @transient private lazy val metadataTable: HoodieTableMetadata = -HoodieTableMetadata.create(engineCtx, metadataConfig, metaClient.getBasePathV2.toString) + 
allowCaching: Boolean = false) + extends SparkBaseIndexSupport(spark, metadataConfig, metaClient) { @transient private lazy val cachedColumnStatsIndexViews: ParHashMap[Seq[String], DataFrame] = ParHashMap() @@ -79,6 +76,40 @@ class ColumnStatsIndexSupport(spark: SparkSession, } } + override def getIndexName: String = ColumnStatsIndexSupport.INDEX_NAME + + override def computeCandidateFileNames(fileIndex: HoodieFileIndex, + queryFilters: Seq[Expression], + queryReferencedColumns: Seq[String], + prunedPartitionsAndFileSlices:
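For readers following the digest, the shape of the refactoring described in the commit message can be summarized with a small sketch. This is a simplified, hedged illustration, not the committed code: only the class and method names come from the commit message, while the parameter types (plain strings in place of Spark/Hudi types such as `Expression` and `FileSlice`) and the method bodies are stand-ins.

```java
// Self-contained sketch of the hierarchy introduced by this commit. Only the
// class and method names come from the commit message; parameter types are
// plain Java stand-ins for the real Spark/Hudi types, and bodies are illustrative.
import java.util.*;

abstract class SparkBaseIndexSupport {
    // Abstract methods, overridden in each descendant.
    abstract String getIndexName();
    abstract boolean isIndexAvailable();
    // Returns Optional.empty() when the index cannot prune anything.
    abstract Optional<Set<String>> computeCandidateFileNames(List<String> queryFilters,
                                                             List<String> fileSliceNames);
    abstract void invalidateCaches();

    // Concrete helper moved up from HoodieFileIndex so descendants can reuse it:
    // intersect all files with the pruned candidates, or keep everything.
    Set<String> getCandidateFiles(Set<String> allFiles, Optional<Set<String>> pruned) {
        if (!pruned.isPresent()) {
            return allFiles;
        }
        Set<String> out = new HashSet<>(allFiles);
        out.retainAll(pruned.get());
        return out;
    }
}

class ColumnStatsIndexSupportSketch extends SparkBaseIndexSupport {
    @Override String getIndexName() { return "column_stats"; }
    @Override boolean isIndexAvailable() { return true; }
    @Override Optional<Set<String>> computeCandidateFileNames(List<String> queryFilters,
                                                              List<String> fileSliceNames) {
        if (queryFilters.isEmpty()) {
            return Optional.empty(); // nothing to prune on
        }
        return Optional.of(new HashSet<>(fileSliceNames));
    }
    @Override void invalidateCaches() { } // real impl drops cached column-stats index views
}
```

With this shape, each concrete index support (column stats, functional, record level) overrides the abstract methods, and the caller only needs the shared `getCandidateFiles` helper.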
[jira] [Updated] (HUDI-7653) Refactor HoodieFileIndex for more flexibility
[ https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7653:
Labels: hudi-1.0.0-beta2 pull-request-available (was: pull-request-available)

> Refactor HoodieFileIndex for more flexibility
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Vova Kolmakov
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is
> easy to extend with new types of indices and it works with Spark <3.1

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7653) Refactor HoodieFileIndex for more flexibility
[ https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7653:
Status: Patch Available (was: In Progress)

> Refactor HoodieFileIndex for more flexibility
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Vova Kolmakov
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: pull-request-available
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is
> easy to extend with new types of indices and it works with Spark <3.1
[jira] [Updated] (HUDI-7653) Refactor HoodieFileIndex for more flexibility
[ https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7653:
Fix Version/s: 1.0.0

> Refactor HoodieFileIndex for more flexibility
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Vova Kolmakov
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is
> easy to extend with new types of indices and it works with Spark <3.1
[jira] [Closed] (HUDI-7653) Refactor HoodieFileIndex for more flexibility
[ https://issues.apache.org/jira/browse/HUDI-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-7653.
Resolution: Done

> Refactor HoodieFileIndex for more flexibility
>
> Key: HUDI-7653
> URL: https://issues.apache.org/jira/browse/HUDI-7653
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Vova Kolmakov
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
> Create hierarchy of IndexSupport that is usable without if-else branches, is
> easy to extend with new types of indices and it works with Spark <3.1
Re: [PR] [HUDI-7653] Refactor HoodieFileIndex for more flexibility [hudi]
codope merged PR #11074: URL: https://github.com/apache/hudi/pull/11074
[PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]
codope opened a new pull request, #11077: URL: https://github.com/apache/hudi/pull/11077

### Change Logs

This PR introduces a new class hierarchy for handling merge keys in a more flexible and decoupled manner. It adds the `HoodieMergeKey` interface, along with two implementations: `HoodieSimpleMergeKey` and `HoodieCompositeMergeKey`. This design allows us to extend key-based merge strategies easily.

**Motivation**

The new merge key handling mechanism is driven by the requirement to support different types of keys (simple and composite) without overloading the existing `HoodieKey` class, which is central to the write path. By segregating merge key handling into its own hierarchy, we avoid potential conflicts and keep modifications localised, improving the maintainability of the code.

**Changes**

1. `HoodieMergeKey`: New API for consistent handling of both simple and composite keys. It includes methods for retrieving the key and the partition path.
2. `HoodieSimpleMergeKey`: Wraps `HoodieKey` and implements the `HoodieMergeKey` interface for simple scenarios where the key is a string.
3. `HoodieCompositeMergeKey`: Implements the `HoodieMergeKey` interface but allows complex types as keys, adding flexibility for scenarios where a simple string key is not sufficient.
4. `HoodieMergeKeyBasedRecordMerger`: A new implementation of `HoodieRecordMerger` based on `HoodieMergeKey`. If the merge keys are of type `HoodieCompositeMergeKey`, it returns the older and newer records; otherwise, it calls the merge method from the parent class.
5. `HoodieMergedLogRecordScanner`: Changes to merge based on `HoodieMergeKey`.
6. Unit tests for the new merger.

These changes do not affect existing functionality that does not rely on merge keys. The PR introduces additional classes used explicitly for new functionality involving various key types in merging operations, which ensures minimal to no risk for existing processes.

### Impact

Enhances the flexibility and robustness of our key-based merge strategies. It helps keep the codebase scalable and maintainable, allowing easy extensions and modifications in the future.

### Risk level (write none, low medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
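The key hierarchy described in the change log above might look roughly like the following. This is a hedged sketch: only the interface and class names come from the PR description, while the generics, fields and method shapes are assumptions for illustration (the `HoodieKeySketch` stand-in replaces the real `HoodieKey`).

```java
// Sketch of the merge-key hierarchy described in this PR. Only the interface
// and class names come from the PR text; everything else is an assumption.
final class HoodieKeySketch { // stand-in for the real HoodieKey
    final String recordKey;
    final String partitionPath;
    HoodieKeySketch(String recordKey, String partitionPath) {
        this.recordKey = recordKey;
        this.partitionPath = partitionPath;
    }
}

interface HoodieMergeKey<T> {
    T getKey();
    String getPartitionPath();
}

// Simple case: the key is a string, wrapping the existing HoodieKey.
class HoodieSimpleMergeKey implements HoodieMergeKey<String> {
    private final HoodieKeySketch key;
    HoodieSimpleMergeKey(HoodieKeySketch key) { this.key = key; }
    @Override public String getKey() { return key.recordKey; }
    @Override public String getPartitionPath() { return key.partitionPath; }
}

// Composite case: an arbitrary typed key, e.g. a list of column values.
class HoodieCompositeMergeKey<T> implements HoodieMergeKey<T> {
    private final T key;
    private final String partitionPath;
    HoodieCompositeMergeKey(T key, String partitionPath) {
        this.key = key;
        this.partitionPath = partitionPath;
    }
    @Override public T getKey() { return key; }
    @Override public String getPartitionPath() { return partitionPath; }
}
```

The point of the split is that callers such as a record merger can work against `HoodieMergeKey<T>` without caring whether the key is a single string or a composite value.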
[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys
[ https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7652:
Labels: hudi-1.0.0-beta2 pull-request-available (was: hudi-1.0.0-beta2)

> Add new MergeKey API to support simple and composite keys
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Major
> Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323
Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]
hudi-bot commented on PR #11077: URL: https://github.com/apache/hudi/pull/11077#issuecomment-2072997601

## CI report:

* 19a23e39e15d2818d28956959dc00f09bc51 UNKNOWN
Re: [PR] [HUDI-7146] [RFC-77] RFC for secondary index [hudi]
codope commented on code in PR #10814: URL: https://github.com/apache/hudi/pull/10814#discussion_r1576637313 ## rfc/rfc-77/rfc-77.md: ## @@ -0,0 +1,323 @@ + + +# RFC-77: Secondary Indexes + +## Proposers + +- @bhat-vinay +- @codope + +## Approvers + - @vinothchandar + - @nsivabalan + +## Status + +JIRA: https://issues.apache.org/jira/browse/HUDI-7146 + +> Please keep the status updated in `rfc/README.md`. + +## Abstract + +In this RFC, we propose implementing Secondary Indexes (SI), a new capability in Hudi's metadata table (MDT) based indexing +system. SI are indexes defined on user specified columns of the table. Similar to record level indexes, +SI will improve query performance when the query predicate contains secondary keys. The number of files +that a query needs to scan can be pruned down using secondary indexes. + +## Background + +Hudi supports different indexes through its MDT. These indexes help to improve query performance by +pruning down the set of files that need to be scanned to build the result set (of the query). + +One of the supported index in Hudi is the Record Level Index (RLI). RLI acts as a unique-key index and can be used to +locate a FileGroup of a record based on its RecordKey. A query having an EQUAL or IN predicate on the RecordKey will +have a performance boost as the RLI can accurately give a subset of FileGroups that contain the rows matching the +predicate. + +Many workloads have queries with predicates that are not based on RecordKey. Such queries cannot use RLI for data +skipping. Traditional databases have a notion of building indexes (called Secondary Index or SI) on user specified +columns to aid such queries. This RFC proposes implementing SI in Hudi. Users can build SI on columns which are +frequently used as filtering columns (i.e columns on which query predicate is based on). As with any other index, +building and maintaining SI adds overhead on the write path. Users should choose wisely based +on their workload. 
Tools can be built to provide guidance on the usefulness of indexing a specific column, but it is +not in the scope of this RFC. + +## Design and Implementation +This section discusses briefly the goals, design, implementation details of supporting SI in Hudi. At a high level, +the design principle and goals are as follows: +1. User specifies SI to be built on a given column of a table. A given SI can be built on only one column of the table +(i.e composite keys are not allowed). Any number of SI can be built on a Hudi table. The indexes to be built are +specified using regular SQL statements. +2. Metadata of a SI will be tracked through the index metadata file under `/.hoodie/.index` (this path can be configurable). +3. Each SI will be a partition inside Hudi MDT. Index data will not be materialized with the base table's data files. +4. Logical plan of a query will be used to efficiently filter FileGroups based on the query predicate and the available +indexes. + +### SQL +SI can be created using the regular `CREATE INDEX` SQL statement. +``` +-- PROPOSED SYNTAX WITH `secondary_index` as the index type -- +CREATE INDEX [IF NOT EXISTS] index_name ON [TABLE] table_name [USING secondary_index](index_column) +-- Examples -- +CREATE INDEX idx_city on hudi_table USING secondary_index(city) +CREATE INDEX idx_last_name on hudi_table (last_name) + +-- NO CHANGE IN DROP INDEX -- +DROP INDEX idx_city; +``` + +`index_name` - Required and validated by parser. `index_name` will be used to derive the name of the physical partition +in MDT by prefixing `secondary_index_`. If the `index_name` is `idx_city`, then the MDT partition will be +`secondary_index_idx_city` + +The index_type will be `secondary_index`. This will be used to distinguish SI from other Functional Indexes. + +### Secondary Index Metadata +Secondary index metadata will be managed the same way as Functional Index metadata. 
Since SI will not have any function +to be applied on each row, the `function_name` will be NULL. + +### Index in Metadata Table (MDT) +Each SI will be stored as a physical partition in the MDT. The partition name is derived from the `index_name` by +prefixing `secondary_index_`. Each entry in the SI partition will be a mapping of the form +`secondary_key -> record_key`. `secondary_key` will form the "record key" for the record of the SI partition. Note that +an important design consideration here is that users may choose to build SI on a non-unique column of the table. + + Index Initialisation +Initial build of the secondary index will scan all file slices (of the base table) to extract +`secondary-key -> record-key` tuple and write it into the secondary index partition in the metadata table. +This is similar to how RLI is initialised. + + Index Maintenance +The index needs to be updated on inserts, updates and deletes to the base table. Considering that secondary-keys in +the base table could be
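The partition naming (`secondary_index_` prefix) and the non-unique `secondary_key -> record_key` mapping described in the quoted RFC section can be modeled with a toy in-memory sketch. This is purely illustrative: the names are hypothetical and real index entries live in metadata-table file groups, not a Java map.

```java
// Toy in-memory model of the proposed secondary index: MDT partition-name
// derivation and the secondary_key -> record_key mapping, where one secondary
// key may map to many record keys (the indexed column need not be unique).
import java.util.*;

class SecondaryIndexSketch {
    // "idx_city" -> "secondary_index_idx_city", per the RFC's naming rule.
    static String mdtPartitionName(String indexName) {
        return "secondary_index_" + indexName;
    }

    // Build secondary_key -> set of record_keys from (secondaryKey, recordKey) rows.
    static Map<String, Set<String>> build(List<String[]> rows) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String[] row : rows) { // row[0] = secondary key, row[1] = record key
            index.computeIfAbsent(row[0], k -> new TreeSet<>()).add(row[1]);
        }
        return index;
    }

    // Query-time lookup: all record keys whose rows match the secondary key.
    static Set<String> lookup(Map<String, Set<String>> index, String secondaryKey) {
        return index.getOrDefault(secondaryKey, Collections.emptySet());
    }
}
```

An EQUAL/IN predicate on the indexed column would consult such a mapping first, then feed the resulting record keys into RLI-style file-group pruning.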
Re: [PR] [HUDI-7146] [RFC-77] RFC for secondary index [hudi]
codope commented on code in PR #10814: URL: https://github.com/apache/hudi/pull/10814#discussion_r1576637885 ## rfc/rfc-77/rfc-77.md: (quoted RFC context identical to the previous comment)
Re: [PR] [HUDI-7235] Fix checkpoint bug for S3/GCS Incremental Source [hudi]
hudi-bot commented on PR #10336: URL: https://github.com/apache/hudi/pull/10336#issuecomment-2071510989

## CI report:

* de49a9da9db751d6fd6e0eaa1a750f8726a55018 UNKNOWN
* 1b754dffcc5dc2f82c62de06ed9d037ac201d194 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23411)
Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]
hudi-bot commented on PR #11062: URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071512201

## CI report:

* f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN
* a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412)
* 2fe41f70ab0d295fe9b4a3b3e94387385a21e7d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23417)
Re: [PR] [HUDI-6386] Enable testArchivalWithMultiWriters back as they are passing [hudi]
hudi-bot commented on PR #9085: URL: https://github.com/apache/hudi/pull/9085#issuecomment-2071614427

## CI report:

* c818a7209bc320f3248f6ef5ea28fbf7358ccb8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23415)
Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]
hudi-bot commented on PR #11062: URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071494457

## CI report:

* f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN
* a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412)
[jira] [Updated] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules
[ https://issues.apache.org/jira/browse/HUDI-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7639:
Labels: pull-request-available (was: )

> Refactor HoodieFileIndex so that different indexes can be used via optimizer rules
>
> Key: HUDI-7639
> URL: https://issues.apache.org/jira/browse/HUDI-7639
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Currently, `HoodieFileIndex` is responsible for partition pruning as well as file skipping. All indexes are being used in the
> [lookupCandidateFilesInMetadataTable|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L333]
> method through if-else branches. This is not only hard to maintain as we add more indexes, but also induces a static hierarchy. Instead, we need more
> flexibility so that we can alter the logical plan based on the availability of indexes. For partition pruning in Spark, we already have the
> [HoodiePruneFileSourcePartitions|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala#L40]
> rule, but it is injected during the operator optimization batch and does not modify the result of the LogicalPlan. To be fully extensible, we should
> be able to rewrite the LogicalPlan. We should be able to inject rules after partition pruning, after the operator optimization batch and before any CBO
> rules that depend on stats. Spark provides the
> [injectPreCBORules|https://github.com/apache/spark/blob/6232085227ee2cc4e831996a1ac84c27868a1595/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala#L304]
> API to do so; however, it is only available from Spark 3.1.0 onwards.
>
> The goal of this ticket is to refactor the index hierarchy and create new rules such that Spark versions < 3.1.0 still go via the old path, while later
> versions can modify the plan using an appropriate index and inject it as a pre-CBO rule.
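The version gate this ticket describes — keep the old if-else lookup path for Spark below 3.1.0, rewrite the plan via a pre-CBO rule otherwise — reduces to a check like the following. This is a hypothetical helper for illustration, not code from the PR.

```java
// Hypothetical helper illustrating the version gate from HUDI-7639: the
// pre-CBO rule injection API exists only from Spark 3.1.0 onwards, so earlier
// versions must keep the old if-else index lookup path.
class PreCboGate {
    static boolean supportsPreCboRules(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 3 || (major == 3 && minor >= 1);
    }
}
```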
[PR] [HUDI-7639] Refactor HoodieFileIndex so that different indexes can be used via optimizer rules [hudi]
wombatu-kun opened a new pull request, #11074: URL: https://github.com/apache/hudi/pull/11074 ### Change Logs Task: https://issues.apache.org/jira/browse/HUDI-7639 Created new abstract class SparkBaseIndexSupport with abstract methods `getIndexName`, `isIndexAvailable`, `computeCandidateFileNames` and `invalidateCaches` (to override it in descendants) and concrete methods `getPrunedFileNames`, `getCandidateFiles` and `shouldReadInMemory` (moved from HoodieFileIndex or XXXIndexSupport to reuse it in descendants). Made `ColumnStatsIndexSupport`, `FunctionalIndexSupport` and `RecordLevelIndexSupport` classes extend `SparkBaseIndexSupport`. Implementation of `computeCandidateFileNames` was made from corresponding if-else branches of `HoodieFileIndex.lookupCandidateFilesInMetadataTable()`. Implementations of `getIndexName`, `isIndexAvailable` are trivial. Real implementation of `invalidateCaches` exists only for `ColumnStatsIndexSupport`. Replaced 3 individual XXXIndexSupport fields with one list of 3 SparkBaseIndexSupport items. The order of items is important: to preserve original behavior the order of indices must be: RecordLevel, Functional, ColStats. `HoodieFileIndex.lookupCandidateFilesInMetadataTable()` is simplified to just looping through the list, checking each Index availability and (if so) computing pruned file names by XXXIndexSupport class. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
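The base-class-plus-ordered-list refactoring described in the PR can be sketched in plain Java. This is an illustrative sketch only; the actual Hudi classes are Scala, and the names and bodies below are simplified assumptions, not the real implementation.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified stand-in for SparkBaseIndexSupport: abstract hooks that each
// concrete index support class overrides.
abstract class IndexSupportSketch {
  abstract String getIndexName();
  abstract boolean isIndexAvailable();
  abstract Set<String> computeCandidateFileNames(String query);
}

class RecordLevelIndexSketch extends IndexSupportSketch {
  String getIndexName() { return "RECORD_LEVEL"; }
  boolean isIndexAvailable() { return true; }
  Set<String> computeCandidateFileNames(String query) { return Set.of("f1.parquet"); }
}

class ColStatsIndexSketch extends IndexSupportSketch {
  String getIndexName() { return "COLUMN_STATS"; }
  boolean isIndexAvailable() { return false; } // e.g. index not built for this table
  Set<String> computeCandidateFileNames(String query) { return Set.of("f1.parquet", "f2.parquet"); }
}

final class FileIndexSketch {
  // Order matters: record-level, then functional, then col-stats.
  private final List<IndexSupportSketch> indices;

  FileIndexSketch(List<IndexSupportSketch> indices) { this.indices = indices; }

  // Simplified shape of lookupCandidateFilesInMetadataTable(): the first
  // available index computes the candidate files; if none is available,
  // no file skipping happens.
  Optional<Set<String>> lookupCandidateFiles(String query) {
    for (IndexSupportSketch index : indices) {
      if (index.isIndexAvailable()) {
        return Optional.of(index.computeCandidateFileNames(query));
      }
    }
    return Optional.empty();
  }
}
```

With this shape, adding a new index means appending another item to the list instead of adding a new if-else branch to the lookup method.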
Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]
hudi-bot commented on PR #11062: URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071503255 ## CI report: * f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN * a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412) * 2fe41f70ab0d295fe9b4a3b3e94387385a21e7d4 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7639) Refactor HoodieFileIndex so that different indexes can be used via optimizer rules
[ https://issues.apache.org/jira/browse/HUDI-7639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7639: Status: In Progress (was: Open) > Refactor HoodieFileIndex so that different indexes can be used via optimizer > rules > -- > > Key: HUDI-7639 > URL: https://issues.apache.org/jira/browse/HUDI-7639 > Project: Apache Hudi > Issue Type: Task >Reporter: Sagar Sumit >Assignee: Vova Kolmakov >Priority: Major > Fix For: 1.0.0 > > > Currently, `HoodieFileIndex` is responsible for partition pruning as well as > file skipping. All indexes are being used in > [lookupCandidateFilesInMetadataTable|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L333] > method through if-else branches. This is not only hard to maintain as we add > more indexes, but also induces a static hierarchy. Instead, we need more > flexibility so that we can alter logical plan based on availability of > indexes. For partition pruning in Spark, we already have > [HoodiePruneFileSourcePartitions|https://github.com/apache/hudi/blob/b5b14f7d4fa6224a6674b021664b510c6ae8afb9/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala#L40] > rule but it is injected during the operator optimization batch and it does > not modify the result of the LogicalPlan. To be fully extensible, we should > be able to rewrite the LogicalPlan. We should be able to inject rules after > partition pruning after the operator optimization batch and before any CBO > rules that depend on stats. Spark provides > [injectPreCBORules|https://github.com/apache/spark/blob/6232085227ee2cc4e831996a1ac84c27868a1595/sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala#L304] > API to do so, however it is only available in Spark 3.1.0 onwards. 
> The goal of this ticket is to refactor index hierarchy and create new rules > such that Spark version < 3.1.0 still go via the old path, while later > versions can modify the plan using an appropriate index and inject as a > pre-CBO rule. -- This message was sent by Atlassian Jira (v8.20.10#820010)
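The version gate the ticket proposes, keeping the old path for Spark < 3.1.0 and injecting a pre-CBO rule on later versions, can be sketched as a simple version comparison. The rule name `PreCboDataSkippingRule` below is a hypothetical placeholder, not a Hudi or Spark API; only `HoodiePruneFileSourcePartitions` and `injectPreCBORules` are named in the ticket.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of gating an optimizer-rule injection on the Spark version.
final class RuleInjectionSketch {
  // Compares dotted version strings numerically, e.g. "3.0.3" < "3.1.0".
  static boolean atLeast(String version, String minimum) {
    String[] v = version.split("\\.");
    String[] m = minimum.split("\\.");
    for (int i = 0; i < Math.max(v.length, m.length); i++) {
      int a = i < v.length ? Integer.parseInt(v[i]) : 0;
      int b = i < m.length ? Integer.parseInt(m[i]) : 0;
      if (a != b) {
        return a > b;
      }
    }
    return true;
  }

  // Returns the names of the rules that would be injected for the given
  // Spark version: the pre-CBO data-skipping rule only exists on 3.1.0+.
  static List<String> rulesFor(String sparkVersion) {
    List<String> rules = new ArrayList<>();
    rules.add("HoodiePruneFileSourcePartitions"); // partition pruning, all versions
    if (atLeast(sparkVersion, "3.1.0")) {
      rules.add("PreCboDataSkippingRule"); // hypothetical name for the new rule
    }
    return rules;
  }
}
```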
Re: [PR] [MINOR] Fix incorrect catch of ClassCastException using HoodieSparkKeyGeneratorFactory [hudi]
hudi-bot commented on PR #11062: URL: https://github.com/apache/hudi/pull/11062#issuecomment-2071617519 ## CI report: * f97bf7a9acdc086a5ada79c743b983c11947c3af UNKNOWN * a4faddba433d5e454cd409b2818cad6da4c46c32 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23412) * 2fe41f70ab0d295fe9b4a3b3e94387385a21e7d4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23417) * d99ea433d02161130f8ee6d0028319e009253a12 UNKNOWN
Re: [PR] [HUDI-7656] Disable a flaky test [hudi]
hudi-bot commented on PR #11078: URL: https://github.com/apache/hudi/pull/11078#issuecomment-2073406841 ## CI report: * ff7ab8d5c15cd2311b0cf0bd6eaa2f5061fc8db3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23430)
Re: [PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]
hudi-bot commented on PR #11079: URL: https://github.com/apache/hudi/pull/11079#issuecomment-2073406892 ## CI report: * bd68c36702ebde586b9f57bf1d36c3751b91e61a UNKNOWN
Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]
hudi-bot commented on PR #11077: URL: https://github.com/apache/hudi/pull/11077#issuecomment-2073101141 ## CI report: * 19a23e39e15d2818d28956959dc00f09bc51 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23428)
Re: [PR] [MINOR] Streamer test setup performance [hudi]
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2073287758 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN * d9f583043f1a5ffd532d613b2ce95aa7a8fddc47 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23213) * b6faa0ddf78a193ed8cdb1ce8eb14ae49016a105 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23429)
Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]
hudi-bot commented on PR #11077: URL: https://github.com/apache/hudi/pull/11077#issuecomment-2073013071 ## CI report: * 19a23e39e15d2818d28956959dc00f09bc51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23428)
[jira] [Created] (HUDI-7655) Support configuration for clean to fail execution if there is at least one file is marked as a failed delete
Krishen Bhan created HUDI-7655: -- Summary: Support configuration for clean to fail execution if there is at least one file is marked as a failed delete Key: HUDI-7655 URL: https://issues.apache.org/jira/browse/HUDI-7655 Project: Apache Hudi Issue Type: Improvement Reporter: Krishen Bhan

When a HUDI clean plan is executed, any targeted file that was not confirmed as deleted (or non-existing) will be marked as a "failed delete". Although these failed deletes are added to `.clean` metadata, if incremental clean is used these files might never be picked up by a future clean plan, unless a "full-scan" clean ends up being scheduled. In addition to files unnecessarily taking up storage space for longer, this can lead to the following dataset consistency issue for COW datasets:

1. Insert at C1 creates file group f1 in partition.
2. Replacecommit at RC2 creates file group f2 in partition, and replaces f1.
3. Any reader of the partition that calls the HUDI API (with or without using MDT) will recognize that f1 should be ignored, as it has been replaced, since the RC2 instant file is in the active timeline.
4. Some completed instants later, an incremental clean is scheduled. It moves the "earliest commit to retain" to a time after instant time RC2, so it targets f1 for deletion. But during execution of the plan, it fails to delete f1.
5. An archive job is eventually triggered, and archives C1. Note that f1 is still in the partition.

At this point, any job/query that reads the aforementioned partition directly through DFS file system calls (without directly using the MDT FILES partition) will consider both f1 and f2 as valid file groups, since RC2 is no longer in the active timeline. This is a data consistency issue, and will only be resolved if a "full-scan" clean is triggered and deletes f1.
This specific scenario can be avoided if the user can configure HUDI clean to fail execution of a clean plan unless all files are confirmed as deleted (or not existing in DFS already), "blocking" the clean. The next clean attempt will re-execute this existing plan, since clean plans cannot be "rolled back".
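The proposed "block the clean" behavior can be sketched as follows. This is a minimal illustrative sketch with invented names, not the actual Hudi cleaner code or config:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of a clean execution that optionally fails when any targeted file
// could not be confirmed as deleted (or already absent).
final class CleanSketch {
  static List<String> executeClean(List<String> targetFiles,
                                   Predicate<String> deleteConfirmed,
                                   boolean failOnFailedDelete) {
    List<String> failedDeletes = new ArrayList<>();
    for (String file : targetFiles) {
      if (!deleteConfirmed.test(file)) {
        failedDeletes.add(file);
      }
    }
    if (failOnFailedDelete && !failedDeletes.isEmpty()) {
      // Blocking the clean here means the next attempt re-executes the same
      // plan, since clean plans cannot be rolled back.
      throw new IllegalStateException("Clean failed to delete: " + failedDeletes);
    }
    return failedDeletes; // the real cleaner records these in .clean metadata
  }
}
```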
[jira] [Created] (HUDI-7656) Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates
Lin Liu created HUDI-7656: - Summary: Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates Key: HUDI-7656 URL: https://issues.apache.org/jira/browse/HUDI-7656 Project: Apache Hudi Issue Type: Improvement Reporter: Lin Liu Assignee: Lin Liu This test is flaky.
[PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]
linliu-code opened a new pull request, #11079: URL: https://github.com/apache/hudi/pull/11079 ### Change Logs Disable test: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys ### Impact Less test coverage temporarily. ### Risk level (write none, low medium or high below) Low. ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-7657) disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
[ https://issues.apache.org/jira/browse/HUDI-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7657: - Labels: pull-request-available (was: ) > disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys > - > > Key: HUDI-7657 > URL: https://issues.apache.org/jira/browse/HUDI-7657 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major > Labels: pull-request-available >
Re: [PR] [MINOR] Streamer test setup performance [hudi]
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2073275371 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN * d9f583043f1a5ffd532d613b2ce95aa7a8fddc47 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23213) * b6faa0ddf78a193ed8cdb1ce8eb14ae49016a105 UNKNOWN
[PR] [HUDI-7656] Disable a flaky test [hudi]
linliu-code opened a new pull request, #11078: URL: https://github.com/apache/hudi/pull/11078 ### Change Logs Disable test: TestCOWDataSource.testCopyOnWriteConcurrentUpdates ### Impact Less coverage temporarily. ### Risk level (write none, low medium or high below) Low. ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-7656) Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates
[ https://issues.apache.org/jira/browse/HUDI-7656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7656: - Labels: pull-request-available (was: ) > Disable TestCOWDataSource.testCopyOnWriteConcurrentUpdates > -- > > Key: HUDI-7656 > URL: https://issues.apache.org/jira/browse/HUDI-7656 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Lin Liu >Assignee: Lin Liu >Priority: Major > Labels: pull-request-available > > This test is flaky.
[jira] [Created] (HUDI-7657) disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys
Lin Liu created HUDI-7657: - Summary: disable flaky: TestHoodieDeltaStreamer.testAutoGenerateRecordKeys Key: HUDI-7657 URL: https://issues.apache.org/jira/browse/HUDI-7657 Project: Apache Hudi Issue Type: Improvement Reporter: Lin Liu Assignee: Lin Liu
Re: [PR] [HUDI-7656] Disable a flaky test [hudi]
hudi-bot commented on PR #11078: URL: https://github.com/apache/hudi/pull/11078#issuecomment-2073394659 ## CI report: * ff7ab8d5c15cd2311b0cf0bd6eaa2f5061fc8db3 UNKNOWN
[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync
[ https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7658: - Labels: pull-request-available (was: ) > Log time taken when meta sync fails in stream sync > -- > > Key: HUDI-7658 > URL: https://issues.apache.org/jira/browse/HUDI-7658 > Project: Apache Hudi > Issue Type: Improvement > Components: deltastreamer >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > > Time is only printed in log statements on success, but it is useful to see > the log on failure as well
[PR] [HUDI-7658] add time to meta sync failure log [hudi]
jonvex opened a new pull request, #11080: URL: https://github.com/apache/hudi/pull/11080 ### Change Logs Log the time taken when meta sync fails. ### Impact More consistent logging between success and failure. ### Risk level (write none, low medium or high below) none ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
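The change described, logging elapsed time on failure as well as on success, typically follows the pattern below. The method and message strings are illustrative assumptions, not the actual StreamSync code:

```java
// Sketch of timing a meta-sync call so the elapsed time is logged on both
// the success and the failure path; names are illustrative.
final class MetaSyncTimingSketch {
  static String runWithTiming(Runnable metaSync) {
    long start = System.currentTimeMillis();
    try {
      metaSync.run();
      return "Meta sync succeeded in " + (System.currentTimeMillis() - start) + " ms";
    } catch (RuntimeException e) {
      // Previously only the success path logged the duration; record it here
      // too before propagating the failure.
      System.err.println("Meta sync failed in " + (System.currentTimeMillis() - start) + " ms");
      throw e;
    }
  }
}
```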
[jira] [Created] (HUDI-7659) Update 0.14.0 release docs to call out that row writer w/ clustering is enabled by default
sivabalan narayanan created HUDI-7659: - Summary: Update 0.14.0 release docs to call out that row writer w/ clustering is enabled by default Key: HUDI-7659 URL: https://issues.apache.org/jira/browse/HUDI-7659 Project: Apache Hudi Issue Type: Improvement Components: docs Reporter: sivabalan narayanan Update 0.14.0 release docs to call out that row writer w/ clustering is enabled by default
[PR] [HUDI-7651] Add util methods for creating meta client [hudi]
yihua opened a new pull request, #11081: URL: https://github.com/apache/hudi/pull/11081 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-7651) Add util methods for creating meta client
[ https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7651: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Add util methods for creating meta client > - > > Key: HUDI-7651 > URL: https://issues.apache.org/jira/browse/HUDI-7651 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Ethan Guo >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 0.15.0, 1.0.0 > >
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073709014 ## CI report: * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23433) * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]
hudi-bot commented on PR #11073: URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073708946 ## CI report: * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN * c59ca7c5f11aad7435129f97904d8a2a6d958b03 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23423) * acdbe5f086b556febb77425596685670229451e7 UNKNOWN
Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]
danny0405 commented on code in PR #11067: URL: https://github.com/apache/hudi/pull/11067#discussion_r1577074792

## hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java:

@@ -70,6 +92,19 @@
   public static Set<String> getAllPartitionPaths() {
         .collect(Collectors.toSet());
   }

+  /**
+   * Returns the list of metadata partition types enabled based on the metadata config and table config.
+   */
+  public static List<MetadataPartitionType> getEnabledPartitions(HoodieMetadataConfig metadataConfig, HoodieTableMetaClient metaClient) {
+    List<MetadataPartitionType> enabledTypes = new ArrayList<>(4);

Review Comment: Not sure whether we need a specific initial list length param.
Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]
danny0405 commented on code in PR #11067: URL: https://github.com/apache/hudi/pull/11067#discussion_r1577074559

## hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java:

@@ -18,30 +18,52 @@
 package org.apache.hudi.metadata;

+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+
+import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
 import java.util.Set;
+import java.util.function.BiPredicate;
+import java.util.function.Predicate;
 import java.util.stream.Collectors;

 /**
  * Partition types for metadata table.
  */
 public enum MetadataPartitionType {
-  FILES(HoodieTableMetadataUtil.PARTITION_NAME_FILES, "files-"),
-  COLUMN_STATS(HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS, "col-stats-"),
-  BLOOM_FILTERS(HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS, "bloom-filters-"),
-  RECORD_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, "record-index-"),
-  FUNCTIONAL_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_FUNCTIONAL_INDEX_PREFIX, "func-index-");
+  FILES(HoodieTableMetadataUtil.PARTITION_NAME_FILES, "files-",
+      HoodieMetadataConfig::enabled,
+      (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  COLUMN_STATS(HoodieTableMetadataUtil.PARTITION_NAME_COLUMN_STATS, "col-stats-",
+      HoodieMetadataConfig::isColumnStatsIndexEnabled,
+      (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  BLOOM_FILTERS(HoodieTableMetadataUtil.PARTITION_NAME_BLOOM_FILTERS, "bloom-filters-",
+      HoodieMetadataConfig::isBloomFilterIndexEnabled,
+      (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  RECORD_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, "record-index-",
+      HoodieMetadataConfig::isRecordIndexEnabled,
+      (metaClient, partitionType) -> metaClient.getTableConfig().isMetadataPartitionAvailable(partitionType)),
+  FUNCTIONAL_INDEX(HoodieTableMetadataUtil.PARTITION_NAME_FUNCTIONAL_INDEX_PREFIX, "func-index-",
+      metadataConfig -> false, // no config for functional index, it is created using sql
+      (metaClient, partitionType) -> metaClient.getFunctionalIndexMetadata().isPresent());

   // Partition path in metadata table.
   private final String partitionPath;
   // FileId prefix used for all file groups in this partition.
   private final String fileIdPrefix;
+  private final Predicate<HoodieMetadataConfig> isMetadataPartitionEnabled;

Review Comment: Can we add some comments to these two variables?
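The pattern in the diff above, where each enum constant carries its own enablement predicate so callers filter instead of branching, can be illustrated with a self-contained sketch. The types below are simplified stand-ins, not the actual Hudi classes:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Simplified stand-in for HoodieMetadataConfig: just a set of enabled flags.
final class ConfigSketch {
  final Set<String> enabled = new HashSet<>();
}

// Sketch of an enum whose constants carry their own enablement predicate.
enum PartitionTypeSketch {
  FILES(c -> c.enabled.contains("files")),
  COLUMN_STATS(c -> c.enabled.contains("col-stats")),
  RECORD_INDEX(c -> c.enabled.contains("record-index"));

  private final Predicate<ConfigSketch> isEnabled;

  PartitionTypeSketch(Predicate<ConfigSketch> isEnabled) { this.isEnabled = isEnabled; }

  // Mirrors the shape of getEnabledPartitions(): filter by each constant's
  // predicate instead of a chain of if-else checks.
  static List<PartitionTypeSketch> getEnabledPartitions(ConfigSketch config) {
    List<PartitionTypeSketch> result = new ArrayList<>();
    for (PartitionTypeSketch type : values()) {
      if (type.isEnabled.test(config)) {
        result.add(type);
      }
    }
    return result;
  }
}
```

Adding a new partition type then only requires a new enum constant with its predicate; the filtering code stays unchanged.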
Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]
hudi-bot commented on PR #11073: URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073811600 ## CI report: * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN * acdbe5f086b556febb77425596685670229451e7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23436) * dda40c2705709bfa6df2556c490f4f84b0c04b51 UNKNOWN
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073811653 ## CI report: * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN * be718668e54ed3235ec45dd2147cd514048b1945 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23434) * 694488f2df3181678a49d136170e2fd9729b45b4 UNKNOWN
Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enahance reuse [hudi]
jonvex commented on code in PR #11067: URL: https://github.com/apache/hudi/pull/11067#discussion_r1577049959

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:

@@ -167,21 +167,15 @@
 protected HoodieBackedTableMetadataWriter(Configuration hadoopConf,
   this.engineContext = engineContext;
   this.hadoopConf = new SerializableConfiguration(hadoopConf);
   this.metrics = Option.empty();
-  this.enabledPartitionTypes = new ArrayList<>(4);

Review Comment: oh, wow we just hardcoded that!
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
danny0405 commented on code in PR #11081: URL: https://github.com/apache/hudi/pull/11081#discussion_r1577097199

## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:

@@ -584,6 +585,39 @@
   public static HoodieTableMetaClient initTableAndGetMetaClient(Configuration hadoopConf,
     return metaClient;
   }

+  /**
+   * @param conf file system configuration.
+   * @param basePath base path of the Hudi table.
+   * @return a new {@link HoodieTableMetaClient} instance.
+   */
+  public static HoodieTableMetaClient build(Configuration conf,
+      String basePath) {

Review Comment: Don't think there is a necessity to add three new builder methods; the original builder is more flexible to extend, and there is not much gain in switching to these new methods, which also introduce a burden for maintenance.
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073828849 ## CI report: * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN * be718668e54ed3235ec45dd2147cd514048b1945 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23434) * 694488f2df3181678a49d136170e2fd9729b45b4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23438)
[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys
[ https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7652:
Reviewers: Danny Chen, Ethan Guo

> Add new MergeKey API to support simple and composite keys
>
> Key: HUDI-7652
> URL: https://issues.apache.org/jira/browse/HUDI-7652
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Assignee: Sagar Sumit
> Priority: Major
> Labels: hudi-1.0.0-beta2, pull-request-available
> Fix For: 1.0.0
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7632] Remove FileSystem usage in HoodieLogFormatWriter [hudi]
wombatu-kun opened a new pull request, #11082: URL: https://github.com/apache/hudi/pull/11082 ### Change Logs Removed FileSystem usage in HoodieLogFormatWriter by adding methods to HoodieStorage API `getDefaultBufferSize()`, `getDefaultReplication()`, `create(StoragePath path, boolean overwrite, Integer bufferSize, Short replication, Long sizeThreshold)` (with appropriate implementations in HoodieHadoopStorage) and use them in HoodieLogFormatWriter instead of using fs. Also fixed logging. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
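The HoodieStorage additions described above can be sketched as follows. Method names follow the PR text, but `StoragePath`, the exact signatures, and the in-memory implementation here are simplified stand-ins, not Hudi's actual code; HoodieHadoopStorage would delegate these calls to the Hadoop FileSystem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class Main {
  // Path stand-in; Hudi's StoragePath carries more than a URI string.
  static final class StoragePath {
    final String uri;
    StoragePath(String uri) { this.uri = uri; }
  }

  // Sketch of the API additions the PR describes.
  interface HoodieStorage {
    int getDefaultBufferSize();
    short getDefaultReplication();
    OutputStream create(StoragePath path, boolean overwrite, Integer bufferSize,
                        Short replication, Long sizeThreshold) throws IOException;
  }

  // In-memory fake showing how a log writer could stay FileSystem-free:
  // it only talks to the HoodieStorage abstraction.
  static final class InMemoryStorage implements HoodieStorage {
    @Override public int getDefaultBufferSize() { return 4096; }
    @Override public short getDefaultReplication() { return 1; }
    @Override public OutputStream create(StoragePath path, boolean overwrite, Integer bufferSize,
                                         Short replication, Long sizeThreshold) {
      int size = bufferSize != null ? bufferSize : getDefaultBufferSize();
      return new ByteArrayOutputStream(size);
    }
  }

  public static void main(String[] args) throws IOException {
    HoodieStorage storage = new InMemoryStorage();
    try (OutputStream os = storage.create(new StoragePath("mem://table/.hoodie/log.1"), true, null, null, null)) {
      os.write("log block".getBytes());
    }
    System.out.println(storage.getDefaultBufferSize()); // prints 4096
  }
}
```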
[jira] [Updated] (HUDI-7632) Remove FileSystem usage in HoodieLogFormatWriter
[ https://issues.apache.org/jira/browse/HUDI-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-7632:
Labels: hoodie-storage pull-request-available (was: hoodie-storage)

> Remove FileSystem usage in HoodieLogFormatWriter
>
> Key: HUDI-7632
> URL: https://issues.apache.org/jira/browse/HUDI-7632
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Vova Kolmakov
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 1.0.0
>
> https://github.com/apache/hudi/pull/10591#discussion_r1569173014

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]
hudi-bot commented on PR #11080: URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073591022 ## CI report: * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073659591 ## CI report: * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]
hudi-bot commented on PR #11080: URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073659533 ## CI report: * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073701612 ## CI report: * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23433) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7651) Add util methods for creating meta client
[ https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7651:
Story Points: 4 (was: 1)

> Add util methods for creating meta client
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7651) Add util methods for creating meta client
[ https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7651:
Status: Patch Available (was: In Progress)

> Add util methods for creating meta client
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7651) Add util methods for creating meta client
[ https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7651:
Sprint: Sprint 2024-03-25

> Add util methods for creating meta client
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7588) Replace hadoop Configuration with StorageConfiguration in hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7588:
Status: Patch Available (was: In Progress)

> Replace hadoop Configuration with StorageConfiguration in hudi-common module
>
> Key: HUDI-7588
> URL: https://issues.apache.org/jira/browse/HUDI-7588
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7651) Add util methods for creating meta client
[ https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-7651:
Epic Link: HUDI-6243

> Add util methods for creating meta client
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enhance reuse [hudi]
danny0405 commented on code in PR #11067: URL: https://github.com/apache/hudi/pull/11067#discussion_r1577076390 ## hudi-common/src/test/java/org/apache/hudi/metadata/TestMetadataPartitionType.java: ## @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.metadata; + +import org.apache.hudi.common.config.HoodieMetadataConfig; +import org.apache.hudi.common.model.HoodieFunctionalIndexMetadata; +import org.apache.hudi.common.table.HoodieTableConfig; +import org.apache.hudi.common.table.HoodieTableMetaClient; +import org.apache.hudi.common.util.Option; + +import org.junit.jupiter.api.Test; +import org.mockito.Mockito; + +import java.util.List; + +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertTrue; + +/** + * Tests for {@link MetadataPartitionType}. 
+ */
+public class TestMetadataPartitionType {
+
+  @Test
+  public void testPartitionEnabledByConfigOnly() {
+    HoodieTableMetaClient metaClient = Mockito.mock(HoodieTableMetaClient.class);
+    HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+    // Simulate the configuration enabling FILES but the meta client not having it available (yet to initialize files partition)
+    Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+    Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(false);
+    Mockito.when(metaClient.getFunctionalIndexMetadata()).thenReturn(Option.empty());
+    HoodieMetadataConfig metadataConfig = HoodieMetadataConfig.newBuilder().enable(true).build();
+
+    List<MetadataPartitionType> enabledPartitions = MetadataPartitionType.getEnabledPartitions(metadataConfig, metaClient);
+
+    // Verify FILES is enabled due to config
+    assertEquals(1, enabledPartitions.size(), "Only one partition should be enabled");
+    assertTrue(enabledPartitions.contains(MetadataPartitionType.FILES), "FILES should be enabled by config");
+  }
+
+  @Test
+  public void testPartitionAvailableByMetaClientOnly() {
+    HoodieTableMetaClient metaClient = Mockito.mock(HoodieTableMetaClient.class);
+    HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+    // Simulate the meta client having RECORD_INDEX available but config not enabling it
+    Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+    Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(true);

Review Comment:
   So the metadata config specified by the write config does not override the config from the table config, is that the case? Then how can a user disable this index type once they have enabled it?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(hudi) branch master updated: [MINOR] Fixed naming of methods in HoodieMetadataConfig (#11076)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new c17f50dbbfc [MINOR] Fixe naming of methods in HoodieMetadataConfig (#11076) c17f50dbbfc is described below commit c17f50dbbfcf4b32ca0790837c672e4fd2d54e85 Author: Vova Kolmakov AuthorDate: Wed Apr 24 08:05:39 2024 +0700 [MINOR] Fixe naming of methods in HoodieMetadataConfig (#11076) --- .../java/org/apache/hudi/config/HoodieWriteConfig.java | 2 +- .../hudi/table/action/index/RunIndexActionExecutor.java| 2 +- .../apache/hudi/testutils/HoodieJavaClientTestHarness.java | 2 +- .../hudi/testutils/HoodieSparkClientTestHarness.java | 2 +- .../apache/hudi/common/config/HoodieMetadataConfig.java| 14 +- .../java/org/apache/hudi/metadata/BaseTableMetadata.java | 4 ++-- .../apache/hudi/metadata/HoodieBackedTableMetadata.java| 2 +- .../java/org/apache/hudi/metadata/HoodieTableMetadata.java | 2 +- .../org/apache/hudi/metadata/HoodieTableMetadataUtil.java | 2 +- .../src/main/java/org/apache/hudi/source/FileIndex.java| 2 +- .../scala/org/apache/hudi/ColumnStatsIndexSupport.scala| 2 +- .../scala/org/apache/hudi/FunctionalIndexSupport.scala | 2 +- .../src/main/scala/org/apache/hudi/HoodieFileIndex.scala | 2 +- .../scala/org/apache/hudi/RecordLevelIndexSupport.scala| 2 +- 14 files changed, 19 insertions(+), 23 deletions(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java index 8c53b06d879..755074997cb 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java @@ -2508,7 +2508,7 @@ public class HoodieWriteConfig extends HoodieConfig { } public boolean 
isRecordIndexEnabled() { -return metadataConfig.enableRecordIndex(); +return metadataConfig.isRecordIndexEnabled(); } public int getRecordIndexMinFileGroupCount() { diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java index 09a9b153db1..1da3c0c4be2 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java @@ -99,7 +99,7 @@ public class RunIndexActionExecutor extends BaseActionExecutor table, String instantTime) { super(context, config, table, instantTime); this.txnManager = new TransactionManager(config, table.getMetaClient().getStorage()); -if (config.getMetadataConfig().enableMetrics()) { +if (config.getMetadataConfig().isMetricsEnabled()) { this.metrics = Option.of(new HoodieMetadataMetrics(config.getMetricsConfig())); } else { this.metrics = Option.empty(); diff --git a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java index 96ac7444eca..74cc19ea875 100644 --- a/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java +++ b/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestHarness.java @@ -251,7 +251,7 @@ public abstract class HoodieJavaClientTestHarness extends HoodieWriterClientTest } public void syncTableMetadata(HoodieWriteConfig writeConfig) { -if (!writeConfig.getMetadataConfig().enabled()) { +if (!writeConfig.getMetadataConfig().isEnabled()) { return; } // Open up the metadata table again, for syncing diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java index 2c97e960779..284c08f7309 100644 --- a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java +++ b/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/HoodieSparkClientTestHarness.java @@ -531,7 +531,7 @@ public abstract class HoodieSparkClientTestHarness extends HoodieWriterClientTes } public void syncTableMetadata(HoodieWriteConfig writeConfig) { -if (!writeConfig.getMetadataConfig().enabled()) { +if (!writeConfig.getMetadataConfig().isEnabled()) { return; } // Open up the metadata table again, for syncing diff --git
Re: [PR] [MINOR] Fixed naming of methods in HoodieMetadataConfig [hudi]
danny0405 merged PR #11076: URL: https://github.com/apache/hudi/pull/11076 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys [hudi]
danny0405 commented on code in PR #11077: URL: https://github.com/apache/hudi/pull/11077#discussion_r1577086360 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMergeKey.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.common.model; + +import java.io.Serializable; + +/** + * Defines a standard for all merge keys to ensure consistent handling including simple keys and composite keys. + * It includes methods for retrieving the key and partition path. + */ +public interface HoodieMergeKey extends Serializable { Review Comment: Not sure why the record key needs to be bound to the partition path, because under global index, a key is only located under one partition. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
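The shape of the interface under review can be sketched in isolation. The method names below (`getRecordKey()`, `getPartitionPath()`) are inferred from the Javadoc quoted above ("methods for retrieving the key and partition path"); the PR may define different signatures, and `SimpleMergeKey` is a hypothetical implementation, not part of the PR.

```java
import java.io.Serializable;
import java.util.Objects;

public class Main {
  // Sketch of the reviewed interface; signatures are assumptions from the Javadoc.
  interface HoodieMergeKey extends Serializable {
    String getRecordKey();
    String getPartitionPath();
  }

  // A simple key: one record-key field plus its partition path. The partition
  // path participating in identity is exactly the binding the review questions
  // for the global-index case, where a key maps to only one partition.
  static final class SimpleMergeKey implements HoodieMergeKey {
    private final String recordKey;
    private final String partitionPath;

    SimpleMergeKey(String recordKey, String partitionPath) {
      this.recordKey = recordKey;
      this.partitionPath = partitionPath;
    }

    @Override public String getRecordKey() { return recordKey; }
    @Override public String getPartitionPath() { return partitionPath; }

    @Override public boolean equals(Object o) {
      if (!(o instanceof SimpleMergeKey)) return false;
      SimpleMergeKey other = (SimpleMergeKey) o;
      return recordKey.equals(other.recordKey) && partitionPath.equals(other.partitionPath);
    }

    @Override public int hashCode() { return Objects.hash(recordKey, partitionPath); }
  }

  public static void main(String[] args) {
    HoodieMergeKey k = new SimpleMergeKey("uuid-1", "2024/04/24");
    System.out.println(k.getRecordKey() + "@" + k.getPartitionPath()); // prints uuid-1@2024/04/24
  }
}
```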
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073803021 ## CI report: * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN * be718668e54ed3235ec45dd2147cd514048b1945 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23434) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]
hudi-bot commented on PR #11070: URL: https://github.com/apache/hudi/pull/11070#issuecomment-2073802830 ## CI report: * c250cc04340a016a04878a7647d4b27a608e7374 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23403) * 1ce1316840852fa8e21363100f6ce695a5ecf0a7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23435) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]
hudi-bot commented on PR #11073: URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073802939 ## CI report: * 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN * acdbe5f086b556febb77425596685670229451e7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23436) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7651) Add util methods for creating meta client
[ https://issues.apache.org/jira/browse/HUDI-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7651:
Reviewers: Sagar Sumit

> Add util methods for creating meta client
>
> Key: HUDI-7651
> URL: https://issues.apache.org/jira/browse/HUDI-7651
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Major
> Labels: hoodie-storage, pull-request-available
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]
hudi-bot commented on PR #11080: URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073885067 ## CI report: * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432) * f5e6a9914ed766ac6650513a04d12c3d3cea4407 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7656] Disable a flaky test [hudi]
linliu-code closed pull request #11078: [HUDI-7656] Disable a flaky test URL: https://github.com/apache/hudi/pull/11078 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]
linliu-code closed pull request #11079: [HUDI-7657] Disable a flaky test in deltastreamer URL: https://github.com/apache/hudi/pull/11079 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]
hudi-bot commented on PR #11080: URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073583735 ## CI report: * 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]
yihua commented on code in PR #11080: URL: https://github.com/apache/hudi/pull/11080#discussion_r1576980964 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java: ## @@ -1026,27 +1026,32 @@ public void runMetaSync() { Map failedMetaSyncs = new HashMap<>(); for (String impl : syncClientToolClasses) { Timer.Context syncContext = metrics.getMetaSyncTimerContext(); -boolean success = false; +HoodieMetaSyncException metaSyncException = null; Review Comment: nit: use `Option` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
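The nit above suggests wrapping the nullable `metaSyncException` local in Hudi's `Option` type. A minimal sketch of the pattern, using `java.util.Optional` as a stand-in for Hudi's `Option` (the `trySync` method and its failure condition are illustrative, not StreamSync's actual logic):

```java
import java.util.Optional;

public class Main {
  static class HoodieMetaSyncException extends RuntimeException {
    HoodieMetaSyncException(String msg) { super(msg); }
  }

  // Instead of "HoodieMetaSyncException metaSyncException = null;" and later
  // null checks, return the absence/presence explicitly.
  static Optional<HoodieMetaSyncException> trySync(boolean shouldFail) {
    try {
      if (shouldFail) {
        throw new HoodieMetaSyncException("sync failed");
      }
      return Optional.empty();
    } catch (HoodieMetaSyncException e) {
      return Optional.of(e);
    }
  }

  public static void main(String[] args) {
    Optional<HoodieMetaSyncException> result = trySync(true);
    // Callers branch on presence instead of comparing against null.
    System.out.println(result.isPresent()); // prints true
  }
}
```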
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073722586 ## CI report: * 3e1310ac3eceed725bc829bceb8a9dcbc81e4512 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23433) * 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN * be718668e54ed3235ec45dd2147cd514048b1945 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (HUDI-7596) Enable Jacoco code coverage report across multiple modules
[ https://issues.apache.org/jira/browse/HUDI-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838411#comment-17838411 ]

Danny Chen edited comment on HUDI-7596 at 4/24/24 12:38 AM:

The link jacoco official maven plugin doc: [https://www.jacoco.org/jacoco/trunk/doc/]
jacoco multi module: [https://www.baeldung.com/maven-jacoco-multi-module-project]
jacoco and Azure CI: [https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-code-coverage-results-v1?view=azure-pipelines]
jacoco and Azure YouTube: [https://www.youtube.com/watch?v=nflwvk2cJ2o]
report generator: https://marketplace.visualstudio.com/items?itemName=Palmmedia.reportgenerator
PublishTestResults@2 - Publish Test Results v2 task: [https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-test-results-v2?view=azure-pipelines=trx%2Ctrxattachments%2Cyaml]

was (Author: danny0405):

The link jacoco official maven plugin doc: [https://www.jacoco.org/jacoco/trunk/doc/]
jacoco multi module: [https://www.baeldung.com/maven-jacoco-multi-module-project]
jacoco and Azure CI: [https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-code-coverage-results-v1?view=azure-pipelines]
jacoco and Azure YouTube: [https://www.youtube.com/watch?v=nflwvk2cJ2o]
PublishTestResults@2 - Publish Test Results v2 task: [https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/reference/publish-test-results-v2?view=azure-pipelines=trx%2Ctrxattachments%2Cyaml]

> Enable Jacoco code coverage report across multiple modules
>
> Key: HUDI-7596
> URL: https://issues.apache.org/jira/browse/HUDI-7596
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Ethan Guo
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available, starter
> Fix For: 0.15.0, 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]
hudi-bot commented on PR #11070: URL: https://github.com/apache/hudi/pull/11070#issuecomment-2073722347 ## CI report: * c250cc04340a016a04878a7647d4b27a608e7374 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23403) * 1ce1316840852fa8e21363100f6ce695a5ecf0a7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7648] Refactor MetadataPartitionType so as to enhance reuse [hudi]
danny0405 commented on code in PR #11067: URL: https://github.com/apache/hudi/pull/11067#discussion_r1577078330

## hudi-common/src/test/java/org/apache/hudi/metadata/TestMetadataPartitionType.java:

@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.metadata;
+
+import org.apache.hudi.common.config.HoodieMetadataConfig;
+import org.apache.hudi.common.model.HoodieFunctionalIndexMetadata;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.HoodieTableMetaClient;
+import org.apache.hudi.common.util.Option;
+
+import org.junit.jupiter.api.Test;
+import org.mockito.Mockito;
+
+import java.util.List;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+/**
+ * Tests for {@link MetadataPartitionType}.
+ */
+public class TestMetadataPartitionType {
+
+  @Test
+  public void testPartitionEnabledByConfigOnly() {
+    HoodieTableMetaClient metaClient = Mockito.mock(HoodieTableMetaClient.class);
+    HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+    // Simulate the configuration enabling FILES but the meta client not having it available (yet to initialize files partition)
+    Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+    Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(false);
+    Mockito.when(metaClient.getFunctionalIndexMetadata()).thenReturn(Option.empty());
+    HoodieMetadataConfig metadataConfig = HoodieMetadataConfig.newBuilder().enable(true).build();
+
+    List<MetadataPartitionType> enabledPartitions = MetadataPartitionType.getEnabledPartitions(metadataConfig, metaClient);
+
+    // Verify FILES is enabled due to config
+    assertEquals(1, enabledPartitions.size(), "Only one partition should be enabled");
+    assertTrue(enabledPartitions.contains(MetadataPartitionType.FILES), "FILES should be enabled by config");
+  }
+
+  @Test
+  public void testPartitionAvailableByMetaClientOnly() {
+    HoodieTableMetaClient metaClient = Mockito.mock(HoodieTableMetaClient.class);
+    HoodieTableConfig tableConfig = Mockito.mock(HoodieTableConfig.class);
+
+    // Simulate the meta client having RECORD_INDEX available but config not enabling it
+    Mockito.when(metaClient.getTableConfig()).thenReturn(tableConfig);
+    Mockito.when(tableConfig.isMetadataPartitionAvailable(MetadataPartitionType.FILES)).thenReturn(true);

Review Comment: Looks like once the user enables the metadata table with the initial index type set up, they can never change it again unless they disable the whole metadata table functionality. That might need to be improved.
Re: [I] A bug when RocksDBDAO executes the prefixDelete function to delete the last entry [hudi]
danny0405 commented on issue #11075: URL: https://github.com/apache/hudi/issues/11075#issuecomment-2073782944

Is this a bug from a real production use case, or one found just from code review?
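The issue above is about `RocksDBDAO`'s `prefixDelete` missing the last entry sharing a prefix. A classic pitfall in prefix-delete implementations is that RocksDB's `deleteRange(begin, end)` treats `end` as exclusive, so the upper bound must be the *successor* of the prefix, not the prefix itself. The sketch below illustrates computing that exclusive upper bound; it is a self-contained illustration of the general technique, not Hudi's actual `RocksDBDAO` code, and the class name `PrefixBound` is hypothetical.

```java
import java.util.Arrays;

public class PrefixBound {

  /**
   * Returns the smallest byte array strictly greater than every key that
   * starts with {@code prefix}: increment the last byte that is not 0xFF
   * and truncate the rest. If every byte is 0xFF there is no finite bound
   * and null is returned (delete to the end of the column family instead).
   */
  public static byte[] exclusiveUpperBound(byte[] prefix) {
    byte[] bound = Arrays.copyOf(prefix, prefix.length);
    for (int i = bound.length - 1; i >= 0; i--) {
      if ((bound[i] & 0xFF) != 0xFF) {
        bound[i]++;
        // Truncate after the incremented byte: anything longer would still
        // compare greater than some keys carrying the original prefix.
        return Arrays.copyOf(bound, i + 1);
      }
    }
    return null;
  }
}
```

With such a bound, a range delete `[prefix, exclusiveUpperBound(prefix))` covers every key under the prefix, including the last one.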
Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]
danny0405 commented on PR #11073: URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073797993

@hudi-bot run azure
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
danny0405 commented on code in PR #11081: URL: https://github.com/apache/hudi/pull/11081#discussion_r1577095513

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/index/RunIndexActionExecutor.java:

@@ -157,7 +157,8 @@ public Option execute() {
       // reconcile with metadata table timeline
       String metadataBasePath = getMetadataTableBasePath(table.getMetaClient().getBasePathV2().toString());
-      HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build();
+      HoodieTableMetaClient metadataMetaClient =
+          HoodieTableMetaClient.build(hadoopConf, metadataBasePath);

Review Comment: Is this change necessary?
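The diff under review replaces the builder chain `HoodieTableMetaClient.builder().setConf(hadoopConf).setBasePath(metadataBasePath).build()` with a `build(conf, basePath)` shorthand. The pattern — a static factory wrapping the common builder call — can be sketched minimally as below; the class and types here (a `String` standing in for the Hadoop `Configuration`) are hypothetical stand-ins, not Hudi's real `HoodieTableMetaClient` API.

```java
public class MetaClientSketch {
  private final String conf;
  private final String basePath;

  private MetaClientSketch(String conf, String basePath) {
    this.conf = conf;
    this.basePath = basePath;
  }

  public String getConf() {
    return conf;
  }

  public String getBasePath() {
    return basePath;
  }

  public static Builder builder() {
    return new Builder();
  }

  /** Shorthand for builder().setConf(conf).setBasePath(basePath).build(). */
  public static MetaClientSketch build(String conf, String basePath) {
    return builder().setConf(conf).setBasePath(basePath).build();
  }

  public static class Builder {
    private String conf;
    private String basePath;

    public Builder setConf(String conf) {
      this.conf = conf;
      return this;
    }

    public Builder setBasePath(String basePath) {
      this.basePath = basePath;
      return this;
    }

    public MetaClientSketch build() {
      return new MetaClientSketch(conf, basePath);
    }
  }
}
```

The trade-off the reviewer questions is real: the shorthand saves a line at each call site but hides the builder, which matters once callers need extra options.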
Re: [PR] [HUDI-4732] Add support for confluent schema registry with proto [hudi]
hudi-bot commented on PR #11070: URL: https://github.com/apache/hudi/pull/11070#issuecomment-2073879258

## CI report:

* 1ce1316840852fa8e21363100f6ce695a5ecf0a7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23435)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7596] Enable Jacoco code coverage report across multiple modules [hudi]
hudi-bot commented on PR #11073: URL: https://github.com/apache/hudi/pull/11073#issuecomment-2073879293

## CI report:

* 39c44a33eaae3bc17270cec93536ce727daacd98 UNKNOWN
* acdbe5f086b556febb77425596685670229451e7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23436)
* dda40c2705709bfa6df2556c490f4f84b0c04b51 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23439)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7651] Add util methods for creating meta client [hudi]
hudi-bot commented on PR #11081: URL: https://github.com/apache/hudi/pull/11081#issuecomment-2073879336

## CI report:

* 3e6fcaaa1aaac9cf83bf410772a2690afc913bce UNKNOWN
* 694488f2df3181678a49d136170e2fd9729b45b4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23438)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7658] add time to meta sync failure log [hudi]
hudi-bot commented on PR #11080: URL: https://github.com/apache/hudi/pull/11080#issuecomment-2073891560

## CI report:

* 0d9301d153f8878a582bfe973a0aaa60ae6b0af9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23432)
* f5e6a9914ed766ac6650513a04d12c3d3cea4407 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23440)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [MINOR] Streamer test setup performance [hudi]
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2073481386

## CI report:

* e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN
* 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN
* b6faa0ddf78a193ed8cdb1ce8eb14ae49016a105 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23429)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7657] Disable a flaky test in deltastreamer [hudi]
hudi-bot commented on PR #11079: URL: https://github.com/apache/hudi/pull/11079#issuecomment-2073482219

## CI report:

* bd68c36702ebde586b9f57bf1d36c3751b91e61a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23431)

Bot commands: @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync
[ https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler updated HUDI-7658:
----------------------------------
    Status: Patch Available  (was: In Progress)

> Log time taken when meta sync fails in stream sync
> --------------------------------------------------
>
>                 Key: HUDI-7658
>                 URL: https://issues.apache.org/jira/browse/HUDI-7658
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: deltastreamer
>            Reporter: Jonathan Vexler
>            Assignee: Jonathan Vexler
>            Priority: Major
>
> Time is only printed in log statements on success, but it is useful to see
> the log on failure as well

-- This message was sent by Atlassian Jira (v8.20.10#820010)
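The improvement HUDI-7658 describes — logging the elapsed time on failure as well as success — is typically done by moving the timing log into a `finally` block. A minimal self-contained sketch of that idea follows; `timedSync` and the plain `Runnable` hook are hypothetical stand-ins, not the actual stream sync code.

```java
public class MetaSyncTiming {

  /** Runs the sync action, logging elapsed time on success AND failure. */
  public static boolean timedSync(Runnable sync) {
    long start = System.currentTimeMillis();
    boolean success = false;
    try {
      sync.run();
      success = true;
      return true;
    } finally {
      long elapsedMs = System.currentTimeMillis() - start;
      // The finally block runs whether sync.run() returned or threw, so
      // the elapsed time is also reported when meta sync fails.
      System.out.println("Meta sync " + (success ? "succeeded" : "failed")
          + " after " + elapsedMs + " ms");
    }
  }
}
```

If `sync.run()` throws, the exception still propagates to the caller after the log line is emitted.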
[jira] [Created] (HUDI-7658) Log time taken when meta sync fails in stream sync
Jonathan Vexler created HUDI-7658:
-------------------------------------

             Summary: Log time taken when meta sync fails in stream sync
                 Key: HUDI-7658
                 URL: https://issues.apache.org/jira/browse/HUDI-7658
             Project: Apache Hudi
          Issue Type: Improvement
          Components: deltastreamer
            Reporter: Jonathan Vexler

Time is only printed in log statements on success, but it is useful to see the log on failure as well
[jira] [Updated] (HUDI-7658) Log time taken when meta sync fails in stream sync
[ https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler updated HUDI-7658:
----------------------------------
    Status: In Progress  (was: Open)
[jira] [Assigned] (HUDI-7658) Log time taken when meta sync fails in stream sync
[ https://issues.apache.org/jira/browse/HUDI-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Vexler reassigned HUDI-7658:
-------------------------------------
    Assignee: Jonathan Vexler
(hudi) branch master updated: [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables (#11066)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new ce0c2671a0a  [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables (#11066)

ce0c2671a0a is described below

commit ce0c2671a0a5e010173e0e6caf9c21ca2f175a30
Author: Марк Бухнер <66881554+alowa...@users.noreply.github.com>
AuthorDate: Wed Apr 24 08:06:25 2024 +0700

    [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables (#11066)
---
 .../hudi/source/stats/ColumnStatsIndices.java      |  2 +-
 .../table/format/mor/MergeOnReadInputFormat.java   |  8 ++---
 .../apache/hudi/util/AvroToRowDataConverters.java  | 42 +-
 .../apache/hudi/table/ITTestHoodieDataSource.java  | 31
 4 files changed, 46 insertions(+), 37 deletions(-)

diff --git a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
index 05931876603..7032f299368 100644
--- a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
+++ b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/stats/ColumnStatsIndices.java
@@ -272,7 +272,7 @@ public class ColumnStatsIndices {
       LogicalType logicalType,
       Map converters) {
     AvroToRowDataConverters.AvroToRowDataConverter converter =
-        converters.computeIfAbsent(logicalType, k -> AvroToRowDataConverters.createConverter(logicalType));
+        converters.computeIfAbsent(logicalType, k -> AvroToRowDataConverters.createConverter(logicalType, true));
     return converter.convert(rawVal);
   }

diff --git a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
index 29bb0a06d8c..3690fc911d8 100644
--- a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
+++ b/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/mor/MergeOnReadInputFormat.java
@@ -351,7 +351,7 @@ public class MergeOnReadInputFormat
     final Schema requiredSchema = new Schema.Parser().parse(tableState.getRequiredAvroSchema());
     final GenericRecordBuilder recordBuilder = new GenericRecordBuilder(requiredSchema);
     final AvroToRowDataConverters.AvroToRowDataConverter avroToRowDataConverter =
-        AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType());
+        AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType(), conf.getBoolean(FlinkOptions.READ_UTC_TIMEZONE));
     final HoodieMergedLogRecordScanner scanner = FormatUtils.logScanner(split, tableSchema, internalSchemaManager.getQuerySchema(), conf, hadoopConf);
     final Iterator logRecordsKeyIterator = scanner.getRecords().keySet().iterator();
     final int[] pkOffset = tableState.getPkOffsetsInRequired();
@@ -431,7 +431,7 @@ public class MergeOnReadInputFormat
     final Schema requiredSchema = new Schema.Parser().parse(tableState.getRequiredAvroSchema());
     final GenericRecordBuilder recordBuilder = new GenericRecordBuilder(requiredSchema);
     final AvroToRowDataConverters.AvroToRowDataConverter avroToRowDataConverter =
-        AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType());
+        AvroToRowDataConverters.createRowConverter(tableState.getRequiredRowType(), conf.getBoolean(FlinkOptions.READ_UTC_TIMEZONE));
     final FormatUtils.BoundedMemoryRecords records = new FormatUtils.BoundedMemoryRecords(split, tableSchema, internalSchemaManager.getQuerySchema(), hadoopConf, conf);
     final Iterator> recordsIterator = records.getRecordsIterator();
@@ -478,7 +478,7 @@ public class MergeOnReadInputFormat
   protected ClosableIterator getFullLogFileIterator(MergeOnReadInputSplit split) {
     final Schema tableSchema = new Schema.Parser().parse(tableState.getAvroSchema());
     final AvroToRowDataConverters.AvroToRowDataConverter avroToRowDataConverter =
-        AvroToRowDataConverters.createRowConverter(tableState.getRowType());
+        AvroToRowDataConverters.createRowConverter(tableState.getRowType(), conf.getBoolean(FlinkOptions.READ_UTC_TIMEZONE));
     final HoodieMergedLogRecordScanner scanner = FormatUtils.logScanner(split, tableSchema, InternalSchema.getEmptyInternalSchema(), conf, hadoopConf);
     final Iterator logRecordsKeyIterator = scanner.getRecords().keySet().iterator();
@@ -736,7 +736,7 @@ public class MergeOnReadInputFormat
     this.operationPos = operationPos;
     this.avroProjection = avroProjection;
     this.rowDataToAvroConverter =
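The commit above threads the `READ_UTC_TIMEZONE` flag into every Avro-to-RowData converter used for log files, so MOR reads honor it the same way COW reads do. What the flag controls can be shown with a tiny self-contained sketch — an illustration of the semantics (render an epoch-millis timestamp in UTC vs. the session's local zone), not Hudi's actual converter code; `TimestampRendering` is a hypothetical name.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampRendering {

  /**
   * Renders an epoch-millis value as a wall-clock LocalDateTime, either in
   * UTC (flag true) or in the JVM's default zone (flag false) — mirroring
   * the two behaviors described in the HUDI-7647 report.
   */
  public static LocalDateTime render(long epochMillis, boolean utcTimezone) {
    ZoneId zone = utcTimezone ? ZoneOffset.UTC : ZoneId.systemDefault();
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
  }
}
```

For example, `render(63072001000L, true)` (one second into 1972 UTC) yields `1972-01-01T00:00:01`, while with `false` the result shifts by the local UTC offset — the 7-hour discrepancy reported in the issue.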
Re: [PR] [HUDI-7647] READ_UTC_TIMEZONE doesn't affect log files for MOR tables [hudi]
danny0405 merged PR #11066: URL: https://github.com/apache/hudi/pull/11066
[jira] [Closed] (HUDI-7647) READ_UTC_TIMEZONE doesn't affect log files for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7647.
----------------------------
    Resolution: Fixed

Fixed via master branch: ce0c2671a0a5e010173e0e6caf9c21ca2f175a30

> READ_UTC_TIMEZONE doesn't affect log files for MOR tables
> ---------------------------------------------------------
>
>                 Key: HUDI-7647
>                 URL: https://issues.apache.org/jira/browse/HUDI-7647
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Mark Bukhner
>            Priority: Major
>              Labels: flink, pull-request-available
>             Fix For: 1.0.0
>
> Write a COPY_ON_WRITE table:
> {code:java}
> tableEnv.executeSql("CREATE TABLE test_2(\n"
>     + "  uuid VARCHAR(40),\n"
>     + "  name VARCHAR(10),\n"
>     + "  age INT,\n"
>     + "  ts TIMESTAMP(3),\n"
>     + "  `partition` VARCHAR(20)\n"
>     + ")\n"
>     + "PARTITIONED BY (`partition`)\n"
>     + "WITH (\n"
>     + "  'connector' = 'hudi',\n"
>     + "  'path' = '...',\n"
>     + "  'table.type' = 'COPY_ON_WRITE',\n"
>     + "  'write.utc-timezone' = 'true',\n"
>     + "  'index.type' = 'INMEMORY'\n"
>     + ");").await();
> tableEnv.executeSql("insert into test_2 \n"
>     + "values ('ab', 'cccx', 12, TIMESTAMP '1972-01-01 00:00:01', 'xx'),\n"
>     + "       ('ab', 'cccx', 12, TIMESTAMP '1970-01-01 00:00:01', 'xx');").await();{code}
> Then reading the COW table with READ_UTC_TIMEZONE will receive:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true'
> +I[ab, cccx, 12, 1972-01-01T07:00:01, xx] // if READ_UTC_TIMEZONE = 'false'
> {code}
> But if you create and write the table with 'table.type' = 'COPY_ON_WRITE', you will receive:
> {code:java}
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'true'
> +I[ab, cccx, 12, 1972-01-01T00:00:01, xx] // if READ_UTC_TIMEZONE = 'false'
> {code}
> There is no difference between READ_UTC_TIMEZONE = true and false when reading
> log files (MOR table), but a 7h difference when reading a COW table.
[jira] [Assigned] (HUDI-7647) READ_UTC_TIMEZONE doesn't affect log files for MOR tables
[ https://issues.apache.org/jira/browse/HUDI-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen reassigned HUDI-7647:
--------------------------------
    Assignee: Danny Chen
[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys
[ https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7652:
------------------------------
    Status: In Progress  (was: Open)

> Add new MergeKey API to support simple and composite keys
> ---------------------------------------------------------
>
>                 Key: HUDI-7652
>                 URL: https://issues.apache.org/jira/browse/HUDI-7652
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Major
>              Labels: hudi-1.0.0-beta2, pull-request-available
>             Fix For: 1.0.0
>
> Based on RFC- https://github.com/apache/hudi/pull/10814#discussion_r1567362323
[jira] [Updated] (HUDI-7652) Add new MergeKey API to support simple and composite keys
[ https://issues.apache.org/jira/browse/HUDI-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-7652:
------------------------------
    Status: Patch Available  (was: In Progress)
Re: [I] A bug when RocksDBDAO executes the prefixDelete function to delete the last entry [hudi]
MicroGery commented on issue #11075: URL: https://github.com/apache/hudi/issues/11075#issuecomment-2073932313

> Is this a bug from real production use case or just a code reviewing?

Just code review.