Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
KnightChess commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1639092577

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
     return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+    return new Builder<>();
+  }
+
+  public static class Builder {
+
+    HoodieReaderContext readerContext;
+    HoodieStorage storage;
+    String tablePath;
+    String latestCommitTime;
+    FileSlice fileSlice;
+    Schema dataSchema;
+    Schema requestedSchema;
+    Option internalSchemaOpt;
+    HoodieTableMetaClient hoodieTableMetaClient;
+    TypedProperties props;
+    HoodieTableConfig tableConfig;
+    long start;
+    long length;
+    boolean shouldUseRecordPosition = false;
+    long maxMemorySizeInBytes;
+    String spillableMapBasePath;
+    ExternalSpillableMap.DiskMapType diskMapType;
+    boolean isBitCaskDiskMapCompressionEnabled;
+
+    public Builder withReaderContext(HoodieReaderContext readerContext) {
+      this.readerContext = readerContext;
+      return this;
+    }
+
+    public Builder withHoodieStorage(HoodieStorage storage) {
+      this.storage = storage;
+      return this;
+    }
+
+    public Builder withTablePath(String tablePath) {
+      this.tablePath = tablePath;
+      return this;
+    }
+
+    public Builder withLatestCommitTime(String latestCommitTime) {
+      this.latestCommitTime = latestCommitTime;
+      return this;
+    }
+
+    public Builder withFileSlice(FileSlice fileSlice) {
+      this.fileSlice = fileSlice;
+      return this;
+    }
+
+    public Builder withDataSchema(Schema dataSchema) {
+      this.dataSchema = dataSchema;
+      return this;
+    }
+
+    public Builder withRequestedSchema(Schema requestedSchema) {
+      this.requestedSchema = requestedSchema;
+      return this;
+    }
+
+    public Builder withInternalSchemaOpt(Option internalSchemaOpt) {
+      this.internalSchemaOpt = internalSchemaOpt;
+      return this;
+    }
+
+    public Builder withMetaClient(HoodieTableMetaClient hoodieTableMetaClient) {
+      this.hoodieTableMetaClient = hoodieTableMetaClient;
+      return this;
+    }
+
+    public Builder withTypedProperties(TypedProperties props) {
+      this.props = props;
+      return this;
+    }
+
+    public Builder withTableConfig(HoodieTableConfig tableConfig) {
+      this.tableConfig = tableConfig;
+      return this;
+    }
+
+    public Builder withStart(long start) {
+      this.start = start;
+      return this;
+    }
+
+    public Builder withLength(long length) {
+      this.length = length;
+      return this;
+    }
+
+    public Builder withUseRecordPosition(boolean shouldUseRecordPosition) {
+      this.shouldUseRecordPosition = shouldUseRecordPosition;
+      return this;
+    }
+
+    public Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
+      this.maxMemorySizeInBytes = maxMemorySizeInBytes;
+      return this;
+    }
+
+    public Builder withSpillableMapBasePath(String spillableMapBasePath) {
+      this.spillableMapBasePath = spillableMapBasePath;
+      return this;
+    }
+
+    public Builder withDiskMapType(ExternalSpillableMap.DiskMapType diskMapType) {
+      this.diskMapType = diskMapType;
+      return this;
+    }
+
+    public Builder withBitCaskDiskMapCompressionEnabled(boolean isBitCaskDiskMapCompressionEnabled) {
+      this.isBitCaskDiskMapCompressionEnabled = isBitCaskDiskMapCompressionEnabled;
+      return this;
+    }
+
+    public HoodieFileGroupReader build() {
+      ValidationUtils.checkArgument(readerContext != null);
+      ValidationUtils.checkArgument(fileSlice != null);
+      ValidationUtils.checkArgument(dataSchema != null);
+      ValidationUtils.checkArgument(requestedSchema != null);
+      if (internalSchemaOpt == null) {

Review Comment:
Other fields such as `shouldUseRecordPosition`, `maxMemorySizeInBytes`, `spillableMapBasePath`... could perhaps be given default values.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
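The suggestion in the review comment above, giving optional builder fields a default value, could look roughly like the sketch below. It is a simplified stand-in, not the code from the PR: `ReaderOptions`, the `String`-typed `fileSlice`, and both default constants are hypothetical names and values chosen only for illustration.

```java
// Hypothetical, simplified stand-in for the reader's options; not Hudi's API.
final class ReaderOptions {
  final String fileSlice;                 // required, no sensible default
  final boolean shouldUseRecordPosition;  // optional
  final long maxMemorySizeInBytes;        // optional
  final String spillableMapBasePath;      // optional

  private ReaderOptions(Builder b) {
    this.fileSlice = b.fileSlice;
    this.shouldUseRecordPosition = b.shouldUseRecordPosition;
    this.maxMemorySizeInBytes = b.maxMemorySizeInBytes;
    this.spillableMapBasePath = b.spillableMapBasePath;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Assumed default values, for illustration only.
    static final long DEFAULT_MAX_MEMORY_SIZE_IN_BYTES = 1024L * 1024 * 1024;
    static final String DEFAULT_SPILLABLE_MAP_BASE_PATH = "/tmp/";

    private String fileSlice;
    // Optional fields start from their defaults, so callers may skip them.
    private boolean shouldUseRecordPosition = false;
    private long maxMemorySizeInBytes = DEFAULT_MAX_MEMORY_SIZE_IN_BYTES;
    private String spillableMapBasePath = DEFAULT_SPILLABLE_MAP_BASE_PATH;

    Builder withFileSlice(String fileSlice) {
      this.fileSlice = fileSlice;
      return this;
    }

    Builder withUseRecordPosition(boolean shouldUseRecordPosition) {
      this.shouldUseRecordPosition = shouldUseRecordPosition;
      return this;
    }

    Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
      this.maxMemorySizeInBytes = maxMemorySizeInBytes;
      return this;
    }

    Builder withSpillableMapBasePath(String spillableMapBasePath) {
      this.spillableMapBasePath = spillableMapBasePath;
      return this;
    }

    ReaderOptions build() {
      // Only fields without a default need to be validated.
      if (fileSlice == null) {
        throw new IllegalArgumentException("fileSlice is required");
      }
      return new ReaderOptions(this);
    }
  }
}
```

With defaults in place, a caller only sets the required field, for example `ReaderOptions.builder().withFileSlice("slice-0").build()`, and `build()` still rejects a missing required value.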
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
KnightChess commented on code in PR #11455:
URL: https://github.com/apache/hudi/pull/11455#discussion_r1639086482

## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestHoodiePositionBasedFileGroupRecordBuffer.java: ##
@@ -125,17 +126,17 @@ public void prepareBuffer(RecordMergeMode mergeMode) throws Exception {
         Option.empty(), metaClient.getTableConfig()));
     TypedProperties props = new TypedProperties();
     props.put(HoodieCommonConfig.RECORD_MERGE_MODE.key(), mergeMode.name());
+    props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(),String.valueOf(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.defaultValue()));

Review Comment:
`1024 * 1024 * 1000` replace defaultValue

## hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java: ##
@@ -276,6 +277,10 @@ private void validateOutputFromFileGroupReader(StorageConfiguration storageCo
     props.setProperty("hoodie.payload.ordering.field", "timestamp");
     props.setProperty(RECORD_MERGER_STRATEGY.key(), RECORD_MERGER_STRATEGY.defaultValue());
     props.setProperty(RECORD_MERGE_MODE.key(), recordMergeMode.name());
+    props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(), String.valueOf(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.defaultValue()));

Review Comment:
`1024 * 1024 * 1000` replace defaultValue
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
hudi-bot commented on PR #11455: URL: https://github.com/apache/hudi/pull/11455#issuecomment-2167021177 ## CI report: * 2704f291b4e87e2776bc7dfd9539c5ba5f7d2749 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24393) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]
hudi-bot commented on PR #11452: URL: https://github.com/apache/hudi/pull/11452#issuecomment-2167021148 ## CI report: * 8c9f4bfd2eb32b3cb06b755bf9210607ea5e865a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24392)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2167021098 ## CI report: * 01581df81e33432179d9dfe574f8e9ae74f18038 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24391)
Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
KnightChess commented on code in PR #11445: URL: https://github.com/apache/hudi/pull/11445#discussion_r1639082667 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java: ## @@ -713,11 +714,24 @@ private void deleteInvalidFilesByPartitions(HoodieEngineContext context,
Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
danny0405 commented on code in PR #11445: URL: https://github.com/apache/hudi/pull/11445#discussion_r1639080670 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java: ## @@ -713,11 +714,24 @@ private void deleteInvalidFilesByPartitions(HoodieEngineContext context,
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
wombatu-kun commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1639065444

## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ##
@@ -581,23 +572,27 @@ public void validateTableProperties(Properties properties) {
    */
   public static HoodieTableMetaClient initTableAndGetMetaClient(StorageConfiguration storageConf, String basePath, Properties props) throws IOException {
+    return initTableAndGetMetaClient(storageConf, new StoragePath(basePath), props);

Review Comment:
OK, I'll create a Jira task a bit later.
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
wombatu-kun commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1639065241

## hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java: ##
@@ -95,7 +95,7 @@ storage, new StoragePath(logFilePathPattern)).stream()
         new HashMap<>();
     int numCorruptBlocks = 0;
     int dummyInstantTimeCount = 0;
-    String basePath = HoodieCLI.getTableMetaClient().getBasePathV2().toString();
+    String basePath = HoodieCLI.basePath;

Review Comment:
No, I haven't tested it. But as far as I can see from the code, you need to connect to a different table if you want to change the base path, and during that process (connecting) both basePath and metaClient are updated. Also, the `connect` method is invoked when the createTable command is executed. So these changes should not break anything.
Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]
hudi-bot commented on PR #11452: URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166930481 ## CI report: * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385) * 8c9f4bfd2eb32b3cb06b755bf9210607ea5e865a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24392)
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
hudi-bot commented on PR #11455: URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166930511 ## CI report: * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389) * 2704f291b4e87e2776bc7dfd9539c5ba5f7d2749 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24393)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166930448 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383) * 01581df81e33432179d9dfe574f8e9ae74f18038 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24391)
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
hudi-bot commented on PR #11455: URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166923741 ## CI report: * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389) * 2704f291b4e87e2776bc7dfd9539c5ba5f7d2749 UNKNOWN
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166923627 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383) * 01581df81e33432179d9dfe574f8e9ae74f18038 UNKNOWN
Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]
hudi-bot commented on PR #11452: URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166923682 ## CI report: * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385) * 8c9f4bfd2eb32b3cb06b755bf9210607ea5e865a UNKNOWN
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166919410 I also see the [PR](https://github.com/apache/hudi/pull/7512) that introduced the custom class `hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java`. As I understand it, that PR followed the developers' decision to use the deduced writer schema rather than the schema the file was actually written with. Its purpose was this: if for some reason a Parquet file is written in the new style (3-level nesting), most likely by a tool other than Spark or with `"spark.hadoop.parquet.avro.write-old-list-structure", "false"`, and there are no overrides (safety measures first), ask the reader to read it in the new style. Without that, it likely led to the same issue we are facing now. The current change is basically the reverse case: if the file was written as 2-level, use the 2-level reader no matter what the runtime setting is.
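The decision rule described in the comment above can be sketched as a small function. This is only one reading of that comment, not the code in `HoodieAvroReadSupport`: the `ListStructure` enum, the `ListStructureChooser` class, and the nullable `explicitOverride` parameter are hypothetical names, and detecting the layout from the file's schema is assumed to happen elsewhere.

```java
// Hypothetical names throughout; this mirrors the rule described in the comment,
// not the actual Hudi or parquet-avro implementation.
enum ListStructure { TWO_LEVEL, THREE_LEVEL }

final class ListStructureChooser {

  /**
   * Decides whether lists should be read with the old (2-level) structure.
   *
   * @param fileStructure    layout detected from the schema stored in the file
   * @param explicitOverride user-supplied setting, or null when nothing was set
   */
  static boolean readAsOldListStructure(ListStructure fileStructure, Boolean explicitOverride) {
    if (fileStructure == ListStructure.TWO_LEVEL) {
      // Reverse case from the comment: a 2-level file is read as 2-level
      // regardless of the runtime setting.
      return true;
    }
    if (explicitOverride != null) {
      // Earlier behavior: an explicit override wins for 3-level files.
      return explicitOverride;
    }
    // 3-level file and no override: read it in the new style.
    return false;
  }
}
```

The point of the sketch is that the layout found in the file takes priority for 2-level data, which is the case the comment links to silent data loss when the runtime setting disagrees.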
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
hudi-bot commented on PR #11455: URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166917379 ## CI report: * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389)
Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]
hudi-bot commented on PR #11453: URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166917349 ## CI report: * b06455ddb5402cbb0d7df375b7595277f8b0eab9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24388)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166917289 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166915644 In certain setups, this problem leads to silent data loss. https://github.com/VitoMakarevich/hudi-issue-014?tab=readme-ov-file#silent-dataloss
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166879255 @hudi-bot run azure
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
jonvex merged PR #11449: URL: https://github.com/apache/hudi/pull/11449
Re: [PR] [HUDI-7386] add builder to filegroup reader [hudi]
jonvex closed pull request #10630: [HUDI-7386] add builder to filegroup reader URL: https://github.com/apache/hudi/pull/10630
Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]
hudi-bot commented on PR #11452: URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166852841 ## CI report: * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166852778 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
hudi-bot commented on PR #11449: URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166852725 ## CI report: * 6b21881333bd3870957950fd467189689887ec5c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
nsivabalan commented on code in PR #11449:
URL: https://github.com/apache/hudi/pull/11449#discussion_r1638777639

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -92,7 +92,6 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext,
         Option internalSchemaOpt,
         HoodieTableMetaClient hoodieTableMetaClient,
         TypedProperties props,
-        HoodieTableConfig tableConfig,

Review Comment:
+1
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
nsivabalan commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1638753499

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
     return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+    return new Builder<>();
+  }
+
+  public static class Builder {
+
+    HoodieReaderContext readerContext;
+    HoodieStorage storage;
+    String tablePath;
+    String latestCommitTime;
+    FileSlice fileSlice;
+    Schema dataSchema;
+    Schema requestedSchema;
+    Option internalSchemaOpt;
+    HoodieTableMetaClient hoodieTableMetaClient;
+    TypedProperties props;
+    HoodieTableConfig tableConfig;
+    long start;
+    long length;
+    boolean shouldUseRecordPosition = false;
+    long maxMemorySizeInBytes;
+    String spillableMapBasePath;
+    ExternalSpillableMap.DiskMapType diskMapType;
+    boolean isBitCaskDiskMapCompressionEnabled;

Review Comment:
Wow, too many args.

## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
     return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+    return new Builder<>();
+  }
+
+  public static class Builder {
+
+    HoodieReaderContext readerContext;

Review Comment:
private?
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
codope commented on code in PR #11455:
URL: https://github.com/apache/hudi/pull/11455#discussion_r1638968211

## hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java: ##
@@ -276,6 +277,10 @@ private void validateOutputFromFileGroupReader(StorageConfiguration storageCo
     props.setProperty("hoodie.payload.ordering.field", "timestamp");
     props.setProperty(RECORD_MERGER_STRATEGY.key(), RECORD_MERGER_STRATEGY.defaultValue());
     props.setProperty(RECORD_MERGE_MODE.key(), recordMergeMode.name());
+    props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(),String.valueOf(1024 * 1024 * 1000));

Review Comment:
Store the default value in a constant.

## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestHoodiePositionBasedFileGroupRecordBuffer.java: ##
@@ -125,17 +126,17 @@ public void prepareBuffer(RecordMergeMode mergeMode) throws Exception {
         Option.empty(), metaClient.getTableConfig()));
     TypedProperties props = new TypedProperties();
     props.put(HoodieCommonConfig.RECORD_MERGE_MODE.key(), mergeMode.name());
+    props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(),String.valueOf(1024 * 1024 * 1000));

Review Comment:
Same here and in other places.
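One way to follow the review comments above is to name the literal once and reuse it wherever a test populates its properties. The sketch below assumes a hypothetical helper class and a placeholder property key; in the PR, the key would come from `HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key()`.

```java
import java.util.Properties;

// Hypothetical test helper; the class name and the key string are placeholders.
final class MergeMemoryTestConfig {
  // Placeholder standing in for HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key().
  static final String MAX_MEMORY_FOR_MERGE_KEY = "example.memory.merge.max.size";

  // The literal from the diff, defined once instead of repeated in every test.
  static final long TEST_MAX_MEMORY_FOR_MERGE_BYTES = 1024L * 1024 * 1000;

  private MergeMemoryTestConfig() {
  }

  // Sets the merge memory limit on the given properties and returns them.
  static Properties withMergeMemory(Properties props) {
    props.setProperty(MAX_MEMORY_FOR_MERGE_KEY, String.valueOf(TEST_MAX_MEMORY_FOR_MERGE_BYTES));
    return props;
  }
}
```

Each test can then call `MergeMemoryTestConfig.withMergeMemory(props)`, so a later change to the value touches a single line.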
Re: [PR] Fix Hudi being able to read 2-level structure if explicit flag for wr… [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166558699 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 UNKNOWN
Re: [I] [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata [hudi]
howardcho closed issue #11454: [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata URL: https://github.com/apache/hudi/issues/11454
Re: [I] [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata [hudi]
howardcho commented on issue #11454: URL: https://github.com/apache/hudi/issues/11454#issuecomment-2166804682 While reviewing the config, I saw this setting: `hoodie-conf hoodie.cleaner.parallelism`. This seemed incorrect, so I removed the `hoodie-conf` portion, and re-ran the jobs, and they seem to be working. Thanks and sorry for the alarm. Closing.
Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]
hudi-bot commented on PR #11453: URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166794958 ## CI report: * d82a4d6fb2c3cd0aa353c028d15659427eec4962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24386) * b06455ddb5402cbb0d7df375b7595277f8b0eab9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24388)
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
hudi-bot commented on PR #11455: URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166795004 ## CI report: * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166794881 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374: URL: https://github.com/apache/hudi/pull/11374#issuecomment-2166794559 ## CI report: * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN * 531bfafd735b515fc756f3f9fc8a6b929f8e2c88 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24387)
Re: [PR] Fix Hudi being able to read 2-level structure if explicit flag for wr… [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166575387 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166784596 @hudi-bot run azure
Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]
hudi-bot commented on PR #11453: URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166782661 ## CI report: * d82a4d6fb2c3cd0aa353c028d15659427eec4962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24386) * b06455ddb5402cbb0d7df375b7595277f8b0eab9 UNKNOWN
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166782435 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
hudi-bot commented on PR #11455: URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166782750 ## CI report: * 468720a4bc3c521e71e2d2e6459d37fd27f8444b UNKNOWN
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
hudi-bot commented on PR #11449: URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166782357 ## CI report: * 6b21881333bd3870957950fd467189689887ec5c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]
hudi-bot commented on PR #11374: URL: https://github.com/apache/hudi/pull/11374#issuecomment-2166781884 ## CI report: * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN * 1e6955bbac8cc18f6774360c7b3ef4e307c1c397 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24205) * 531bfafd735b515fc756f3f9fc8a6b929f8e2c88 UNKNOWN
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on code in PR #11450: URL: https://github.com/apache/hudi/pull/11450#discussion_r1638937534 ## hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java: ## @@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE, "false", "support reading avro from non-legacy map/list in parquet file"); } +if (legacyMode) { Review Comment: Yes, I'm struggling to do this with all the Hudi test rails, but I'm slowly making progress. I'm not sure I understand exactly what you mean, since with this code block in place it will pass anyway.
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on code in PR #11450: URL: https://github.com/apache/hudi/pull/11450#discussion_r1638937534 ## hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java: ## @@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE, "false", "support reading avro from non-legacy map/list in parquet file"); } +if (legacyMode) { Review Comment: Yes, I'm struggling to do this with all the Hudi test rails, but I'm slowly making progress. I'm not sure I understand exactly what you mean, since with this code block in place it will proceed anyway.
Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]
yihua commented on code in PR #11385: URL: https://github.com/apache/hudi/pull/11385#discussion_r1638837492 ## hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java: ## @@ -95,7 +95,7 @@ storage, new StoragePath(logFilePathPattern)).stream() new HashMap<>(); int numCorruptBlocks = 0; int dummyInstantTimeCount = 0; -String basePath = HoodieCLI.getTableMetaClient().getBasePathV2().toString(); +String basePath = HoodieCLI.basePath; Review Comment: Have you tested the changes locally with Hudi CLI so that after changing the base path to a different table, the Hudi CLI works properly on the new table (both the base path and meta client are updated)? ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java: ## @@ -581,23 +572,27 @@ public void validateTableProperties(Properties properties) { */ public static HoodieTableMetaClient initTableAndGetMetaClient(StorageConfiguration storageConf, String basePath, Properties props) throws IOException { +return initTableAndGetMetaClient(storageConf, new StoragePath(basePath), props); Review Comment: A good follow-up would be removing any util methods taking `String` path and passing `StoragePath` instance all the way down.
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
VitoMakarevich commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166774450 @hudi-bot run azure
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
hudi-bot commented on PR #11450: URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166758114 ## CI report: * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
hudi-bot commented on PR #11449: URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166757927 ## CI report: * 6b21881333bd3870957950fd467189689887ec5c UNKNOWN
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
hudi-bot commented on PR #11448: URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166757804 ## CI report: * 7a032b2d72443ead03c3fb39af22552db360a2ba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24382)
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
hudi-bot commented on PR #11448: URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166575297 ## CI report: * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380) * 7a032b2d72443ead03c3fb39af22552db360a2ba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24382)
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
jonvex commented on code in PR #11448: URL: https://github.com/apache/hudi/pull/11448#discussion_r1638841328 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) { return RecordMergeMode.valueOf(mergeMode); } + public static Builder builder() { +return new Builder<>(); + } + + public static class Builder { + +HoodieReaderContext readerContext; +HoodieStorage storage; +String tablePath; +String latestCommitTime; +FileSlice fileSlice; +Schema dataSchema; +Schema requestedSchema; +Option internalSchemaOpt; +HoodieTableMetaClient hoodieTableMetaClient; +TypedProperties props; +HoodieTableConfig tableConfig; +long start; +long length; +boolean shouldUseRecordPosition = false; +long maxMemorySizeInBytes; +String spillableMapBasePath; +ExternalSpillableMap.DiskMapType diskMapType; +boolean isBitCaskDiskMapCompressionEnabled; Review Comment: https://github.com/apache/hudi/pull/11449 https://github.com/apache/hudi/pull/11453 https://github.com/apache/hudi/pull/11455 followup prs to remove arguments
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
hudi-bot commented on PR #11448: URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166465238 ## CI report: * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380)
[PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]
jonvex opened a new pull request, #11455: URL: https://github.com/apache/hudi/pull/11455
### Change Logs
Currently there are 4 params for the fg reader that are for spillable map configs. They can just be stored in the TypedProperties that is already passed in to the fg reader.
### Impact
Easier to use the fg reader and integrate it in other parts of hudi
### Risk level (write none, low medium or high below)
low
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
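The idea in this change log, reading the spillable-map settings from the properties object instead of passing four separate arguments, can be sketched as follows. This is an illustration in Python rather than the PR's Java, and everything named here is an assumption: the `example.spill.*` keys, the defaults, and the `SpillConfig` type are placeholders invented for the sketch, not Hudi's actual config keys or classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpillConfig:
    """The four spill-related settings, grouped (placeholder type)."""
    max_memory_bytes: int
    base_path: str
    disk_map_type: str
    compression_enabled: bool


def spill_config_from_props(props):
    """Build the spill settings from one properties mapping.

    Keys and defaults are placeholders for illustration only.
    """
    return SpillConfig(
        max_memory_bytes=int(props.get("example.spill.max.memory.bytes", 1024 ** 3)),
        base_path=props.get("example.spill.base.path", "/tmp/"),
        disk_map_type=props.get("example.spill.diskmap.type", "BITCASK"),
        compression_enabled=str(
            props.get("example.spill.compression.enabled", "true")
        ).lower() == "true",
    )


# A caller now passes one mapping; unset keys fall back to defaults.
props = {
    "example.spill.max.memory.bytes": "1048576",
    "example.spill.diskmap.type": "ROCKS_DB",
}
config = spill_config_from_props(props)
print(config)
```

The trade-off is the usual one for this pattern: the reader's signature shrinks, while the settings lose compile-time visibility and have to be documented as property keys.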
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
yihua commented on code in PR #11450: URL: https://github.com/apache/hudi/pull/11450#discussion_r1638823816 ## hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java: ## @@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE, "false", "support reading avro from non-legacy map/list in parquet file"); } +if (legacyMode) { Review Comment: Is it possible to write a unit test so that the test fails before this fix and succeeds after the fix?
Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]
yihua commented on code in PR #11450: URL: https://github.com/apache/hudi/pull/11450#discussion_r1638823816 ## hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java: ## @@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE, "false", "support reading avro from non-legacy map/list in parquet file"); } +if (legacyMode) { Review Comment: Is it possible to write a simple test so that the test fails before this fix and succeeds after the fix?
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
hudi-bot commented on PR #11449: URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166465320 ## CI report: * 6b21881333bd3870957950fd467189689887ec5c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
Re: [PR] [MINOR] add `ad1happy2go` to github collaborators [hudi]
xushiyan merged PR #11447: URL: https://github.com/apache/hudi/pull/11447
[I] [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata [hudi]
howardcho opened a new issue, #11454: URL: https://github.com/apache/hudi/issues/11454
Last night I was running multiple upsert Glue jobs against a single table (backfilling missing data), when I started getting this error: `An error occurred while calling o333.save. Failed to apply clean commit to metadata`
Now, none of my jobs will complete successfully. I retried again this morning with my hourly incremental and it failed with the same error. I presume I accidentally had two jobs writing to the same partition, which caused some sort of deadlock. Could someone please assist me in getting my table back into a usable state?
Hudi version: 0.14.0
Config:
```
{'hoodie.table.name': 'xxx', 'hoodie.datasource.write.table.type': 'COPY_ON_WRITE', 'hoodie.datasource.write.operation': 'upsert', 'hoodie.datasource.write.recordkey.field': 'received_year,received_month,received_day,request_uuid', 'hoodie.datasource.write.precombine.field': 'nats_timestamp', 'hoodie.datasource.write.hive_style_partitioning': True, 'hoodie.metadata.record.index.enable': False, 'hoodie.index.type': 'BLOOM', 'hoodie.parquet.max.file.size': 536870912, 'hoodie.parquet.small.file.limit': 104857600, 'hoodie.metadata.enable': 'true', 'hoodie.metadata.index.async': 'false', 'hoodie.metadata.index.column.stats.enable': 'true', 'hoodie.metadata.index.check.timeout.seconds': '60', 'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider', 'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': True, 'hoodie.datasource.write.partitionpath.field': 'received_year,received_month,received_day', 'hoodie.datasource.hive_sync.partition_fields': 'received_year,received_month,received_day', 'hoodie.clean.automatic': 'true', 'hoodie.clean.async': 'false', 'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS', 'hoodie.cleaner.fileversions.retained': '3', 'hoodie-conf hoodie.cleaner.parallelism': '200', 'hoodie.cleaner.commits.retained': 5, 'hoodie.parquet.compression.codec': 'gzip', 'hoodie.datasource.hive_sync.enable': 'true', 'hoodie.datasource.hive_sync.database': 'product_usage', 'hoodie.datasource.hive_sync.table': 'usage', 'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor', 'hoodie.datasource.hive_sync.use_jdbc': 'false', 'hoodie.datasource.hive_sync.mode': 'hms', 'hoodie.datasource.hive_sync.support_timestamp': True, 'hive_sync.support_timestamp': True, 'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator'}
```
* Stack trace below
I re-ran a single incremental job with `hoodie.metadata.enabled=False` and it worked, but when trying to re-run the older backfill jobs, they continue to fail. I then tried modifying the `hoodie-conf hoodie.cleaner.parallelism` and `hoodie.cleaner.commits.retained`:
```
Writing Hudi 0.14.0 data with method: upsert
{'hoodie.table.name': 'xxx', 'hoodie.datasource.write.table.type': 'COPY_ON_WRITE', 'hoodie.datasource.write.operation': 'upsert', 'hoodie.datasource.write.recordkey.field': 'received_year,received_month,received_day,request_uuid', 'hoodie.datasource.write.precombine.field': 'nats_timestamp', 'hoodie.datasource.write.hive_style_partitioning': True, 'hoodie.metadata.record.index.enable': False, 'hoodie.index.type': 'BLOOM', 'hoodie.parquet.max.file.size': 536870912, 'hoodie.parquet.small.file.limit': 104857600, 'hoodie.metadata.enable': 'true', 'hoodie.metadata.index.async': 'false', 'hoodie.metadata.index.column.stats.enable': 'true', 'hoodie.metadata.index.check.timeout.seconds': '60', 'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider', 'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': True, 'hoodie.datasource.write.partitionpath.field': 'received_year,received_month,received_day', 'hoodie.datasource.hive_sync.partition_fields': 'received_year,received_month,received_day', 'hoodie.clean.automatic': 'true', 'hoodie.clean.async': 'false', 'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS', 'hoodie.cleaner.fileversions.retained': '3', 'hoodie-conf hoodie.cleaner.parallelism': 10, 'hoodie.cleaner.commits.retained': 20, 'hoodie.parquet.compression.codec': 'gzip', 'hoodie.datasource.hive_sync.enable': 'true', 'hoodie.datasource.hive_sync.database': 'product_usage', 'hoodie.datasource.hive_sync.table': 'usage', 'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor', 'hoodie.datasource.hive_sync.use_jdbc': 'false', 'hoodie.datasource.hive_sync.mode': 'hms', 'hoodie.datasource.hive_sync.support_timestamp': True, 'hive_sync.support_timestamp': True, 'hoodie.datasource.write.keygenerator.class':
```
Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]
hudi-bot commented on PR #11453: URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166672996 ## CI report: * d82a4d6fb2c3cd0aa353c028d15659427eec4962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24386)
Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]
hudi-bot commented on PR #11452: URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166672929 ## CI report: * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385)
Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]
hudi-bot commented on PR #11453: URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166659296 ## CI report: * d82a4d6fb2c3cd0aa353c028d15659427eec4962 UNKNOWN
Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]
hudi-bot commented on PR #11452: URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166659242 ## CI report: * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 UNKNOWN
Re: [PR] [HUDI-7872] Recreate Glue table on certain types of exceptions [hudi]
hudi-bot commented on PR #11451: URL: https://github.com/apache/hudi/pull/11451#issuecomment-2166575530 ## CI report: * 7eee3194062ffa96e5253e5acdf4fcf48fc7040c UNKNOWN
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
hudi-bot commented on PR #11449: URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166647365 ## CI report: * 6b21881333bd3870957950fd467189689887ec5c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
Re: [PR] [HUDI-7872] Recreate Glue table on certain types of exceptions [hudi]
hudi-bot commented on PR #11451: URL: https://github.com/apache/hudi/pull/11451#issuecomment-2166647444 ## CI report: * 7eee3194062ffa96e5253e5acdf4fcf48fc7040c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24384)
[PR] [HUDI-7875] remove tablepath from fg reader [hudi]
jonvex opened a new pull request, #11453: URL: https://github.com/apache/hudi/pull/11453
### Change Logs
The table path is now taken from the meta client. The meta client is also passed through the reader context to reduce the params of the record buffers.
### Impact
Easier for new engines to implement the fg reader
### Risk level (write none, low medium or high below)
low
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
hudi-bot commented on PR #11448: URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166447956 ## CI report: * 9c2caef97e9e3a94097905179e57ab98e4fe8935 UNKNOWN
[PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
jonvex opened a new pull request, #11449: URL: https://github.com/apache/hudi/pull/11449
### Change Logs
HoodieFileGroupReader has too many parameters. The table config can be obtained from the meta client, which is already a parameter.
### Impact
Fewer parameters for the fg reader.
### Risk level (write none, low medium or high below)
low
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[PR] [HUDI-7872] Recreate Glue table on certain types of exceptions [hudi]
vamsikarnika opened a new pull request, #11451: URL: https://github.com/apache/hudi/pull/11451
### Change Logs
Recreate and sync Glue and Hive tables when an exception occurs while syncing the schema, properties, or partitions. To enable this feature, set the following flags to true:
`hoodie.datasource.hive_sync.recreate_table_on_error=true`
`hoodie.datasource.meta.sync.glue.recreate_table_on_error=true`
### Impact
Low: when catalog sync fails the first time, we retry once by dropping and recreating the table, which might increase sync time.
### Risk level (write none, low medium or high below)
LOW: when sync fails and we try to recreate the table, if the drop succeeds but the create fails, we might end up deleting the customer's table. We need to add an alert to catch this scenario.
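The retry behaviour described in this PR can be sketched roughly as follows. This is only an illustration, not the code in the PR: `CatalogClient`, `syncWithRecreate`, and `FlakyClient` are hypothetical names invented here, and the boolean parameter merely stands in for the `recreate_table_on_error` flags. The comment in the catch block marks the point where the risk mentioned in the PR arises.

```java
import java.util.ArrayList;
import java.util.List;

public class RecreateOnErrorSketch {

  /** Hypothetical catalog abstraction; not a Hudi, Hive, or Glue interface. */
  interface CatalogClient {
    void sync(String table) throws Exception;
    void dropTable(String table) throws Exception;
    void createTable(String table) throws Exception;
  }

  /**
   * Syncs once. On failure, and only if recreateOnError is set, drops and
   * recreates the table and then syncs one more time.
   */
  static void syncWithRecreate(CatalogClient client, String table, boolean recreateOnError)
      throws Exception {
    try {
      client.sync(table);
    } catch (Exception firstFailure) {
      if (!recreateOnError) {
        throw firstFailure;
      }
      client.dropTable(table);
      // If createTable fails here, the table no longer exists in the catalog.
      client.createTable(table);
      client.sync(table);
    }
  }

  /** In-memory fake whose first sync fails, used to exercise the retry path. */
  static class FlakyClient implements CatalogClient {
    final List<String> calls = new ArrayList<>();
    private boolean failedOnce = false;

    public void sync(String table) throws Exception {
      calls.add("sync");
      if (!failedOnce) {
        failedOnce = true;
        throw new Exception("simulated sync failure");
      }
    }

    public void dropTable(String table) {
      calls.add("drop");
    }

    public void createTable(String table) {
      calls.add("create");
    }
  }

  public static void main(String[] args) throws Exception {
    FlakyClient client = new FlakyClient();
    syncWithRecreate(client, "my_table", true);
    System.out.println(client.calls);
  }
}
```

With the flag off, the first failure is simply rethrown and nothing is dropped, which is presumably why the PR keeps the behaviour opt-in.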
[PR] [HUDI-7873] remove getStorage method from reader context [hudi]
jonvex opened a new pull request, #11452: URL: https://github.com/apache/hudi/pull/11452 ### Change Logs Remove this method because it is implemented the same for all reader contexts, and it is only used by a test. ### Impact Easier to implement fg reader for a new engine ### Risk level (write none, low medium or high below) none ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
hudi-bot commented on PR #11448: URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166558515 ## CI report: * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380) * 7a032b2d72443ead03c3fb39af22552db360a2ba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24382) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
jonvex commented on code in PR #11449: URL: https://github.com/apache/hudi/pull/11449#discussion_r1638737559 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -92,7 +92,6 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext, Option internalSchemaOpt, HoodieTableMetaClient hoodieTableMetaClient, TypedProperties props, - HoodieTableConfig tableConfig, Review Comment: Yeah, I can keep getting rid of more. Just want to keep the prs small -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
hudi-bot commented on PR #11448: URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166545502 ## CI report: * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380) * 7a032b2d72443ead03c3fb39af22552db360a2ba UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]
hudi-bot commented on PR #9979: URL: https://github.com/apache/hudi/pull/9979#issuecomment-2166542192 ## CI report: * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN * d4cd757a7318a5ff2d1b3c3b10eddbbb91b8059f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24379) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Fix Hudi being able to read 2-level structure if explicit flag for wr… [hudi]
VitoMakarevich opened a new pull request, #11450: URL: https://github.com/apache/hudi/pull/11450
If I explicitly set `"spark.hadoop.parquet.avro.write-old-list-structure", "false"` (the only way to write nulls inside arrays), Hudi starts writing Parquet files with the following schema:
```
required group internal_list (LIST) { repeated group list { required int64 element; } }
```
But files produced before setting `"spark.hadoop.parquet.avro.write-old-list-structure", "false"` have the following schema:
```
required group internal_list (LIST) { repeated int64 array; }
```
Hudi 0.14.x (at least) fails to read records from such a file, with the exception `Caused by: java.lang.RuntimeException: Null-value for required field: `, even though the array contents are not null (in fact they cannot be null, since Avro requires `spark.hadoop.parquet.avro.write-old-list-structure` = `false` to write `null`s).
### Expected behavior
Taken from Hudi 0.12.1 (I am not sure what exactly broke this):
1. If I have a file with the 2-level structure and an update arrives with "spark.hadoop.parquet.avro.write-old-list-structure", "false" (whether or not the array contains nulls, the result is the same), the file is rewritten with the 3-level structure. (**fails in 0.14.1**)
2. If I have a file with the 3-level structure containing nulls and an update arrives (with or without nulls), it is read and written correctly.

A simple reproduction of the issue can be found here: https://github.com/VitoMakarevich/hudi-issue-014
Most likely the problem appeared after a Hudi change that caused values from the Hadoop conf to propagate into the Reader instance (they were probably not propagated before).
### Change Logs
Added an explicit override of `spark.hadoop.parquet.avro.write-old-list-structure` = `true` if the file being read is old (has the 2-level structure).
### Impact
Running tests to ensure no unexpected issues propagate.
### Risk level (write none, low medium or high below)
medium
### Documentation Update
_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._
- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
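The distinction between the two list layouts in this PR can be illustrated with a small sketch. It assumes a hypothetical, hand-rolled `Node` type instead of the real Parquet schema classes, and it uses a simplified rule: a LIST group whose repeated child is a primitive is treated as the old 2-level layout, anything else as 3-level. The Parquet specification has additional backward-compatibility rules that are not modelled here, and `writeOldListStructureFor` is an invented helper, not the override added by the PR.

```java
import java.util.Arrays;
import java.util.List;

public class ListStructureSketch {

  /** Hypothetical, minimal stand-in for a Parquet schema node. */
  static class Node {
    final String name;
    final boolean repeated;
    final List<Node> children; // empty for primitive fields

    Node(String name, boolean repeated, Node... children) {
      this.name = name;
      this.repeated = repeated;
      this.children = Arrays.asList(children);
    }

    boolean isPrimitive() {
      return children.isEmpty();
    }
  }

  /** Simplified check: the repeated field directly holds the element values. */
  static boolean isTwoLevelList(Node listGroup) {
    if (listGroup.children.size() != 1 || !listGroup.children.get(0).repeated) {
      throw new IllegalArgumentException("LIST group must contain exactly one repeated field");
    }
    return listGroup.children.get(0).isPrimitive();
  }

  /** Value one might use for write-old-list-structure when reading this file. */
  static String writeOldListStructureFor(Node listGroup) {
    return String.valueOf(isTwoLevelList(listGroup));
  }

  public static void main(String[] args) {
    // required group internal_list (LIST) { repeated int64 array; }
    Node twoLevel = new Node("internal_list", false, new Node("array", true));
    // required group internal_list (LIST) { repeated group list { required int64 element; } }
    Node threeLevel = new Node("internal_list", false,
        new Node("list", true, new Node("element", false)));

    System.out.println(writeOldListStructureFor(twoLevel));
    System.out.println(writeOldListStructureFor(threeLevel));
  }
}
```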
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
yihua commented on code in PR #11448: URL: https://github.com/apache/hudi/pull/11448#discussion_r1638694040 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) { return RecordMergeMode.valueOf(mergeMode); } + public static Builder builder() { +return new Builder<>(); + } + + public static class Builder { + +HoodieReaderContext readerContext; +HoodieStorage storage; +String tablePath; +String latestCommitTime; +FileSlice fileSlice; +Schema dataSchema; +Schema requestedSchema; +Option internalSchemaOpt; +HoodieTableMetaClient hoodieTableMetaClient; +TypedProperties props; +HoodieTableConfig tableConfig; +long start; +long length; +boolean shouldUseRecordPosition = false; +long maxMemorySizeInBytes; +String spillableMapBasePath; +ExternalSpillableMap.DiskMapType diskMapType; +boolean isBitCaskDiskMapCompressionEnabled; + +public Builder withReaderContext(HoodieReaderContext readerContext) { + this.readerContext = readerContext; + return this; +} + +public Builder withHoodieStorage(HoodieStorage storage) { + this.storage = storage; + return this; +} + +public Builder withTablePath(String tablePath) { + this.tablePath = tablePath; + return this; +} + +public Builder withLatestCommitTime(String latestCommitTime) { + this.latestCommitTime = latestCommitTime; + return this; +} + +public Builder withFileSlice(FileSlice fileSlice) { + this.fileSlice = fileSlice; + return this; +} + +public Builder withDataSchema(Schema dataSchema) { + this.dataSchema = dataSchema; + return this; +} + +public Builder withRequestedSchema(Schema requestedSchema) { + this.requestedSchema = requestedSchema; + return this; +} + +public Builder withInternalSchemaOpt(Option internalSchemaOpt) { + this.internalSchemaOpt = internalSchemaOpt; + return this; +} + +public Builder withMetaClient(HoodieTableMetaClient hoodieTableMetaClient) { + this.hoodieTableMetaClient = 
hoodieTableMetaClient; + return this; +} + +public Builder withTypedProperties(TypedProperties props) { + this.props = props; + return this; +} + +public Builder withTableConfig(HoodieTableConfig tableConfig) { + this.tableConfig = tableConfig; + return this; +} + +public Builder withStart(long start) { + this.start = start; + return this; +} + +public Builder withLength(long length) { + this.length = length; + return this; +} + +public Builder withUseRecordPosition(boolean shouldUseRecordPosition) { + this.shouldUseRecordPosition = shouldUseRecordPosition; + return this; +} + +public Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) { + this.maxMemorySizeInBytes = maxMemorySizeInBytes; + return this; +} + +public Builder withSpillableMapBasePath(String spillableMapBasePath) { Review Comment: Could you simplify the number of configs in the builder assuming that the user can use the file group reader API without having deep knowledge on Hudi? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
yihua commented on code in PR #11449: URL: https://github.com/apache/hudi/pull/11449#discussion_r1638685103 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -92,7 +92,6 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext, Option internalSchemaOpt, HoodieTableMetaClient hoodieTableMetaClient, TypedProperties props, - HoodieTableConfig tableConfig, Review Comment: Could you think about simplifying all the parameters? The goal should be that given the file group ID and the query type with the reader context, storage instance, and the minimal set of configs (maybe with meta client), the file group reader should be able to figure out all necessary configs to fill in for reading the records out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
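One way to read the suggestion above is that anything derivable from the meta client should be resolved inside the reader rather than passed in by the caller. The sketch below illustrates that idea under stated assumptions: `TableMeta`, `ReaderConfig`, and `resolve` are hypothetical names, not Hudi classes, and the real reader needs far more than two settings.

```java
public class DerivedConfigSketch {

  /** Hypothetical stand-in for a table meta client; not HoodieTableMetaClient. */
  static final class TableMeta {
    final String basePath;
    final String latestCommitTime;

    TableMeta(String basePath, String latestCommitTime) {
      this.basePath = basePath;
      this.latestCommitTime = latestCommitTime;
    }
  }

  /** Fully resolved settings that a reader would work with. */
  static final class ReaderConfig {
    final String tablePath;
    final String commitTime;

    ReaderConfig(String tablePath, String commitTime) {
      this.tablePath = tablePath;
      this.commitTime = commitTime;
    }
  }

  /**
   * The caller supplies only the meta object and an optional override
   * (null means "derive it"); everything else is filled in here.
   */
  static ReaderConfig resolve(TableMeta meta, String commitTimeOverride) {
    String commitTime = commitTimeOverride != null ? commitTimeOverride : meta.latestCommitTime;
    return new ReaderConfig(meta.basePath, commitTime);
  }

  public static void main(String[] args) {
    TableMeta meta = new TableMeta("/tmp/example_table", "20240613132113");
    ReaderConfig derived = resolve(meta, null);
    System.out.println(derived.tablePath + " @ " + derived.commitTime);
  }
}
```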
Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]
hudi-bot commented on PR #9979: URL: https://github.com/apache/hudi/pull/9979#issuecomment-2166461134 ## CI report: * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN * 5ea3b0b905186b2701ee57f466cbec82043ddbea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23601) * d4cd757a7318a5ff2d1b3c3b10eddbbb91b8059f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24379) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
codope commented on code in PR #11448: URL: https://github.com/apache/hudi/pull/11448#discussion_r1638668531 ## hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java: ## @@ -280,25 +279,24 @@ private void validateOutputFromFileGroupReader(StorageConfiguration storageCo props.setProperty(PARTITION_FIELDS.key(), metaClient.getTableConfig().getString(PARTITION_FIELDS)); } assertEquals(containsBaseFile, fileSlice.getBaseFile().isPresent()); -HoodieFileGroupReader fileGroupReader = new HoodieFileGroupReader<>( -getHoodieReaderContext(tablePath, avroSchema, storageConf), -metaClient.getStorage(), -tablePath, -metaClient.getActiveTimeline().lastInstant().get().getTimestamp(), -fileSlice, -avroSchema, -avroSchema, -Option.empty(), -metaClient, -props, -metaClient.getTableConfig(), -0, -fileSlice.getTotalFileSize(), -false, -1024 * 1024 * 1000, -metaClient.getTempFolderPath(), -ExternalSpillableMap.DiskMapType.ROCKS_DB, -false); +HoodieFileGroupReader fileGroupReader = HoodieFileGroupReader.builder() +.withReaderContext(getHoodieReaderContext(tablePath, avroSchema, storageConf)) +.withHoodieStorage(metaClient.getStorage()) +.withTablePath(tablePath) + .withLatestCommitTime(metaClient.getActiveTimeline().lastInstant().get().getTimestamp()) +.withFileSlice(fileSlice) +.withDataSchema(avroSchema) +.withRequestedSchema(avroSchema) +.withMetaClient(metaClient) +.withTypedProperties(props) +.withTableConfig(metaClient.getTableConfig()) +.withStart(0) +.withLength(fileSlice.getTotalFileSize()) +.withMaxMemorySizeInBytes(1024 * 1024 * 1000) Review Comment: Also, consider making the size configurable instead of hard coded value. Can do it in a followup pr. 
## hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java: ## @@ -280,25 +279,24 @@ private void validateOutputFromFileGroupReader(StorageConfiguration storageCo props.setProperty(PARTITION_FIELDS.key(), metaClient.getTableConfig().getString(PARTITION_FIELDS)); } assertEquals(containsBaseFile, fileSlice.getBaseFile().isPresent()); -HoodieFileGroupReader fileGroupReader = new HoodieFileGroupReader<>( -getHoodieReaderContext(tablePath, avroSchema, storageConf), -metaClient.getStorage(), -tablePath, -metaClient.getActiveTimeline().lastInstant().get().getTimestamp(), -fileSlice, -avroSchema, -avroSchema, -Option.empty(), -metaClient, -props, -metaClient.getTableConfig(), -0, -fileSlice.getTotalFileSize(), -false, -1024 * 1024 * 1000, -metaClient.getTempFolderPath(), -ExternalSpillableMap.DiskMapType.ROCKS_DB, -false); +HoodieFileGroupReader fileGroupReader = HoodieFileGroupReader.builder() +.withReaderContext(getHoodieReaderContext(tablePath, avroSchema, storageConf)) +.withHoodieStorage(metaClient.getStorage()) +.withTablePath(tablePath) + .withLatestCommitTime(metaClient.getActiveTimeline().lastInstant().get().getTimestamp()) +.withFileSlice(fileSlice) +.withDataSchema(avroSchema) +.withRequestedSchema(avroSchema) +.withMetaClient(metaClient) +.withTypedProperties(props) +.withTableConfig(metaClient.getTableConfig()) +.withStart(0) +.withLength(fileSlice.getTotalFileSize()) +.withMaxMemorySizeInBytes(1024 * 1024 * 1000) Review Comment: While we're at refactoring, can we also declare this size as constant at some common place. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
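Both review comments (a shared constant, and a configurable size) could be combined along these lines. This is a sketch only: `MAX_MEMORY_KEY` is a made-up property name rather than an existing Hudi config, and `java.util.Properties` stands in for `TypedProperties`.

```java
import java.util.Properties;

public class MemorySizeConfigSketch {

  /** Hypothetical property key; not an actual Hudi config name. */
  static final String MAX_MEMORY_KEY = "example.reader.max.memory.bytes";

  /** Same value as the literal 1024 * 1024 * 1000 used in the test. */
  static final long DEFAULT_MAX_MEMORY_BYTES = 1024L * 1024L * 1000L;

  /** Returns the configured size, falling back to the shared default. */
  static long maxMemoryBytes(Properties props) {
    String raw = props.getProperty(MAX_MEMORY_KEY);
    return raw == null ? DEFAULT_MAX_MEMORY_BYTES : Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(maxMemoryBytes(props));
    props.setProperty(MAX_MEMORY_KEY, "4096");
    System.out.println(maxMemoryBytes(props));
  }
}
```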
Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]
hudi-bot commented on PR #11449: URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166448035 ## CI report: * 6b21881333bd3870957950fd467189689887ec5c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]
hudi-bot commented on PR #9979: URL: https://github.com/apache/hudi/pull/9979#issuecomment-2166429555 ## CI report: * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN * 5ea3b0b905186b2701ee57f466cbec82043ddbea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23601) * d4cd757a7318a5ff2d1b3c3b10eddbbb91b8059f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] URI too long error [hudi]
michael1991 commented on issue #11446: URL: https://github.com/apache/hudi/issues/11446#issuecomment-2166133152 Hi @ad1happy2go, glad to see you again. Could you try partition column names that contain an underscore? I'm not sure whether enabling URL encoding for partitions together with underscored partition column names triggers this.
[PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]
jonvex opened a new pull request, #11448: URL: https://github.com/apache/hudi/pull/11448
### Change Logs
The number of constructor parameters for the fg reader is getting too large. Use a builder instead.
### Impact
Easier to keep track of parameters for the fg reader.
### Risk level (write none, low medium or high below)
low
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
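For readers unfamiliar with the pattern, a stripped-down builder might look like the sketch below. It is not the `HoodieFileGroupReader.Builder` from the PR: the `Reader` class is hypothetical, only four of the parameters are modelled, and the default values and the required-field check are assumptions added for illustration.

```java
public class BuilderSketch {

  /** Hypothetical reader holding a few of the parameters named in the PR. */
  static final class Reader {
    final String tablePath;
    final long start;
    final long length;
    final boolean shouldUseRecordPosition;

    private Reader(Builder b) {
      this.tablePath = b.tablePath;
      this.start = b.start;
      this.length = b.length;
      this.shouldUseRecordPosition = b.shouldUseRecordPosition;
    }

    static Builder builder() {
      return new Builder();
    }
  }

  static final class Builder {
    private String tablePath;
    private long start = 0L;           // assumed default
    private long length = -1L;         // assumed default meaning "unset"
    private boolean shouldUseRecordPosition = false;

    Builder withTablePath(String tablePath) {
      this.tablePath = tablePath;
      return this;
    }

    Builder withStart(long start) {
      this.start = start;
      return this;
    }

    Builder withLength(long length) {
      this.length = length;
      return this;
    }

    Builder withUseRecordPosition(boolean shouldUseRecordPosition) {
      this.shouldUseRecordPosition = shouldUseRecordPosition;
      return this;
    }

    /** Validates required fields before constructing the reader. */
    Reader build() {
      if (tablePath == null) {
        throw new IllegalStateException("tablePath is required");
      }
      return new Reader(this);
    }
  }

  public static void main(String[] args) {
    Reader reader = Reader.builder()
        .withTablePath("/tmp/example_table")
        .withLength(1024L)
        .build();
    System.out.println(reader.tablePath + " " + reader.start + " " + reader.length);
  }
}
```

Compared with a long positional constructor, each argument is named at the call site, and optional values can be omitted.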
Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]
hudi-bot commented on PR #11441: URL: https://github.com/apache/hudi/pull/11441#issuecomment-2165889684 ## CI report: * 80572ef48e06b8c794e53c5db94aebc95c23c34d UNKNOWN * 72b491d8467da51aaa5e840631027a98ddb4cf93 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24375) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] URI too long error [hudi]
ad1happy2go commented on issue #11446: URL: https://github.com/apache/hudi/issues/11446#issuecomment-2166095081 @michael1991 Thanks for raising this. Can you help me reproduce this issue? I tried the code below, but it worked fine for me.
```
from faker import Faker
import pandas as pd

# Assumes an active SparkSession named `spark` and a target table path in `PATH`.
fake = Faker()
data = [{"ID": fake.uuid4(),
         "EventTime": "2023-03-04 14:44:42.046661",
         "FullName": fake.name(),
         "Address": fake.address(),
         "CompanyName": fake.company(),
         "JobTitle": fake.job(),
         "EmailAddress": fake.email(),
         "PhoneNumber": fake.phone_number(),
         "RandomText": fake.sentence(),
         "CityNameDummyBigFieldName": fake.city(),
         "ts": "1",
         "StateNameDummyBigFieldName": fake.state(),
         "Country": fake.country()} for _ in range(1000)]
pandas_df = pd.DataFrame(data)

hoodie_properties = {
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.recordkey.field': 'ID',
    'hoodie.datasource.write.partitionpath.field': 'StateNameDummyBigFieldName,CityNameDummyBigFieldName',
    'hoodie.table.name': 'test'
}

spark.sparkContext.setLogLevel("WARN")
df = spark.createDataFrame(pandas_df)

# Initial write, followed by repeated appends of the same data.
df.write.format("hudi").options(**hoodie_properties).mode("overwrite").save(PATH)
for i in range(1, 50):
    df.write.format("hudi").options(**hoodie_properties).mode("append").save(PATH)
```
Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]
jonvex merged PR #11441: URL: https://github.com/apache/hudi/pull/11441 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
hudi-bot commented on PR #11445: URL: https://github.com/apache/hudi/pull/11445#issuecomment-2166057193 ## CI report: * a46c941e943db5ce71526fde45b8cd8029257b94 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24376) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] project hudi-common: Compilation failure: Compilation failure [hudi]
HuangZhenQiu commented on issue #9744: URL: https://github.com/apache/hudi/issues/9744#issuecomment-2165894999 The issue is most likely caused by Maven pointing to a different Java version. Please make sure Maven points to JDK 1.8:
% mvn -version
Apache Maven 3.9.7 (8b094c9513efc1b9ce2d952b3b9c8eaedaf8cbf0)
Maven home: /opt/homebrew/Cellar/maven/3.9.7/libexec
Java version: 1.8.0_202, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"
[I] [SUPPORT] URI too long error [hudi]
michael1991 opened a new issue, #11446: URL: https://github.com/apache/hudi/issues/11446
**Describe the problem you faced**
I'm using Spark 3.5 + Hudi 0.15.0 with a partitioned table. When I use `req_date` and `req_hour` as the partition column names, I get the error below, although the task eventually completes successfully. When I use `date` and `hour` as the partition column names, the error disappears.
**Expected behavior**
Making the partition column names slightly longer should not cause errors.
**Environment Description**
* Hudi version : 0.15.0
* Spark version : 3.5.0
* Hive version : NA
* Hadoop version : 3.3.6
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : no

**Stacktrace**
```
2024-06-13 13:21:13 ERROR PriorityBasedFileSystemView:129 - Got error running preferred function. Trying secondary
org.apache.hudi.exception.HoodieRemoteException: URI Too Long
  at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.loadPartitions(RemoteHoodieTableFileSystemView.java:447) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.loadPartitions(RemoteHoodieTableFileSystemView.java:465) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.lambda$loadPartitions$6e5c444d$1(PriorityBasedFileSystemView.java:187) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:69) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.loadPartitions(PriorityBasedFileSystemView.java:185) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:133) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:174) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:200) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:212) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:647) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:746) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:843) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:816) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:847) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:581) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:560) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:251) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:108) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1082) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) ~[spark-sql_2.12-3.5.0.jar:0.15.0]
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) ~[spark-sql_2.12-3.5.0.jar:3.5.0]
  at
Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]
nsivabalan commented on issue #11419: URL: https://github.com/apache/hudi/issues/11419#issuecomment-2165786565 Thanks, @beyond1920. Please put up a patch; I would like to review it as well.
Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
hudi-bot commented on PR #11445: URL: https://github.com/apache/hudi/pull/11445#issuecomment-2165769508 ## CI report: * a46c941e943db5ce71526fde45b8cd8029257b94 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24376) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
hudi-bot commented on PR #11445: URL: https://github.com/apache/hudi/pull/11445#issuecomment-2165744949 ## CI report: * a46c941e943db5ce71526fde45b8cd8029257b94 UNKNOWN
Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]
hudi-bot commented on PR #11441: URL: https://github.com/apache/hudi/pull/11441#issuecomment-2165723395 ## CI report: * 80572ef48e06b8c794e53c5db94aebc95c23c34d UNKNOWN * 72b491d8467da51aaa5e840631027a98ddb4cf93 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24375)
Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
beyond1920 commented on PR #11445: URL: https://github.com/apache/hudi/pull/11445#issuecomment-2165666407 It's still a draft; I will add some test cases soon.
[PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]
beyond1920 opened a new pull request, #11445: URL: https://github.com/apache/hudi/pull/11445 ### Change Logs We should not skip deleting invalid files while finalizing the writer. If file deletion fails, users might get wrong results. This PR aims to fix [issue #11419](https://github.com/apache/hudi/issues/11419). ### Impact Fixes the bug in deleting invalid files. ### Risk level (write none, low, medium or high below) None ### Documentation Update None ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
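For readers following the thread, here is a rough sketch of the behavior the PR description argues for: delete every data file that is not part of the commit, and fail loudly instead of silently skipping when a deletion does not succeed. This is not the code from the PR; the class `FinalizeWriteSketch`, the method `deleteInvalidFiles`, and the use of `java.nio.file` on a local directory are all assumptions made purely for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch only; names and structure are NOT taken from Hudi.
public class FinalizeWriteSketch {

  /**
   * Deletes every regular file in dataDir that is not listed in committedFiles.
   * Throws if any deletion fails, so the caller can abort the commit rather
   * than expose leftover (possibly duplicate) data to readers.
   */
  public static List<Path> deleteInvalidFiles(Path dataDir, Set<Path> committedFiles)
      throws IOException {
    // Collect files that were written but are not part of the commit.
    List<Path> invalid = new ArrayList<>();
    try (Stream<Path> files = Files.list(dataDir)) {
      files.filter(Files::isRegularFile)
          .filter(p -> !committedFiles.contains(p))
          .forEach(invalid::add);
    }

    // Attempt every deletion and remember the ones that failed.
    List<Path> failed = new ArrayList<>();
    for (Path p : invalid) {
      try {
        Files.deleteIfExists(p);
      } catch (IOException e) {
        failed.add(p);
      }
    }

    // Do not skip failures: surface them so the commit does not proceed.
    if (!failed.isEmpty()) {
      throw new IOException("Could not delete invalid files, aborting commit: " + failed);
    }
    return invalid;
  }
}
```

The committed paths are expected to be built the same way `Files.list` builds them, for example `dataDir.resolve("name")`, so that the set lookup matches.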
Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]
hudi-bot commented on PR #11441: URL: https://github.com/apache/hudi/pull/11441#issuecomment-2165606342 ## CI report: * 80572ef48e06b8c794e53c5db94aebc95c23c34d UNKNOWN * 72b491d8467da51aaa5e840631027a98ddb4cf93 UNKNOWN
Re: [PR] [HUDI-7847] Infer record merge mode during table upgrade [hudi]
hudi-bot commented on PR #11439: URL: https://github.com/apache/hudi/pull/11439#issuecomment-2165553658 ## CI report: * 6fb054e35d656d4cbcf732b1ee4be13bb122b57a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24374)
Re: [PR] [HUDI-7847] Infer record merge mode during table upgrade [hudi]
geserdugarov commented on PR #11439: URL: https://github.com/apache/hudi/pull/11439#issuecomment-2165293826 > I found a new PR that could help with an end-to-end test. I will try to cherry-pick #11443 and do a full read, with Hudi 1.0 beta, of a table initially written by Hudi 0.14. I tried the changes from #11443 and got: `org.apache.hudi.exception.HoodieIOException: Could not read commit details from file:/tmp/MOR_event_time/.hoodie/20240612164545738_1718275031205.deltacommit` I suppose that with the changes from task [HUDI-7857](https://issues.apache.org/jira/browse/HUDI-7857), it could become possible to read the table (with the metadata table turned off).
Re: [PR] [HUDI-7838] Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and AbstractHoodieLogRecordReader [hudi]
wombatu-kun commented on code in PR #11444: URL: https://github.com/apache/hudi/pull/11444#discussion_r1637989406 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java: ## @@ -72,4 +72,10 @@ public class HoodieReaderConfig extends HoodieConfig { .markAdvanced() .sinceVersion("1.0.0") .withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging."); + + public static final ConfigProperty<Boolean> ENABLE_INTERNAL_SCHEMA_CACHE = ConfigProperty + .key("hoodie.schema.cache.enable") + .defaultValue(false) Review Comment: Do you want me to change the defaultValue of this config to `true`, or to remove this config property completely and use the internal schema cache everywhere unconditionally?
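To make the two options in the question above concrete, here is a small self-contained sketch of a cache gated by a boolean property. Only the key `hoodie.schema.cache.enable` and its `false` default come from the diff; the class `SchemaCacheSketch`, its methods, and the stand-in "parse" step are hypothetical and do not reflect how `HoodieBaseFileGroupRecordBuffer` or `AbstractHoodieLogRecordReader` actually use the flag.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; not Hudi's internal schema cache.
public class SchemaCacheSketch {
  // Key and default mirror the diff under review.
  static final String ENABLE_KEY = "hoodie.schema.cache.enable";
  static final String ENABLE_DEFAULT = "false";

  private final boolean cacheEnabled;
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private int parseCount = 0;

  public SchemaCacheSketch(Properties props) {
    this.cacheEnabled = Boolean.parseBoolean(props.getProperty(ENABLE_KEY, ENABLE_DEFAULT));
  }

  /** Returns the "parsed" schema, reusing a cached result when the flag is on. */
  public String getSchema(String schemaJson) {
    if (!cacheEnabled) {
      return parse(schemaJson); // flag off: re-parse on every call
    }
    return cache.computeIfAbsent(schemaJson, this::parse);
  }

  // Stand-in for an expensive schema parse; counts invocations for illustration.
  private String parse(String schemaJson) {
    parseCount++;
    return schemaJson.trim();
  }

  public int parseCount() {
    return parseCount;
  }
}
```

Changing the default to `true` would correspond to changing `ENABLE_DEFAULT` here, while removing the property would drop the `cacheEnabled` branch and always take the `computeIfAbsent` path.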