Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


KnightChess commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1639092577


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
 return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+return new Builder<>();
+  }
+
+  public static class Builder {
+
+HoodieReaderContext readerContext;
+HoodieStorage storage;
+String tablePath;
+String latestCommitTime;
+FileSlice fileSlice;
+Schema dataSchema;
+Schema requestedSchema;
+Option internalSchemaOpt;
+HoodieTableMetaClient hoodieTableMetaClient;
+TypedProperties props;
+HoodieTableConfig tableConfig;
+long start;
+long length;
+boolean shouldUseRecordPosition = false;
+long maxMemorySizeInBytes;
+String spillableMapBasePath;
+ExternalSpillableMap.DiskMapType diskMapType;
+boolean isBitCaskDiskMapCompressionEnabled;
+
+public Builder withReaderContext(HoodieReaderContext readerContext) {
+  this.readerContext = readerContext;
+  return this;
+}
+
+public Builder withHoodieStorage(HoodieStorage storage) {
+  this.storage = storage;
+  return this;
+}
+
+public Builder withTablePath(String tablePath) {
+  this.tablePath = tablePath;
+  return this;
+}
+
+public Builder withLatestCommitTime(String latestCommitTime) {
+  this.latestCommitTime = latestCommitTime;
+  return this;
+}
+
+public Builder withFileSlice(FileSlice fileSlice) {
+  this.fileSlice = fileSlice;
+  return this;
+}
+
+public Builder withDataSchema(Schema dataSchema) {
+  this.dataSchema = dataSchema;
+  return this;
+}
+
+public Builder withRequestedSchema(Schema requestedSchema) {
+  this.requestedSchema = requestedSchema;
+  return this;
+}
+
+public Builder withInternalSchemaOpt(Option internalSchemaOpt) {
+  this.internalSchemaOpt = internalSchemaOpt;
+  return this;
+}
+
+public Builder withMetaClient(HoodieTableMetaClient hoodieTableMetaClient) {
+  this.hoodieTableMetaClient = hoodieTableMetaClient;
+  return this;
+}
+
+public Builder withTypedProperties(TypedProperties props) {
+  this.props = props;
+  return this;
+}
+
+public Builder withTableConfig(HoodieTableConfig tableConfig) {
+  this.tableConfig = tableConfig;
+  return this;
+}
+
+public Builder withStart(long start) {
+  this.start = start;
+  return this;
+}
+
+public Builder withLength(long length) {
+  this.length = length;
+  return this;
+}
+
+public Builder withUseRecordPosition(boolean shouldUseRecordPosition) {
+  this.shouldUseRecordPosition = shouldUseRecordPosition;
+  return this;
+}
+
+public Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
+  this.maxMemorySizeInBytes = maxMemorySizeInBytes;
+  return this;
+}
+
+public Builder withSpillableMapBasePath(String spillableMapBasePath) {
+  this.spillableMapBasePath = spillableMapBasePath;
+  return this;
+}
+
+public Builder withDiskMapType(ExternalSpillableMap.DiskMapType diskMapType) {
+  this.diskMapType = diskMapType;
+  return this;
+}
+
+public Builder withBitCaskDiskMapCompressionEnabled(boolean isBitCaskDiskMapCompressionEnabled) {
+  this.isBitCaskDiskMapCompressionEnabled = isBitCaskDiskMapCompressionEnabled;
+  return this;
+}
+
+public HoodieFileGroupReader build() {
+  ValidationUtils.checkArgument(readerContext != null);
+  ValidationUtils.checkArgument(fileSlice != null);
+  ValidationUtils.checkArgument(dataSchema != null);
+  ValidationUtils.checkArgument(requestedSchema != null);
+  if (internalSchemaOpt == null) {

Review Comment:
   Other fields such as `shouldUseRecordPosition`, `maxMemorySizeInBytes`,
`spillableMapBasePath`, etc. could perhaps be given default values.
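
A minimal sketch of what default values on builder fields could look like. `SketchFileGroupReader`, its reduced field set, and both default constants are hypothetical stand-ins for illustration; they are not the actual `HoodieFileGroupReader` API or the values Hudi uses.

```java
import java.util.Objects;

// Hypothetical stand-in for the reader; not the real HoodieFileGroupReader.
final class SketchFileGroupReader {
  final String fileSlice;
  final boolean shouldUseRecordPosition;
  final long maxMemorySizeInBytes;
  final String spillableMapBasePath;

  private SketchFileGroupReader(Builder b) {
    this.fileSlice = b.fileSlice;
    this.shouldUseRecordPosition = b.shouldUseRecordPosition;
    this.maxMemorySizeInBytes = b.maxMemorySizeInBytes;
    this.spillableMapBasePath = b.spillableMapBasePath;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Illustrative defaults only; assumptions, not the values Hudi ships with.
    static final long DEFAULT_MAX_MEMORY_SIZE_IN_BYTES = 1024L * 1024L * 1024L;
    static final String DEFAULT_SPILLABLE_MAP_BASE_PATH = "/tmp/";

    private String fileSlice; // required, so it has no default
    private boolean shouldUseRecordPosition = false;
    private long maxMemorySizeInBytes = DEFAULT_MAX_MEMORY_SIZE_IN_BYTES;
    private String spillableMapBasePath = DEFAULT_SPILLABLE_MAP_BASE_PATH;

    Builder withFileSlice(String fileSlice) {
      this.fileSlice = fileSlice;
      return this;
    }

    Builder withUseRecordPosition(boolean shouldUseRecordPosition) {
      this.shouldUseRecordPosition = shouldUseRecordPosition;
      return this;
    }

    Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
      this.maxMemorySizeInBytes = maxMemorySizeInBytes;
      return this;
    }

    Builder withSpillableMapBasePath(String spillableMapBasePath) {
      this.spillableMapBasePath = spillableMapBasePath;
      return this;
    }

    SketchFileGroupReader build() {
      // Only fields without a sensible default need validation.
      Objects.requireNonNull(fileSlice, "fileSlice is required");
      return new SketchFileGroupReader(this);
    }
  }
}
```

With defaults in place, a caller only has to set what differs, for example `SketchFileGroupReader.builder().withFileSlice("fs-1").build()`.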



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


KnightChess commented on code in PR #11455:
URL: https://github.com/apache/hudi/pull/11455#discussion_r1639086482


##
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestHoodiePositionBasedFileGroupRecordBuffer.java:
##
@@ -125,17 +126,17 @@ public void prepareBuffer(RecordMergeMode mergeMode) throws Exception {
 Option.empty(), metaClient.getTableConfig()));
 TypedProperties props = new TypedProperties();
 props.put(HoodieCommonConfig.RECORD_MERGE_MODE.key(), mergeMode.name());
+props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(), String.valueOf(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.defaultValue()));

Review Comment:
   `1024 * 1024 * 1000 ` replace defaultValue



##
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##
@@ -276,6 +277,10 @@ private void validateOutputFromFileGroupReader(StorageConfiguration storageCo
 props.setProperty("hoodie.payload.ordering.field", "timestamp");
 props.setProperty(RECORD_MERGER_STRATEGY.key(), RECORD_MERGER_STRATEGY.defaultValue());
 props.setProperty(RECORD_MERGE_MODE.key(), recordMergeMode.name());
+props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(), String.valueOf(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.defaultValue()));

Review Comment:
   `1024 * 1024 * 1000 ` replace defaultValue






Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11455:
URL: https://github.com/apache/hudi/pull/11455#issuecomment-2167021177

   
   ## CI report:
   
   * 2704f291b4e87e2776bc7dfd9539c5ba5f7d2749 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24393)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11452:
URL: https://github.com/apache/hudi/pull/11452#issuecomment-2167021148

   
   ## CI report:
   
   * 8c9f4bfd2eb32b3cb06b755bf9210607ea5e865a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24392)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2167021098

   
   ## CI report:
   
   * 01581df81e33432179d9dfe574f8e9ae74f18038 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24391)
 
   
   



Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


KnightChess commented on code in PR #11445:
URL: https://github.com/apache/hudi/pull/11445#discussion_r1639082667


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##
@@ -713,11 +714,24 @@ private void 
deleteInvalidFilesByPartitions(HoodieEngineContext context, 

Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


danny0405 commented on code in PR #11445:
URL: https://github.com/apache/hudi/pull/11445#discussion_r1639080670


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##
@@ -713,11 +714,24 @@ private void 
deleteInvalidFilesByPartitions(HoodieEngineContext context, 

Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]

2024-06-13 Thread via GitHub


wombatu-kun commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1639065444


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -581,23 +572,27 @@ public void validateTableProperties(Properties properties) {
*/
   public static HoodieTableMetaClient initTableAndGetMetaClient(StorageConfiguration storageConf, String basePath,
 Properties props) throws IOException {
+return initTableAndGetMetaClient(storageConf, new StoragePath(basePath), props);

Review Comment:
   OK, I'll create a Jira task a bit later.






Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]

2024-06-13 Thread via GitHub


wombatu-kun commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1639065241


##
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java:
##
@@ -95,7 +95,7 @@ storage, new StoragePath(logFilePathPattern)).stream()
 new HashMap<>();
 int numCorruptBlocks = 0;
 int dummyInstantTimeCount = 0;
-String basePath = HoodieCLI.getTableMetaClient().getBasePathV2().toString();
+String basePath = HoodieCLI.basePath;

Review Comment:
   No, I haven't tested it. But as I see from the code, you need to connect to 
a different table if you want to change the base path. During that connection, 
both `basePath` and `metaClient` are updated. The `connect` method is also 
invoked when the createTable command executes, so these changes should not 
break anything.
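
A small sketch of the invariant described above, assuming a CLI-style holder whose `connect` refreshes both fields together. `SketchCli` and `SketchMetaClient` are hypothetical names for illustration; this is not the actual `HoodieCLI` code.

```java
// Hypothetical stand-in for a table meta client; it only remembers its base path.
final class SketchMetaClient {
  private final String basePath;

  SketchMetaClient(String basePath) {
    this.basePath = basePath;
  }

  String getBasePath() {
    return basePath;
  }
}

// Hypothetical CLI state holder, loosely modeled on the description in the comment.
final class SketchCli {
  static String basePath;
  static SketchMetaClient metaClient;

  // Switching tables goes through connect, which updates both fields,
  // so reading basePath directly matches metaClient.getBasePath().
  static void connect(String newBasePath) {
    metaClient = new SketchMetaClient(newBasePath);
    basePath = newBasePath;
  }
}
```

As long as `connect` is the only place that assigns either field, reading `SketchCli.basePath` and asking the meta client for its base path give the same answer.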






Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11452:
URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166930481

   
   ## CI report:
   
   * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385)
 
   * 8c9f4bfd2eb32b3cb06b755bf9210607ea5e865a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24392)
 
   
   



Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11455:
URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166930511

   
   ## CI report:
   
   * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389)
 
   * 2704f291b4e87e2776bc7dfd9539c5ba5f7d2749 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24393)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166930448

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   * 01581df81e33432179d9dfe574f8e9ae74f18038 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24391)
 
   
   



Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11455:
URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166923741

   
   ## CI report:
   
   * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389)
 
   * 2704f291b4e87e2776bc7dfd9539c5ba5f7d2749 UNKNOWN
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166923627

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   * 01581df81e33432179d9dfe574f8e9ae74f18038 UNKNOWN
   
   



Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11452:
URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166923682

   
   ## CI report:
   
   * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385)
 
   * 8c9f4bfd2eb32b3cb06b755bf9210607ea5e865a UNKNOWN
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166919410

   I also see a [PR](https://github.com/apache/hudi/pull/7512) that introduced 
the custom class
   
`hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java`.
   
   As I understand it, that PR followed the developers' decision to use the 
deduced writer schema rather than the schema with which the file was written. 
So the purpose of the previous PR was this:
   
   If for some reason a Parquet file is written in the new style (3-level 
nesting), likely by a tool other than Spark or with 
"spark.hadoop.parquet.avro.write-old-list-structure" set to "false", then, if 
there are no overrides (safety measures first), ask the reader to read it as 
new style.
   Without that, it likely led to the same issue we are facing. The current 
change is basically the reverse case: if the file was written as 2-level, use 
2-level readers no matter what the runtime setting is.
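
One way to read the two rules above is as a small decision function. This is only a sketch of that interpretation: `ListLayout` and `ListLayoutResolver` are hypothetical names, and the real logic in `HoodieAvroReadSupport` works on Parquet schemas and Hadoop configuration rather than on an enum.

```java
import java.util.Optional;

// Hypothetical model of how a Parquet list was physically written.
enum ListLayout { TWO_LEVEL, THREE_LEVEL }

final class ListLayoutResolver {
  /**
   * Picks the layout the reader should assume.
   *
   * @param writtenInFile    layout detected from the file itself
   * @param explicitOverride reader-side override set by the user, if any
   */
  static ListLayout resolve(ListLayout writtenInFile, Optional<ListLayout> explicitOverride) {
    if (writtenInFile == ListLayout.TWO_LEVEL) {
      // Current change as described: a 2-level file is read as 2-level
      // regardless of the runtime setting.
      return ListLayout.TWO_LEVEL;
    }
    // Earlier PR as described: a 3-level file is read as 3-level
    // unless the user supplied an explicit override.
    return explicitOverride.orElse(ListLayout.THREE_LEVEL);
  }
}
```

Under this reading, `resolve(ListLayout.TWO_LEVEL, Optional.of(ListLayout.THREE_LEVEL))` still yields `TWO_LEVEL`.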





Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11455:
URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166917379

   
   ## CI report:
   
   * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389)
 
   
   



Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11453:
URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166917349

   
   ## CI report:
   
   * b06455ddb5402cbb0d7df375b7595277f8b0eab9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24388)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166917289

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166915644

   In certain setups, this problem leads to silent data loss. 
https://github.com/VitoMakarevich/hudi-issue-014?tab=readme-ov-file#silent-dataloss
   





Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166879255

   @hudi-bot run azure





Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


jonvex merged PR #11449:
URL: https://github.com/apache/hudi/pull/11449





Re: [PR] [HUDI-7386] add builder to filegroup reader [hudi]

2024-06-13 Thread via GitHub


jonvex closed pull request #10630: [HUDI-7386] add builder to filegroup reader
URL: https://github.com/apache/hudi/pull/10630





Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11452:
URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166852841

   
   ## CI report:
   
   * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166852778

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   
   



Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11449:
URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166852725

   
   ## CI report:
   
   * 6b21881333bd3870957950fd467189689887ec5c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
 
   
   



Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


nsivabalan commented on code in PR #11449:
URL: https://github.com/apache/hudi/pull/11449#discussion_r1638777639


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -92,7 +92,6 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext,
Option internalSchemaOpt,
HoodieTableMetaClient hoodieTableMetaClient,
TypedProperties props,
-   HoodieTableConfig tableConfig,

Review Comment:
   +1 






Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


nsivabalan commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1638753499


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
 return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+return new Builder<>();
+  }
+
+  public static class Builder {
+
+HoodieReaderContext readerContext;
+HoodieStorage storage;
+String tablePath;
+String latestCommitTime;
+FileSlice fileSlice;
+Schema dataSchema;
+Schema requestedSchema;
+Option internalSchemaOpt;
+HoodieTableMetaClient hoodieTableMetaClient;
+TypedProperties props;
+HoodieTableConfig tableConfig;
+long start;
+long length;
+boolean shouldUseRecordPosition = false;
+long maxMemorySizeInBytes;
+String spillableMapBasePath;
+ExternalSpillableMap.DiskMapType diskMapType;
+boolean isBitCaskDiskMapCompressionEnabled;

Review Comment:
   wow. too many args. 



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
 return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+return new Builder<>();
+  }
+
+  public static class Builder {
+
+HoodieReaderContext readerContext;

Review Comment:
   private ?






Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


codope commented on code in PR #11455:
URL: https://github.com/apache/hudi/pull/11455#discussion_r1638968211


##
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##
@@ -276,6 +277,10 @@ private void validateOutputFromFileGroupReader(StorageConfiguration storageCo
 props.setProperty("hoodie.payload.ordering.field", "timestamp");
 props.setProperty(RECORD_MERGER_STRATEGY.key(), RECORD_MERGER_STRATEGY.defaultValue());
 props.setProperty(RECORD_MERGE_MODE.key(), recordMergeMode.name());
+props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(), String.valueOf(1024 * 1024 * 1000));

Review Comment:
   store the default value in a constant.



##
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestHoodiePositionBasedFileGroupRecordBuffer.java:
##
@@ -125,17 +126,17 @@ public void prepareBuffer(RecordMergeMode mergeMode) throws Exception {
 Option.empty(), metaClient.getTableConfig()));
 TypedProperties props = new TypedProperties();
 props.put(HoodieCommonConfig.RECORD_MERGE_MODE.key(), mergeMode.name());
+props.setProperty(HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key(), String.valueOf(1024 * 1024 * 1000));

Review Comment:
   same here and at other places
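
A minimal sketch of pulling the literal into a shared constant so the tests reference one name instead of repeating `1024 * 1024 * 1000`. The class name, the constant names, and the property key string below are placeholders chosen for illustration; the real tests would use `HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key()`.

```java
import java.util.Properties;

// Hypothetical shared test helper; all names here are assumptions.
final class MergeMemoryTestDefaults {
  // Placeholder key; stands in for HoodieMemoryConfig.MAX_MEMORY_FOR_MERGE.key().
  static final String MAX_MEMORY_FOR_MERGE_KEY = "example.max.memory.for.merge";

  // The value the tests previously spelled out inline.
  static final long TEST_MAX_MEMORY_FOR_MERGE_BYTES = 1024L * 1024L * 1000L;

  // Applies the shared value so each test sets it the same way.
  static Properties withMergeMemory(Properties props) {
    props.setProperty(MAX_MEMORY_FOR_MERGE_KEY, String.valueOf(TEST_MAX_MEMORY_FOR_MERGE_BYTES));
    return props;
  }
}
```

Each test can then call `MergeMemoryTestDefaults.withMergeMemory(props)`, and a later change to the value touches a single line.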






Re: [PR] Fix Hudi being able to read 2-level structure if explicit flag for wr… [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166558699

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 UNKNOWN
   
   



Re: [I] [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata [hudi]

2024-06-13 Thread via GitHub


howardcho closed issue #11454: [SUPPORT] AWS Glue: An error occurred while 
calling o333.save. Failed to apply clean commit to metadata
URL: https://github.com/apache/hudi/issues/11454





Re: [I] [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata [hudi]

2024-06-13 Thread via GitHub


howardcho commented on issue #11454:
URL: https://github.com/apache/hudi/issues/11454#issuecomment-2166804682

   While reviewing the config, I saw this key: `hoodie-conf hoodie.cleaner.parallelism`.
The `hoodie-conf ` prefix is not part of the option name, so I removed it, leaving
`hoodie.cleaner.parallelism`, and re-ran the jobs; they now seem to be working.
Thanks, and sorry for the alarm. Closing.
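
For anyone hitting a similar problem, a quick pre-flight check over the options dict can catch a malformed key like this before the job runs. The following is only a minimal sketch: the key pattern and the helper names `find_suspect_keys` and `strip_cli_prefix` are assumptions made for illustration, not anything provided by Hudi or Glue.

```python
import re

# Assumption: a well-formed option key is a dotted name without whitespace.
# This pattern is illustrative, not a rule taken from Hudi itself.
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_]+(\.[A-Za-z0-9_]+)+$")


def find_suspect_keys(options):
    """Return the keys that do not look like plain dotted option names."""
    return [key for key in options if not KEY_PATTERN.match(key)]


def strip_cli_prefix(options, prefix="hoodie-conf "):
    """Return a copy of options with a stray CLI-style prefix removed from keys."""
    cleaned = {}
    for key, value in options.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned


# A three-entry excerpt of the config from this issue.
options = {
    "hoodie.cleaner.policy": "KEEP_LATEST_FILE_VERSIONS",
    "hoodie-conf hoodie.cleaner.parallelism": "200",
    "hoodie.cleaner.commits.retained": 5,
}

print(find_suspect_keys(options))                    # keys flagged before the fix
print(find_suspect_keys(strip_cli_prefix(options)))  # keys flagged after the fix
```

Running `find_suspect_keys` on the full options dict before calling `save` would flag the prefixed key without changing the write path itself.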





Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11453:
URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166794958

   
   ## CI report:
   
   * d82a4d6fb2c3cd0aa353c028d15659427eec4962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24386)
 
   * b06455ddb5402cbb0d7df375b7595277f8b0eab9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24388)
 
   
   



Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11455:
URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166795004

   
   ## CI report:
   
   * 468720a4bc3c521e71e2d2e6459d37fd27f8444b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24389)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166794881

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   
   



Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11374:
URL: https://github.com/apache/hudi/pull/11374#issuecomment-2166794559

   
   ## CI report:
   
   * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN
   * 531bfafd735b515fc756f3f9fc8a6b929f8e2c88 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24387)
 
   
   



Re: [PR] Fix Hudi being able to read 2-level structure if explicit flag for wr… [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166575387

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166784596

   @hudi-bot run azure





Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11453:
URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166782661

   
   ## CI report:
   
   * d82a4d6fb2c3cd0aa353c028d15659427eec4962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24386)
 
   * b06455ddb5402cbb0d7df375b7595277f8b0eab9 UNKNOWN
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166782435

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   
   



Re: [PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11455:
URL: https://github.com/apache/hudi/pull/11455#issuecomment-2166782750

   
   ## CI report:
   
   * 468720a4bc3c521e71e2d2e6459d37fd27f8444b UNKNOWN
   
   



Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11449:
URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166782357

   
   ## CI report:
   
   * 6b21881333bd3870957950fd467189689887ec5c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
 
   
   



Re: [PR] [HUDI-7823] Simplify dependency management on exclusions [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11374:
URL: https://github.com/apache/hudi/pull/11374#issuecomment-2166781884

   
   ## CI report:
   
   * 05fab0df29530420f0a77abf46be996b70c1bc25 UNKNOWN
   * 1e6955bbac8cc18f6774360c7b3ef4e307c1c397 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24205)
 
   * 531bfafd735b515fc756f3f9fc8a6b929f8e2c88 UNKNOWN
   
   



Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on code in PR #11450:
URL: https://github.com/apache/hudi/pull/11450#discussion_r1638937534


##
hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java:
##
@@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu
   configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE,
   "false", "support reading avro from non-legacy map/list in parquet file");
 }
+if (legacyMode) {

Review Comment:
   Yes, I'm struggling to do this with all the Hudi test rails, but I'm slowly making progress.
   I'm not sure I understand exactly what you mean, since with this code block in place it will pass anyway.






Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on code in PR #11450:
URL: https://github.com/apache/hudi/pull/11450#discussion_r1638937534


##
hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java:
##
@@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu
   configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE,
   "false", "support reading avro from non-legacy map/list in parquet file");
 }
+if (legacyMode) {

Review Comment:
   Yes, I'm struggling to do this with all the Hudi test rails, but I'm slowly making progress.
   I'm not sure I understand exactly what you mean, since with this code block in place it will proceed anyway.






Re: [PR] [HUDI-7747] In MetaClient remove getBasePathV2() and return StoragePath from getBasePath() [hudi]

2024-06-13 Thread via GitHub


yihua commented on code in PR #11385:
URL: https://github.com/apache/hudi/pull/11385#discussion_r1638837492


##
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java:
##
@@ -95,7 +95,7 @@ storage, new StoragePath(logFilePathPattern)).stream()
 new HashMap<>();
 int numCorruptBlocks = 0;
 int dummyInstantTimeCount = 0;
-String basePath = HoodieCLI.getTableMetaClient().getBasePathV2().toString();
+String basePath = HoodieCLI.basePath;

Review Comment:
   Have you tested the changes locally with the Hudi CLI to confirm that, after
switching the base path to a different table, the CLI works properly on the new
table (i.e., both the base path and the meta client are updated)?



##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -581,23 +572,27 @@ public void validateTableProperties(Properties properties) {
*/
   public static HoodieTableMetaClient initTableAndGetMetaClient(StorageConfiguration storageConf, String basePath,
 Properties props) throws IOException {
+return initTableAndGetMetaClient(storageConf, new StoragePath(basePath), props);

Review Comment:
   A good follow-up would be to remove any util methods that take a `String` path
and pass a `StoragePath` instance all the way down instead.






Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166774450

   @hudi-bot run azure





Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11450:
URL: https://github.com/apache/hudi/pull/11450#issuecomment-2166758114

   
   ## CI report:
   
   * a13f012152b0cd41feccfd589ea438f3e1697607 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24383)
 
   
   



Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11449:
URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166757927

   
   ## CI report:
   
   * 6b21881333bd3870957950fd467189689887ec5c UNKNOWN
   
   



Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11448:
URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166757804

   
   ## CI report:
   
   * 7a032b2d72443ead03c3fb39af22552db360a2ba Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24382)
 
   
   



Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11448:
URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166575297

   
   ## CI report:
   
   * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380)
 
   * 7a032b2d72443ead03c3fb39af22552db360a2ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24382)
 
   
   



Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


jonvex commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1638841328


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -256,6 +257,137 @@ public static RecordMergeMode getRecordMergeMode(Properties props) {
 return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+return new Builder<>();
+  }
+
+  public static class Builder {
+
+HoodieReaderContext readerContext;
+HoodieStorage storage;
+String tablePath;
+String latestCommitTime;
+FileSlice fileSlice;
+Schema dataSchema;
+Schema requestedSchema;
+Option internalSchemaOpt;
+HoodieTableMetaClient hoodieTableMetaClient;
+TypedProperties props;
+HoodieTableConfig tableConfig;
+long start;
+long length;
+boolean shouldUseRecordPosition = false;
+long maxMemorySizeInBytes;
+String spillableMapBasePath;
+ExternalSpillableMap.DiskMapType diskMapType;
+boolean isBitCaskDiskMapCompressionEnabled;

Review Comment:
   Follow-up PRs to remove arguments:
   https://github.com/apache/hudi/pull/11449
   https://github.com/apache/hudi/pull/11453
   https://github.com/apache/hudi/pull/11455






Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11448:
URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166465238

   
   ## CI report:
   
   * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380)
 
   
   



[PR] [HUDI-7876] use properties to store log spill map configs for fg reader [hudi]

2024-06-13 Thread via GitHub


jonvex opened a new pull request, #11455:
URL: https://github.com/apache/hudi/pull/11455

   ### Change Logs
   Currently, four of the fg reader's params are spillable map configs. They can
instead be stored in the `TypedProperties` that is already passed in to the fg
reader.
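
The idea can be illustrated with a small language-neutral sketch, written here in Python for brevity. It assumes the four settings correspond to the builder fields `maxMemorySizeInBytes`, `spillableMapBasePath`, `diskMapType`, and `isBitCaskDiskMapCompressionEnabled`; the key names, default values, and the `SpillMapSettings` and `spill_settings_from_props` names are hypothetical and are not Hudi's actual config keys or API.

```python
from dataclasses import dataclass

# Hypothetical keys, for illustration only; they are not Hudi's config names.
MAX_MEMORY_KEY = "example.spill.max.memory.bytes"
BASE_PATH_KEY = "example.spill.base.path"
DISK_MAP_TYPE_KEY = "example.spill.diskmap.type"
COMPRESSION_KEY = "example.spill.bitcask.compression.enabled"


@dataclass
class SpillMapSettings:
    max_memory_bytes: int
    base_path: str
    disk_map_type: str
    compression_enabled: bool


def spill_settings_from_props(props):
    """Read the four spillable-map settings from one properties mapping.

    Missing keys fall back to placeholder defaults chosen for this sketch.
    """
    return SpillMapSettings(
        max_memory_bytes=int(props.get(MAX_MEMORY_KEY, 1024 * 1024 * 1000)),
        base_path=props.get(BASE_PATH_KEY, "/tmp/"),
        disk_map_type=props.get(DISK_MAP_TYPE_KEY, "BITCASK"),
        compression_enabled=str(props.get(COMPRESSION_KEY, "true")).lower() == "true",
    )


# The caller passes one mapping instead of four separate arguments.
props = {MAX_MEMORY_KEY: "1048576", DISK_MAP_TYPE_KEY: "ROCKS_DB"}
settings = spill_settings_from_props(props)
print(settings)
```

With this shape, callers that already build a properties object do not need to thread four extra arguments through every constructor or builder call.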
   
   ### Impact
   
   Easier to use the fg reader and integrate it in other parts of hudi
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


yihua commented on code in PR #11450:
URL: https://github.com/apache/hudi/pull/11450#discussion_r1638823816


##
hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java:
##
@@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu
   configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE,
   "false", "support reading avro from non-legacy map/list in parquet file");
 }
+if (legacyMode) {

Review Comment:
   Is it possible to write a unit test so that the test fails before this fix 
and succeeds after the fix?






Re: [PR] [HUDI-7874] Fix Hudi being able to read 2-level structure [hudi]

2024-06-13 Thread via GitHub


yihua commented on code in PR #11450:
URL: https://github.com/apache/hudi/pull/11450#discussion_r1638823816


##
hudi-hadoop-common/src/main/java/org/apache/parquet/avro/HoodieAvroReadSupport.java:
##
@@ -51,6 +51,10 @@ public ReadContext init(Configuration configuration, Map keyValu
   configuration.set(AvroWriteSupport.WRITE_OLD_LIST_STRUCTURE,
   "false", "support reading avro from non-legacy map/list in parquet file");
 }
+if (legacyMode) {

Review Comment:
   Is it possible to write a simple test so that the test fails before this fix 
and succeeds after the fix?






Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11449:
URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166465320

   
   ## CI report:
   
   * 6b21881333bd3870957950fd467189689887ec5c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
 
   
   



Re: [PR] [MINOR] add `ad1happy2go` to github collaborators [hudi]

2024-06-13 Thread via GitHub


xushiyan merged PR #11447:
URL: https://github.com/apache/hudi/pull/11447





[I] [SUPPORT] AWS Glue: An error occurred while calling o333.save. Failed to apply clean commit to metadata [hudi]

2024-06-13 Thread via GitHub


howardcho opened a new issue, #11454:
URL: https://github.com/apache/hudi/issues/11454

   Last night I was running multiple upsert Glue jobs against a single table 
(backfilling missing data), when I started getting this error:
   `An error occurred while calling o333.save. Failed to apply clean commit to 
metadata`
   
   Now, none of my jobs will complete successfully. I retried again this 
morning with my hourly incremental and it failed with the same error. I presume 
I accidentally had two jobs writing to the same partition, which caused some 
sort of deadlock. Could someone please assist me in getting my table back into 
a usable state?
   
   Hudi version: 0.14.0
   Config:
   ```
   {'hoodie.table.name': 'xxx',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.recordkey.field': 'received_year,received_month,received_day,request_uuid',
    'hoodie.datasource.write.precombine.field': 'nats_timestamp',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.metadata.record.index.enable': False,
    'hoodie.index.type': 'BLOOM',
    'hoodie.parquet.max.file.size': 536870912,
    'hoodie.parquet.small.file.limit': 104857600,
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.index.async': 'false',
    'hoodie.metadata.index.column.stats.enable': 'true',
    'hoodie.metadata.index.check.timeout.seconds': '60',
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider',
    'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': True,
    'hoodie.datasource.write.partitionpath.field': 'received_year,received_month,received_day',
    'hoodie.datasource.hive_sync.partition_fields': 'received_year,received_month,received_day',
    'hoodie.clean.automatic': 'true',
    'hoodie.clean.async': 'false',
    'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
    'hoodie.cleaner.fileversions.retained': '3',
    'hoodie-conf hoodie.cleaner.parallelism': '200',
    'hoodie.cleaner.commits.retained': 5,
    'hoodie.parquet.compression.codec': 'gzip',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.database': 'product_usage',
    'hoodie.datasource.hive_sync.table': 'usage',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.support_timestamp': True,
    'hive_sync.support_timestamp': True,
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.ComplexKeyGenerator'}
   ```
* Stack trace below
   
   I re-ran a single incremental job with `hoodie.metadata.enabled=False` and 
it worked, but when trying to re-run the older backfill jobs, they continue to 
fail.
   
   I then tried modifying `hoodie-conf hoodie.cleaner.parallelism` and
`hoodie.cleaner.commits.retained`:
   ```
   Writing Hudi 0.14.0 data with method: upsert
   {'hoodie.table.name': 'xxx',
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.recordkey.field': 'received_year,received_month,received_day,request_uuid',
    'hoodie.datasource.write.precombine.field': 'nats_timestamp',
    'hoodie.datasource.write.hive_style_partitioning': True,
    'hoodie.metadata.record.index.enable': False,
    'hoodie.index.type': 'BLOOM',
    'hoodie.parquet.max.file.size': 536870912,
    'hoodie.parquet.small.file.limit': 104857600,
    'hoodie.metadata.enable': 'true',
    'hoodie.metadata.index.async': 'false',
    'hoodie.metadata.index.column.stats.enable': 'true',
    'hoodie.metadata.index.check.timeout.seconds': '60',
    'hoodie.write.concurrency.mode': 'optimistic_concurrency_control',
    'hoodie.write.lock.provider': 'org.apache.hudi.client.transaction.lock.InProcessLockProvider',
    'hoodie.datasource.write.schema.allow.auto.evolution.column.drop': True,
    'hoodie.datasource.write.partitionpath.field': 'received_year,received_month,received_day',
    'hoodie.datasource.hive_sync.partition_fields': 'received_year,received_month,received_day',
    'hoodie.clean.automatic': 'true',
    'hoodie.clean.async': 'false',
    'hoodie.cleaner.policy': 'KEEP_LATEST_FILE_VERSIONS',
    'hoodie.cleaner.fileversions.retained': '3',
    'hoodie-conf hoodie.cleaner.parallelism': 10,
    'hoodie.cleaner.commits.retained': 20,
    'hoodie.parquet.compression.codec': 'gzip',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.database': 'product_usage',
    'hoodie.datasource.hive_sync.table': 'usage',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.hive_sync.mode': 'hms',
    'hoodie.datasource.hive_sync.support_timestamp': True,
    'hive_sync.support_timestamp': True,
    'hoodie.datasource.write.keygenerator.class':

Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11453:
URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166672996

   
   ## CI report:
   
   * d82a4d6fb2c3cd0aa353c028d15659427eec4962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24386)
 
   
   



Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11452:
URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166672929

   
   ## CI report:
   
   * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24385)
 
   
   



Re: [PR] [HUDI-7875] remove tablepath from fg reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11453:
URL: https://github.com/apache/hudi/pull/11453#issuecomment-2166659296

   
   ## CI report:
   
   * d82a4d6fb2c3cd0aa353c028d15659427eec4962 UNKNOWN
   
   



Re: [PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11452:
URL: https://github.com/apache/hudi/pull/11452#issuecomment-2166659242

   
   ## CI report:
   
   * 4c87d3db776324da5a1e6f3caacb2b55c525ff36 UNKNOWN
   
   



Re: [PR] [HUDI-7872] Recreate Glue table on certain types of exceptions [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11451:
URL: https://github.com/apache/hudi/pull/11451#issuecomment-2166575530

   
   ## CI report:
   
   * 7eee3194062ffa96e5253e5acdf4fcf48fc7040c UNKNOWN
   
   



Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11449:
URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166647365

   
   ## CI report:
   
   * 6b21881333bd3870957950fd467189689887ec5c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24381)
 
   
   



Re: [PR] [HUDI-7872] Recreate Glue table on certain types of exceptions [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11451:
URL: https://github.com/apache/hudi/pull/11451#issuecomment-2166647444

   
   ## CI report:
   
   * 7eee3194062ffa96e5253e5acdf4fcf48fc7040c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24384)
 
   
   



[PR] [HUDI-7875] remove tablepath from fg reader [hudi]

2024-06-13 Thread via GitHub


jonvex opened a new pull request, #11453:
URL: https://github.com/apache/hudi/pull/11453

   ### Change Logs
   
   The table path is now obtained from the meta client. The meta client is also 
passed through the reader context to reduce the number of params of the record buffers.
   
   ### Impact
   
   Makes it easier for new engines to implement the fg reader.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11448:
URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166447956

   
   ## CI report:
   
   * 9c2caef97e9e3a94097905179e57ab98e4fe8935 UNKNOWN
   
   



[PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


jonvex opened a new pull request, #11449:
URL: https://github.com/apache/hudi/pull/11449

   ### Change Logs
   
   HoodieFileGroupReader has too many params.
   The table config can be obtained from the meta client, which is already a param.
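
A minimal sketch of the idea, using hypothetical stand-in classes (`TableConfig`, `MetaClient`, `DerivedConfigReader`) rather than the real Hudi types. Because the meta client already exposes the table config, the reader can derive it instead of accepting it as a separate constructor argument:

```java
// Hypothetical stand-ins for illustration only; these are not the real Hudi classes.
final class TableConfig {
  private final String tableName;

  TableConfig(String tableName) {
    this.tableName = tableName;
  }

  String getTableName() {
    return tableName;
  }
}

final class MetaClient {
  private final TableConfig tableConfig;

  MetaClient(TableConfig tableConfig) {
    this.tableConfig = tableConfig;
  }

  TableConfig getTableConfig() {
    return tableConfig;
  }
}

final class DerivedConfigReader {
  private final TableConfig tableConfig;

  // The table config is no longer a parameter; it is read from the meta client.
  DerivedConfigReader(MetaClient metaClient) {
    this.tableConfig = metaClient.getTableConfig();
  }

  String tableName() {
    return tableConfig.getTableName();
  }
}
```

Callers then pass one object fewer, and the two values can no longer disagree with each other.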
   
   ### Impact
   
   Fewer params for the fg reader
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[PR] [HUDI-7872] Recreate Glue table on certain types of exceptions [hudi]

2024-06-13 Thread via GitHub


vamsikarnika opened a new pull request, #11451:
URL: https://github.com/apache/hudi/pull/11451

   ### Change Logs
   
   Recreate and sync Glue and Hive tables when an exception occurs while 
syncing the schema, properties, or partitions. To enable this feature, set the 
flags below to true:
   
   
   `hoodie.datasource.hive_sync.recreate_table_on_error=true`
   `hoodie.datasource.meta.sync.glue.recreate_table_on_error=true`
   
   ### Impact
   
   Low: When catalog sync fails for the first time, we will retry one more time 
by dropping and recreating the table, which might increase sync time.
   
   ### Risk level (write none, low medium or high below)
   
   LOW: When sync fails and we try to recreate the table, if the drop succeeds 
but the create fails, we might end up deleting the customer's table. We need to 
add an alert to catch this scenario.
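
The retry behavior and the risk described above can be sketched as follows. This is only an illustration under stated assumptions: `CatalogClient` and `RecreateOnErrorSync` are hypothetical names, not the actual Glue or Hive sync classes, and `recreateOnError` stands in for the two `recreate_table_on_error` flags.

```java
import java.util.List;

// Hypothetical catalog abstraction; the real Glue/Hive sync clients differ.
interface CatalogClient {
  void syncTable(String table) throws Exception;

  void dropTable(String table) throws Exception;

  void createTable(String table) throws Exception;
}

final class RecreateOnErrorSync {
  // Returns true if the table is in sync at the end, false otherwise.
  static boolean sync(CatalogClient client, String table, boolean recreateOnError, List<String> log) {
    try {
      client.syncTable(table);
      return true;
    } catch (Exception first) {
      log.add("sync failed: " + first.getMessage());
      if (!recreateOnError) {
        return false;
      }
    }
    boolean dropped = false;
    boolean created = false;
    try {
      // Single retry: drop, recreate, then sync again.
      client.dropTable(table);
      dropped = true;
      client.createTable(table);
      created = true;
      client.syncTable(table);
      return true;
    } catch (Exception second) {
      if (dropped && !created) {
        // The scenario from the risk note: the table is gone and was not recreated.
        log.add("ALERT: table dropped but not recreated: " + table);
      } else {
        log.add("retry failed: " + second.getMessage());
      }
      return false;
    }
  }
}
```

Tracking `dropped` and `created` separately is what lets the caller distinguish an ordinary failed retry from the case that needs an alert.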
   





[PR] [HUDI-7873] remove getStorage method from reader context [hudi]

2024-06-13 Thread via GitHub


jonvex opened a new pull request, #11452:
URL: https://github.com/apache/hudi/pull/11452

   ### Change Logs
   
   Remove this method because it is implemented the same for all reader 
contexts, and it is only used by a test.
   
   ### Impact
   
   Easier to implement fg reader for a new engine
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11448:
URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166558515

   
   ## CI report:
   
   * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380)
 
   * 7a032b2d72443ead03c3fb39af22552db360a2ba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24382)
 
   
   



Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


jonvex commented on code in PR #11449:
URL: https://github.com/apache/hudi/pull/11449#discussion_r1638737559


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -92,7 +92,6 @@ public HoodieFileGroupReader(HoodieReaderContext 
readerContext,
Option internalSchemaOpt,
HoodieTableMetaClient hoodieTableMetaClient,
TypedProperties props,
-   HoodieTableConfig tableConfig,

Review Comment:
   Yeah, I can keep getting rid of more. I just want to keep the PRs small.






Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11448:
URL: https://github.com/apache/hudi/pull/11448#issuecomment-2166545502

   
   ## CI report:
   
   * 9c2caef97e9e3a94097905179e57ab98e4fe8935 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24380)
 
   * 7a032b2d72443ead03c3fb39af22552db360a2ba UNKNOWN
   
   



Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #9979:
URL: https://github.com/apache/hudi/pull/9979#issuecomment-2166542192

   
   ## CI report:
   
   * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN
   * d4cd757a7318a5ff2d1b3c3b10eddbbb91b8059f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24379)
 
   
   



[PR] Fix Hudi being able to read 2-level structure if explicit flag for wr… [hudi]

2024-06-13 Thread via GitHub


VitoMakarevich opened a new pull request, #11450:
URL: https://github.com/apache/hudi/pull/11450

   If I explicitly set `"spark.hadoop.parquet.avro.write-old-list-structure", "false"` 
(the only way to write nulls inside arrays), Hudi starts writing Parquet files 
with the following schema:
   ```
   required group internal_list (LIST) {
     repeated group list {
       required int64 element;
     }
   }
   ```
   But if I had some files produced before setting 
`"spark.hadoop.parquet.avro.write-old-list-structure", "false"`, they have the 
following schema inside 
   ```
   required group internal_list (LIST) {
     repeated int64 array;
   }
   ```
   Hudi 0.14.x, at least, fails to read records from such a file, with the 
exception 
   `Caused by: java.lang.RuntimeException: Null-value for required field: `
   
   This happens even though the array contents are not null (they cannot be, 
since Avro requires `spark.hadoop.parquet.avro.write-old-list-structure` = 
`false` to write `null`s).
   
   ### Expected behavior 
   Taken from Hudi 0.12.1 (not sure what exactly broke it):
   1. If I have a file with the 2-level structure and an update arrives with 
"spark.hadoop.parquet.avro.write-old-list-structure", "false" (whether or not 
it has nulls inside the array, both behave the same), the file is rewritten 
with the 3-level structure. (**fails in 0.14.1**)
   2. If I have the 3-level structure with nulls and an update comes in (with 
or without nulls), it is read and written correctly.
   
   A simple reproduction of the issue can be found here:
   https://github.com/VitoMakarevich/hudi-issue-014
   
   Most likely, the problem appeared after a Hudi change that made values 
from the Hadoop conf propagate into the Reader instance (they were likely not 
propagated before).
   
   ### Change Logs
   Added an explicit override of 
`spark.hadoop.parquet.avro.write-old-list-structure` = `true` if the file being 
read is old (has the 2-level structure).
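
The override described above might look roughly like the following sketch. It makes several assumptions: `SchemaNode` is a hypothetical, simplified schema model (not the parquet-mr API), only the simplest legacy case is detected (a LIST group whose repeated child is a primitive, as in the second schema shown earlier), and the config is modeled as a plain map keyed by `parquet.avro.write-old-list-structure`, which is the name the setting has once Spark copies it into the Hadoop conf without the `spark.hadoop.` prefix.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a Parquet schema node; not the parquet-mr API.
final class SchemaNode {
  final String name;
  final boolean repeated;
  final boolean listAnnotated;
  final List<SchemaNode> children; // empty for primitive fields

  SchemaNode(String name, boolean repeated, boolean listAnnotated, List<SchemaNode> children) {
    this.name = name;
    this.repeated = repeated;
    this.listAnnotated = listAnnotated;
    this.children = children;
  }

  boolean isPrimitive() {
    return children.isEmpty();
  }
}

final class LegacyListDetector {
  static final String WRITE_OLD_LIST_STRUCTURE = "parquet.avro.write-old-list-structure";

  // Simplified rule: a LIST group whose single repeated child is a primitive is 2-level.
  static boolean hasTwoLevelList(SchemaNode node) {
    if (node.listAnnotated && node.children.size() == 1) {
      SchemaNode child = node.children.get(0);
      if (child.repeated && child.isPrimitive()) {
        return true;
      }
    }
    for (SchemaNode child : node.children) {
      if (hasTwoLevelList(child)) {
        return true;
      }
    }
    return false;
  }

  // Returns a copy of the reader config, forcing the old-list flag for legacy files.
  static Map<String, String> readerConf(Map<String, String> base, SchemaNode fileSchema) {
    Map<String, String> conf = new HashMap<>(base);
    if (hasTwoLevelList(fileSchema)) {
      conf.put(WRITE_OLD_LIST_STRUCTURE, "true");
    }
    return conf;
  }
}
```

Copying the config before changing it keeps the override local to the file being read, so the writer still sees the user's `false` setting.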
   
   
   ### Impact
   
   Running tests to ensure no unexpected issues propagate.
   
   ### Risk level (write none, low medium or high below)
   
   medium 
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


yihua commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1638694040


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -256,6 +257,137 @@ public static RecordMergeMode 
getRecordMergeMode(Properties props) {
 return RecordMergeMode.valueOf(mergeMode);
   }
 
+  public static Builder builder() {
+return new Builder<>();
+  }
+
+  public static class Builder {
+
+HoodieReaderContext readerContext;
+HoodieStorage storage;
+String tablePath;
+String latestCommitTime;
+FileSlice fileSlice;
+Schema dataSchema;
+Schema requestedSchema;
+Option internalSchemaOpt;
+HoodieTableMetaClient hoodieTableMetaClient;
+TypedProperties props;
+HoodieTableConfig tableConfig;
+long start;
+long length;
+boolean shouldUseRecordPosition = false;
+long maxMemorySizeInBytes;
+String spillableMapBasePath;
+ExternalSpillableMap.DiskMapType diskMapType;
+boolean isBitCaskDiskMapCompressionEnabled;
+
+public Builder withReaderContext(HoodieReaderContext readerContext) {
+  this.readerContext = readerContext;
+  return this;
+}
+
+public Builder withHoodieStorage(HoodieStorage storage) {
+  this.storage = storage;
+  return this;
+}
+
+public Builder withTablePath(String tablePath) {
+  this.tablePath = tablePath;
+  return this;
+}
+
+public Builder withLatestCommitTime(String latestCommitTime) {
+  this.latestCommitTime = latestCommitTime;
+  return this;
+}
+
+public Builder withFileSlice(FileSlice fileSlice) {
+  this.fileSlice = fileSlice;
+  return this;
+}
+
+public Builder withDataSchema(Schema dataSchema) {
+  this.dataSchema = dataSchema;
+  return this;
+}
+
+public Builder withRequestedSchema(Schema requestedSchema) {
+  this.requestedSchema = requestedSchema;
+  return this;
+}
+
+public Builder withInternalSchemaOpt(Option 
internalSchemaOpt) {
+  this.internalSchemaOpt = internalSchemaOpt;
+  return this;
+}
+
+public Builder withMetaClient(HoodieTableMetaClient 
hoodieTableMetaClient) {
+  this.hoodieTableMetaClient = hoodieTableMetaClient;
+  return this;
+}
+
+public Builder withTypedProperties(TypedProperties props) {
+  this.props = props;
+  return this;
+}
+
+public Builder withTableConfig(HoodieTableConfig tableConfig) {
+  this.tableConfig = tableConfig;
+  return this;
+}
+
+public Builder withStart(long start) {
+  this.start = start;
+  return this;
+}
+
+public Builder withLength(long length) {
+  this.length = length;
+  return this;
+}
+
+public Builder withUseRecordPosition(boolean shouldUseRecordPosition) {
+  this.shouldUseRecordPosition = shouldUseRecordPosition;
+  return this;
+}
+
+public Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
+  this.maxMemorySizeInBytes = maxMemorySizeInBytes;
+  return this;
+}
+
+public Builder withSpillableMapBasePath(String spillableMapBasePath) {

Review Comment:
   Could you reduce the number of configs in the builder, assuming that the 
user can use the file group reader API without having deep knowledge of Hudi?
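
One way to read this suggestion, sketched with a hypothetical `DefaultingReader` class rather than the real `HoodieFileGroupReader`: the builder supplies defaults for the tuning knobs and validates only the truly required inputs in `build()`. The field names mirror the diff above; the default values are assumptions chosen for illustration.

```java
// Hypothetical stand-in for the reader; names mirror the diff, defaults are assumptions.
final class DefaultingReader {
  final String fileSlice;
  final long maxMemorySizeInBytes;
  final String spillableMapBasePath;

  private DefaultingReader(Builder b) {
    this.fileSlice = b.fileSlice;
    this.maxMemorySizeInBytes = b.maxMemorySizeInBytes;
    this.spillableMapBasePath = b.spillableMapBasePath;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private String fileSlice;                                 // required, no default
    private long maxMemorySizeInBytes = 1024L * 1024 * 1000;  // assumed default
    private String spillableMapBasePath = "/tmp/";            // assumed default

    Builder withFileSlice(String fileSlice) {
      this.fileSlice = fileSlice;
      return this;
    }

    Builder withMaxMemorySizeInBytes(long maxMemorySizeInBytes) {
      this.maxMemorySizeInBytes = maxMemorySizeInBytes;
      return this;
    }

    Builder withSpillableMapBasePath(String spillableMapBasePath) {
      this.spillableMapBasePath = spillableMapBasePath;
      return this;
    }

    DefaultingReader build() {
      // Fail fast on missing required inputs instead of deferring to read time.
      if (fileSlice == null) {
        throw new IllegalStateException("fileSlice is required");
      }
      return new DefaultingReader(this);
    }
  }
}
```

A caller who does not know about spillable maps then only writes `DefaultingReader.builder().withFileSlice(...).build()`, while advanced users can still override each knob.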






Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


yihua commented on code in PR #11449:
URL: https://github.com/apache/hudi/pull/11449#discussion_r1638685103


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -92,7 +92,6 @@ public HoodieFileGroupReader(HoodieReaderContext 
readerContext,
Option internalSchemaOpt,
HoodieTableMetaClient hoodieTableMetaClient,
TypedProperties props,
-   HoodieTableConfig tableConfig,

Review Comment:
   Could you think about simplifying all the parameters?  The goal should be 
that given the file group ID and the query type with the reader context, 
storage instance, and the minimal set of configs (maybe with meta client), the 
file group reader should be able to figure out all necessary configs to fill in 
for reading the records out.






Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #9979:
URL: https://github.com/apache/hudi/pull/9979#issuecomment-2166461134

   
   ## CI report:
   
   * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN
   * 5ea3b0b905186b2701ee57f466cbec82043ddbea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23601)
 
   * d4cd757a7318a5ff2d1b3c3b10eddbbb91b8059f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24379)
 
   
   



Re: [PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


codope commented on code in PR #11448:
URL: https://github.com/apache/hudi/pull/11448#discussion_r1638668531


##
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##
@@ -280,25 +279,24 @@ private void 
validateOutputFromFileGroupReader(StorageConfiguration storageCo
   props.setProperty(PARTITION_FIELDS.key(), 
metaClient.getTableConfig().getString(PARTITION_FIELDS));
 }
 assertEquals(containsBaseFile, fileSlice.getBaseFile().isPresent());
-HoodieFileGroupReader fileGroupReader = new HoodieFileGroupReader<>(
-getHoodieReaderContext(tablePath, avroSchema, storageConf),
-metaClient.getStorage(),
-tablePath,
-metaClient.getActiveTimeline().lastInstant().get().getTimestamp(),
-fileSlice,
-avroSchema,
-avroSchema,
-Option.empty(),
-metaClient,
-props,
-metaClient.getTableConfig(),
-0,
-fileSlice.getTotalFileSize(),
-false,
-1024 * 1024 * 1000,
-metaClient.getTempFolderPath(),
-ExternalSpillableMap.DiskMapType.ROCKS_DB,
-false);
+HoodieFileGroupReader fileGroupReader = HoodieFileGroupReader.builder()
+.withReaderContext(getHoodieReaderContext(tablePath, avroSchema, 
storageConf))
+.withHoodieStorage(metaClient.getStorage())
+.withTablePath(tablePath)
+
.withLatestCommitTime(metaClient.getActiveTimeline().lastInstant().get().getTimestamp())
+.withFileSlice(fileSlice)
+.withDataSchema(avroSchema)
+.withRequestedSchema(avroSchema)
+.withMetaClient(metaClient)
+.withTypedProperties(props)
+.withTableConfig(metaClient.getTableConfig())
+.withStart(0)
+.withLength(fileSlice.getTotalFileSize())
+.withMaxMemorySizeInBytes(1024 * 1024 * 1000)

Review Comment:
   Also, consider making the size configurable instead of a hard-coded value. 
This can be done in a follow-up PR.



##
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##
@@ -280,25 +279,24 @@ private void 
validateOutputFromFileGroupReader(StorageConfiguration storageCo
   props.setProperty(PARTITION_FIELDS.key(), 
metaClient.getTableConfig().getString(PARTITION_FIELDS));
 }
 assertEquals(containsBaseFile, fileSlice.getBaseFile().isPresent());
-HoodieFileGroupReader fileGroupReader = new HoodieFileGroupReader<>(
-getHoodieReaderContext(tablePath, avroSchema, storageConf),
-metaClient.getStorage(),
-tablePath,
-metaClient.getActiveTimeline().lastInstant().get().getTimestamp(),
-fileSlice,
-avroSchema,
-avroSchema,
-Option.empty(),
-metaClient,
-props,
-metaClient.getTableConfig(),
-0,
-fileSlice.getTotalFileSize(),
-false,
-1024 * 1024 * 1000,
-metaClient.getTempFolderPath(),
-ExternalSpillableMap.DiskMapType.ROCKS_DB,
-false);
+HoodieFileGroupReader fileGroupReader = HoodieFileGroupReader.builder()
+.withReaderContext(getHoodieReaderContext(tablePath, avroSchema, 
storageConf))
+.withHoodieStorage(metaClient.getStorage())
+.withTablePath(tablePath)
+
.withLatestCommitTime(metaClient.getActiveTimeline().lastInstant().get().getTimestamp())
+.withFileSlice(fileSlice)
+.withDataSchema(avroSchema)
+.withRequestedSchema(avroSchema)
+.withMetaClient(metaClient)
+.withTypedProperties(props)
+.withTableConfig(metaClient.getTableConfig())
+.withStart(0)
+.withLength(fileSlice.getTotalFileSize())
+.withMaxMemorySizeInBytes(1024 * 1024 * 1000)

Review Comment:
   While we're refactoring, can we also declare this size as a constant in 
some common place?
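
Both suggestions (a shared constant, and a configurable size) could be combined along these lines. The class name `SpillableMapDefaults` and the property key are hypothetical and do not correspond to an existing Hudi config; the default value simply mirrors the literal used in the test above.

```java
import java.util.Properties;

final class SpillableMapDefaults {
  // Hypothetical property key, used here only for illustration.
  static final String MAX_MEMORY_KEY = "example.reader.max.memory.bytes";

  // Shared default; mirrors the 1024 * 1024 * 1000 literal from the test.
  static final long DEFAULT_MAX_MEMORY_BYTES = 1024L * 1024 * 1000;

  // Reads the size from the properties, falling back to the shared default.
  static long maxMemoryBytes(Properties props) {
    String value = props.getProperty(MAX_MEMORY_KEY);
    return value == null ? DEFAULT_MAX_MEMORY_BYTES : Long.parseLong(value.trim());
  }

  private SpillableMapDefaults() {
  }
}
```

The test would then call `.withMaxMemorySizeInBytes(SpillableMapDefaults.maxMemoryBytes(props))` instead of repeating the literal.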






Re: [PR] [HUDI-7871] remove tableconfig from filegroup reader params [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11449:
URL: https://github.com/apache/hudi/pull/11449#issuecomment-2166448035

   
   ## CI report:
   
   * 6b21881333bd3870957950fd467189689887ec5c UNKNOWN
   
   



Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #9979:
URL: https://github.com/apache/hudi/pull/9979#issuecomment-2166429555

   
   ## CI report:
   
   * b038e47bc8365959cc7d9a4a4d5fe07e081dd64e UNKNOWN
   * 5ea3b0b905186b2701ee57f466cbec82043ddbea Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23601)
 
   * d4cd757a7318a5ff2d1b3c3b10eddbbb91b8059f UNKNOWN
   
   



Re: [I] [SUPPORT] URI too long error [hudi]

2024-06-13 Thread via GitHub


michael1991 commented on issue #11446:
URL: https://github.com/apache/hudi/issues/11446#issuecomment-2166133152

   Hi @ad1happy2go , glad to see you again ~
   Can you try a column name with an underscore? I'm not sure whether enabling 
URL encoding for partitions, combined with a partition column name containing 
an underscore, could make this happen.





[PR] [HUDI-7386] Add Builder to FileGroup Reader [hudi]

2024-06-13 Thread via GitHub


jonvex opened a new pull request, #11448:
URL: https://github.com/apache/hudi/pull/11448

   ### Change Logs
   
   The number of constructor params for the fg reader is getting too long. Use 
builder style instead.
   
   ### Impact
   
   Easier to keep track of params for fg reader 
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11441:
URL: https://github.com/apache/hudi/pull/11441#issuecomment-2165889684

   
   ## CI report:
   
   * 80572ef48e06b8c794e53c5db94aebc95c23c34d UNKNOWN
   * 72b491d8467da51aaa5e840631027a98ddb4cf93 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24375)
 
   
   



Re: [I] [SUPPORT] URI too long error [hudi]

2024-06-13 Thread via GitHub


ad1happy2go commented on issue #11446:
URL: https://github.com/apache/hudi/issues/11446#issuecomment-2166095081

   @michael1991 Thanks for raising this. Can you help me reproduce this 
issue? I tried the code below, but it worked fine for me.
   
   ```
   from faker import Faker
   import pandas as pd

   # Assumes a PySpark session named `spark` and PATH set to the table location.
   fake = Faker()
   data = [{"ID": fake.uuid4(), "EventTime": "2023-03-04 14:44:42.046661",
            "FullName": fake.name(), "Address": fake.address(),
            "CompanyName": fake.company(), "JobTitle": fake.job(),
            "EmailAddress": fake.email(), "PhoneNumber": fake.phone_number(),
            "RandomText": fake.sentence(),
            "CityNameDummyBigFieldName": fake.city(), "ts": "1",
            "StateNameDummyBigFieldName": fake.state(),
            "Country": fake.country()} for _ in range(1000)]
   pandas_df = pd.DataFrame(data)

   hoodie_properties = {
       'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
       'hoodie.datasource.write.operation': 'upsert',
       'hoodie.datasource.write.hive_style_partitioning': 'true',
       'hoodie.datasource.write.recordkey.field': 'ID',
       'hoodie.datasource.write.partitionpath.field':
           'StateNameDummyBigFieldName,CityNameDummyBigFieldName',
       'hoodie.table.name': 'test',
   }
   spark.sparkContext.setLogLevel("WARN")
   df = spark.createDataFrame(pandas_df)

   # Initial write, followed by repeated appends of the same data.
   df.write.format("hudi").options(**hoodie_properties).mode("overwrite").save(PATH)

   for i in range(1, 50):
       df.write.format("hudi").options(**hoodie_properties).mode("append").save(PATH)
   ```





Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]

2024-06-13 Thread via GitHub


jonvex merged PR #11441:
URL: https://github.com/apache/hudi/pull/11441





Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11445:
URL: https://github.com/apache/hudi/pull/11445#issuecomment-2166057193

   
   ## CI report:
   
   * a46c941e943db5ce71526fde45b8cd8029257b94 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24376)
 
   
   
   





Re: [I] [SUPPORT] project hudi-common: Compilation failure: Compilation failure [hudi]

2024-06-13 Thread via GitHub


HuangZhenQiu commented on issue #9744:
URL: https://github.com/apache/hudi/issues/9744#issuecomment-2165894999

   The issue is most likely caused by Maven pointing to a different Java 
version. Please make sure Maven points to JDK 1.8.
   
   
   % mvn -version
   Apache Maven 3.9.7 (8b094c9513efc1b9ce2d952b3b9c8eaedaf8cbf0)
   Maven home: /opt/homebrew/Cellar/maven/3.9.7/libexec
   Java version: 1.8.0_202, vendor: Oracle Corporation, runtime: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home/jre
   Default locale: en_US, platform encoding: UTF-8
   OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"
   hpeter@Zhenqius-MBP ~ % 
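
The check described above can be automated. The following is a minimal sketch, not part of the Hudi build: it parses the `Java version:` line from `mvn -version` output and tests whether it is a 1.8.x release. The helper names `maven_java_version` and `is_jdk8` are hypothetical, and the sample text is copied from the output above.

```python
import re

# Sample `mvn -version` output, copied from the comment above.
SAMPLE_OUTPUT = """\
Apache Maven 3.9.7 (8b094c9513efc1b9ce2d952b3b9c8eaedaf8cbf0)
Maven home: /opt/homebrew/Cellar/maven/3.9.7/libexec
Java version: 1.8.0_202, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
"""


def maven_java_version(mvn_version_output):
    """Return the Java version string Maven reports, or None if absent."""
    match = re.search(r"^Java version: ([^,\s]+)", mvn_version_output, re.MULTILINE)
    return match.group(1) if match else None


def is_jdk8(java_version):
    """True for legacy-style 1.8.x version strings (hypothetical helper)."""
    return java_version is not None and java_version.startswith("1.8.")


version = maven_java_version(SAMPLE_OUTPUT)
print(version, is_jdk8(version))
```

To run it against a real installation, the sample text could be replaced with `subprocess.run(["mvn", "-version"], capture_output=True, text=True).stdout`.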





[I] [SUPPORT] URI too long error [hudi]

2024-06-13 Thread via GitHub


michael1991 opened a new issue, #11446:
URL: https://github.com/apache/hudi/issues/11446

   **Describe the problem you faced**
   
   I'm using Spark 3.5 + Hudi 0.15.0 with a partitioned table. When I choose 
`req_date` and `req_hour` as the partition column names, I get this error, 
although the task eventually completes successfully.
   When I choose `date` and `hour` as the partition column names, the error 
disappears.
   
   **Expected behavior**
   
   Making the partition column names slightly longer should not cause any 
errors.
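
One plausible reading of the report, offered only as a sketch: the stack trace below goes through `RemoteHoodieTableFileSystemView.loadPartitions`, and if that request carries the partition paths in its query string, longer column names make every hive-style path longer and can push the URI over a server-side limit. Everything in the example is an assumption made for illustration: the endpoint URL, the `partitions` parameter name, the 8 KiB limit, and the ten days of hourly partitions.

```python
from urllib.parse import urlencode

# Hypothetical values, chosen only to illustrate the effect.
BASE_URL = "http://localhost:26754/v1/hoodie/view/loadpartitions"
ASSUMED_URI_LIMIT = 8192  # bytes; real limits depend on the server configuration


def partition_paths(date_col, hour_col, days, hours=24):
    """Build hive-style partition paths such as 'date=2024-06-01/hour=00'."""
    return [
        f"{date_col}=2024-06-{day:02d}/{hour_col}={hour:02d}"
        for day in range(1, days + 1)
        for hour in range(hours)
    ]


def request_uri_length(paths):
    """Length of a GET URI that sends all paths in one query parameter."""
    query = urlencode({"partitions": ",".join(paths)})
    return len(f"{BASE_URL}?{query}")


short_len = request_uri_length(partition_paths("date", "hour", days=10))
long_len = request_uri_length(partition_paths("req_date", "req_hour", days=10))

print("date/hour:        ", short_len, short_len <= ASSUMED_URI_LIMIT)
print("req_date/req_hour:", long_len, long_len <= ASSUMED_URI_LIMIT)
```

Each path grows by eight characters (two `req_` prefixes), so the difference scales linearly with the number of partitions in the request, which would explain why the shorter names stay under the limit while the longer ones do not.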
   
   **Environment Description**
   
   * Hudi version : 0.15.0
   
   * Spark version : 3.5.0
   
   * Hive version : NA
   
   * Hadoop version : 3.3.6
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
   ```
   2024-06-13 13:21:13 ERROR PriorityBasedFileSystemView:129 - Got error running preferred function. Trying secondary
   org.apache.hudi.exception.HoodieRemoteException: URI Too Long
       at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.loadPartitions(RemoteHoodieTableFileSystemView.java:447) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.loadPartitions(RemoteHoodieTableFileSystemView.java:465) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.lambda$loadPartitions$6e5c444d$1(PriorityBasedFileSystemView.java:187) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.execute(PriorityBasedFileSystemView.java:69) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.loadPartitions(PriorityBasedFileSystemView.java:185) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:133) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.requestClean(CleanPlanActionExecutor.java:174) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.table.action.clean.CleanPlanActionExecutor.execute(CleanPlanActionExecutor.java:200) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.table.HoodieSparkCopyOnWriteTable.scheduleCleaning(HoodieSparkCopyOnWriteTable.java:212) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieTableServiceClient.scheduleTableServiceInternal(BaseHoodieTableServiceClient.java:647) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:746) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:843) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:816) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:847) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:581) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:560) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:251) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:108) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:1082) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:508) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:187) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:125) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:168) ~[hudi-spark3.5-bundle_2.12-0.15.0.jar:0.15.0]
       at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) ~[spark-sql_2.12-3.5.0.jar:0.15.0]
       at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) ~[spark-sql_2.12-3.5.0.jar:3.5.0]
       at
   ```

Re: [I] [SUPPORT] Data deduplication caused by drawback in the delete invalid files before commit [hudi]

2024-06-13 Thread via GitHub


nsivabalan commented on issue #11419:
URL: https://github.com/apache/hudi/issues/11419#issuecomment-2165786565

   Thanks @beyond1920. Please put up a patch; I would like to review it as well.
   





Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11445:
URL: https://github.com/apache/hudi/pull/11445#issuecomment-2165769508

   
   ## CI report:
   
   * a46c941e943db5ce71526fde45b8cd8029257b94 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24376)
 
   
   
   





Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11445:
URL: https://github.com/apache/hudi/pull/11445#issuecomment-2165744949

   
   ## CI report:
   
   * a46c941e943db5ce71526fde45b8cd8029257b94 UNKNOWN
   
   
   





Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11441:
URL: https://github.com/apache/hudi/pull/11441#issuecomment-2165723395

   
   ## CI report:
   
   * 80572ef48e06b8c794e53c5db94aebc95c23c34d UNKNOWN
   * 72b491d8467da51aaa5e840631027a98ddb4cf93 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24375)
 
   
   
   





Re: [PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


beyond1920 commented on PR #11445:
URL: https://github.com/apache/hudi/pull/11445#issuecomment-2165666407

   It's still a draft; I will add some test cases soon.





[PR] [HUDI-7867] Ensuring delete invalid all files during finalizing the writer [hudi]

2024-06-13 Thread via GitHub


beyond1920 opened a new pull request, #11445:
URL: https://github.com/apache/hudi/pull/11445

   ### Change Logs
   
   We should not skip deleting invalid files while finalizing the writer. If 
file deletion fails, users might get wrong results.
   This PR aims to fix 
[issue #11419](https://github.com/apache/hudi/issues/11419).
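
The behavior described above can be sketched as follows. This is not the code in the PR, only a minimal illustration of the idea under simplifying assumptions: a flat directory stands in for a table partition, a set of file names stands in for the commit metadata, and the function name `delete_invalid_files` is hypothetical. The point is that a failed deletion raises instead of being skipped, so stray files cannot be picked up by readers later.

```python
import os
import tempfile


def delete_invalid_files(table_dir, committed_files):
    """Delete every file under table_dir that the commit did not record.

    Raises instead of silently continuing, so a failed cleanup cannot
    leave stray data files behind.
    """
    deleted, failed = [], []
    for name in sorted(os.listdir(table_dir)):
        if name in committed_files:
            continue  # valid file, keep it
        try:
            os.remove(os.path.join(table_dir, name))
            deleted.append(name)
        except OSError:
            failed.append(name)
    if failed:
        raise RuntimeError(f"could not delete invalid files: {failed}")
    return deleted


with tempfile.TemporaryDirectory() as table_dir:
    for name in ("a.parquet", "b.parquet", "stale.parquet"):
        open(os.path.join(table_dir, name), "w").close()
    removed = delete_invalid_files(table_dir, {"a.parquet", "b.parquet"})
    remaining = sorted(os.listdir(table_dir))
    print(removed, remaining)
```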
   
   ### Impact
   
   Fixes a bug in deleting invalid files.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7869] Ensure properties are copied when modifying schema [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11441:
URL: https://github.com/apache/hudi/pull/11441#issuecomment-2165606342

   
   ## CI report:
   
   * 80572ef48e06b8c794e53c5db94aebc95c23c34d UNKNOWN
   * 72b491d8467da51aaa5e840631027a98ddb4cf93 UNKNOWN
   
   
   





Re: [PR] [HUDI-7847] Infer record merge mode during table upgrade [hudi]

2024-06-13 Thread via GitHub


hudi-bot commented on PR #11439:
URL: https://github.com/apache/hudi/pull/11439#issuecomment-2165553658

   
   ## CI report:
   
   * 6fb054e35d656d4cbcf732b1ee4be13bb122b57a Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=24374)
 
   
   
   





Re: [PR] [HUDI-7847] Infer record merge mode during table upgrade [hudi]

2024-06-13 Thread via GitHub


geserdugarov commented on PR #11439:
URL: https://github.com/apache/hudi/pull/11439#issuecomment-2165293826

   > I found a new MR that could help with an end-to-end test. I will try to 
cherry-pick #11443 and do a full read, using Hudi 1.0 beta, of a table 
initially written by Hudi 0.14.
   
   I tried the changes from #11443 and got:
   `org.apache.hudi.exception.HoodieIOException: Could not read commit details from file:/tmp/MOR_event_time/.hoodie/20240612164545738_1718275031205.deltacommit`
   
   I suppose that with the changes from 
[HUDI-7857](https://issues.apache.org/jira/browse/HUDI-7857), it would be 
possible to read the table (with the metadata table turned off).





Re: [PR] [HUDI-7838] Use Config hoodie.schema.cache.enable in HoodieBaseFileGroupRecordBuffer and AbstractHoodieLogRecordReader [hudi]

2024-06-13 Thread via GitHub


wombatu-kun commented on code in PR #11444:
URL: https://github.com/apache/hudi/pull/11444#discussion_r1637989406


##
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java:
##
@@ -72,4 +72,10 @@ public class HoodieReaderConfig extends HoodieConfig {
   .markAdvanced()
   .sinceVersion("1.0.0")
   .withDocumentation("Whether to use positions in the block header for 
data blocks containing updates and delete blocks for merging.");
+
+  public static final ConfigProperty ENABLE_INTERNAL_SCHEMA_CACHE = 
ConfigProperty
+  .key("hoodie.schema.cache.enable")
+  .defaultValue(false)

Review Comment:
   Do you want me to change the defaultValue of this config to `true`, or to 
remove this config property completely and use internal schema cache everywhere 
unconditionally?
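
The two options in the question differ only in what happens when the user sets nothing. A minimal sketch of that difference follows; it models the config as a key plus a default and is not Hudi's `ConfigProperty` API. The `ConfigProperty` dataclass and `is_enabled` helper here are hypothetical stand-ins, and only the key `hoodie.schema.cache.enable` comes from the diff above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigProperty:
    """Hypothetical stand-in: a config key with a boolean default."""
    key: str
    default: bool


def is_enabled(props, prop):
    """Use the explicit setting if present, otherwise the default."""
    raw = props.get(prop.key)
    return prop.default if raw is None else raw.strip().lower() == "true"


# Current diff: the cache is opt-in.
opt_in = ConfigProperty("hoodie.schema.cache.enable", False)
# First alternative from the comment: the cache is on unless disabled.
opt_out = ConfigProperty("hoodie.schema.cache.enable", True)

no_settings = {}
explicit_off = {"hoodie.schema.cache.enable": "false"}

print(is_enabled(no_settings, opt_in), is_enabled(no_settings, opt_out))
print(is_enabled(explicit_off, opt_out))
```

With the default flipped to `true`, users can still switch the cache off explicitly; removing the property entirely would take that escape hatch away.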





