[GitHub] [hudi] hudi-bot commented on pull request #8598: [HUDI-6143] use the startoffset of each logfile when the downstream tasks read the logfile

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8598: URL: https://github.com/apache/hudi/pull/8598#issuecomment-1534130948 ## CI report: * c16f375a644f5417d9f90883bcbe5f377095 Azure:

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8611: [HUDI-6157] Fix order of commits served with flink streaming source from table with multi writer

2023-05-03 Thread via GitHub
SteNicholas commented on code in PR #8611: URL: https://github.com/apache/hudi/pull/8611#discussion_r1184569380 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/configuration/OptionsResolver.java: ## @@ -260,6 +260,17 @@ public static boolean

[jira] [Created] (HUDI-6168) Add source partition columns to rows in S3/GCS Sources

2023-05-03 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6168: --- Summary: Add source partition columns to rows in S3/GCS Sources Key: HUDI-6168 URL: https://issues.apache.org/jira/browse/HUDI-6168 Project: Apache Hudi Issue

[jira] [Assigned] (HUDI-6168) Add source partition columns to rows in S3/GCS Sources

2023-05-03 Thread Timothy Brown (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-6168: --- Assignee: Timothy Brown > Add source partition columns to rows in S3/GCS Sources >

[GitHub] [hudi] xushiyan commented on a diff in pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
xushiyan commented on code in PR #8490: URL: https://github.com/apache/hudi/pull/8490#discussion_r1184560033 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSourceStorage.scala: ## @@ -133,4 +132,69 @@ class TestMORDataSourceStorage

[hudi] branch asf-site updated: updated community content (#8621)

2023-05-03 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f654e4bb250 updated community content

[GitHub] [hudi] bhasudha merged pull request #8621: updated community content

2023-05-03 Thread via GitHub
bhasudha merged PR #8621: URL: https://github.com/apache/hudi/pull/8621 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] danny0405 commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

2023-05-03 Thread via GitHub
danny0405 commented on issue #7602: URL: https://github.com/apache/hudi/issues/7602#issuecomment-1534112611 Okay, I guess it is also feasible to add that support for the consistent hashing index. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [hudi] danny0405 commented on pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With StringBuilder

2023-05-03 Thread via GitHub
danny0405 commented on PR #8478: URL: https://github.com/apache/hudi/pull/8478#issuecomment-1534111844 > > I saw that the locations of some sub-clauses have changed, like `LOCATION` and `CLUSTERED BY`; is that as expected? > > @danny0405 Thank you for your thorough review! The modification

[jira] [Created] (HUDI-6167) Automatically schema inferrence for delta stream with JSON document datasource

2023-05-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-6167: Summary: Automatically schema inferrence for delta stream with JSON document datasource Key: HUDI-6167 URL: https://issues.apache.org/jira/browse/HUDI-6167 Project: Apache

[GitHub] [hudi] danny0405 commented on issue #8626: InferSchema not working with JSON

2023-05-03 Thread via GitHub
danny0405 commented on issue #8626: URL: https://github.com/apache/hudi/issues/8626#issuecomment-1534109380 Sounds like a feature enquiry. Automatic JSON-based schema evolution is feasible, especially for document data sources. I have created a JIRA issue:

[GitHub] [hudi] JoshuaZhuCN commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

2023-05-03 Thread via GitHub
JoshuaZhuCN commented on issue #7602: URL: https://github.com/apache/hudi/issues/7602#issuecomment-1534108727 > Is this the fix you want? #7834 @danny0405 This PR addresses bulk insert support under SIMPLE BUCKET, but my usage scenarios are all CONSISTENT_HASHING BUCKET index tables,

[GitHub] [hudi] danny0405 commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8472: URL: https://github.com/apache/hudi/pull/8472#discussion_r1184550750 ## hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

[GitHub] [hudi] Amar1404 commented on issue #8626: InferSchema not working with JSON

2023-05-03 Thread via GitHub
Amar1404 commented on issue #8626: URL: https://github.com/apache/hudi/issues/8626#issuecomment-1534104878 Hi @danny0405, I mean that if I don't provide any schema class, spark.read.json() will automatically infer the schema, but in the code we are using

[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8190: URL: https://github.com/apache/hudi/pull/8190#discussion_r1184548905 ## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ## @@ -106,9 +106,9 @@ private List getPartitionPathWithPathPrefix(String

[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix order of commits served with flink streaming source from table with multi writer

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1534103080 ## CI report: * 8a10affd53d66b88abd116587e6dd5e0c43e542a Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184546697 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] danny0405 commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-05-03 Thread via GitHub
danny0405 commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1534100699 > TestNestedSchemaPruningOptimization failed only in spark3.3.2 Yes, I even tried Spark 3.2.1 and it works fine. -- This is an automated message from the Apache Git Service. To

[GitHub] [hudi] danny0405 commented on issue #8628: [SUPPORT] Why the partitionpath field act somewhat similar to another recordkey(primary key)?

2023-05-03 Thread via GitHub
danny0405 commented on issue #8628: URL: https://github.com/apache/hudi/issues/8628#issuecomment-1534099485 Is your table partitioned as expected? For example, you are using the BloomFilter index by default, which deduplicates within each partition. -- This is an automated message from the

[GitHub] [hudi] hudi-bot commented on pull request #8611: [HUDI-6157] Fix order of commits served with flink streaming source from table with multi writer

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8611: URL: https://github.com/apache/hudi/pull/8611#issuecomment-1534098133 ## CI report: * 8a10affd53d66b88abd116587e6dd5e0c43e542a Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8595: URL: https://github.com/apache/hudi/pull/8595#issuecomment-1534093550 ## CI report: * 21e3090d2bd0eb714322e3ffad7b3554f4440829 UNKNOWN * ae38bcb32800fe4f6a14ee1e627607296041c10a Azure:

[GitHub] [hudi] xushiyan commented on pull request #8390: [HUDI-5315] Use sample writes to estimate record size

2023-05-03 Thread via GitHub
xushiyan commented on PR #8390: URL: https://github.com/apache/hudi/pull/8390#issuecomment-1534075473 > lets move the sample writes call as early as possible. so we construct the writeConfig w/ the avg record size over-ridden if need be. we don't want to mutate the write config.

[GitHub] [hudi] xushiyan commented on a diff in pull request #8390: [HUDI-5315] Use sample writes to estimate record size

2023-05-03 Thread via GitHub
xushiyan commented on code in PR #8390: URL: https://github.com/apache/hudi/pull/8390#discussion_r1184523790 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkSampleWritesUtils.java: ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] hudi-bot commented on pull request #7826: [HUDI-5675] fix lazy clean schedule rollback on completed instant

2023-05-03 Thread via GitHub
hudi-bot commented on PR #7826: URL: https://github.com/apache/hudi/pull/7826#issuecomment-1534073143 ## CI report: * b74d73f66e53a4cbd6b6048c4d07e19c1b9ad566 Azure:

[GitHub] [hudi] xushiyan commented on a diff in pull request #8390: [HUDI-5315] Use sample writes to estimate record size

2023-05-03 Thread via GitHub
xushiyan commented on code in PR #8390: URL: https://github.com/apache/hudi/pull/8390#discussion_r1184522344 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkSampleWritesUtils.java: ## @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1534067199 ## CI report: * 7575e66d6a48d702fe1e8d4670cb0890b370e94b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7826: [HUDI-5675] fix lazy clean schedule rollback on completed instant

2023-05-03 Thread via GitHub
hudi-bot commented on PR #7826: URL: https://github.com/apache/hudi/pull/7826#issuecomment-1534066550 ## CI report: * b74d73f66e53a4cbd6b6048c4d07e19c1b9ad566 Azure:

[GitHub] [hudi] stream2000 commented on pull request #7826: [HUDI-5675] fix lazy clean schedule rollback on completed instant

2023-05-03 Thread via GitHub
stream2000 commented on PR #7826: URL: https://github.com/apache/hudi/pull/7826#issuecomment-1534064726 > a related PR #7469 The issue that this PR is trying to solve happens not only in multi-writer scenarios but also with a single writer using async lazy clean -- This is an

[GitHub] [hudi] stream2000 commented on a diff in pull request #7826: [HUDI-5675] fix lazy clean schedule rollback on completed instant

2023-05-03 Thread via GitHub
stream2000 commented on code in PR #7826: URL: https://github.com/apache/hudi/pull/7826#discussion_r1184519230 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java: ## @@ -707,20 +709,34 @@ protected List

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1534062678 ## CI report: * 7575e66d6a48d702fe1e8d4670cb0890b370e94b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] tomyanth opened a new issue, #8628: [SUPPORT] Why the partitionpath field act somewhat similar to another recordkey(primary key)?

2023-05-03 Thread via GitHub
tomyanth opened a new issue, #8628: URL: https://github.com/apache/hudi/issues/8628 **Describe the problem you faced** The partitionpath field acts somewhat like another recordkey (primary key)

[GitHub] [hudi] LinMingQiang commented on a diff in pull request #7469: [HUDI-5386] Cleaning conflicts when write concurrency mode is OCC

2023-05-03 Thread via GitHub
LinMingQiang commented on code in PR #7469: URL: https://github.com/apache/hudi/pull/7469#discussion_r1184510754 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java: ## @@ -897,28 +897,40 @@ public HoodieCleanMetadata clean(String

[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

2023-05-03 Thread via GitHub
CTTY commented on code in PR #8190: URL: https://github.com/apache/hudi/pull/8190#discussion_r1184505252 ## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ## @@ -106,9 +106,9 @@ private List getPartitionPathWithPathPrefix(String

[GitHub] [hudi] boneanxs commented on a diff in pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-03 Thread via GitHub
boneanxs commented on code in PR #8452: URL: https://github.com/apache/hudi/pull/8452#discussion_r1184492603 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/storage/row/TestHoodieRowCreateHandle.java: ## @@ -190,16 +189,8 @@ public void

[GitHub] [hudi] hudi-bot commented on pull request #8598: [HUDI-6143] use the startoffset of each logfile when the downstream tasks read the logfile

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8598: URL: https://github.com/apache/hudi/pull/8598#issuecomment-1534028913 ## CI report: * 6f5685dd6a464ce37a213b80c5ded5151a2710e5 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8598: [HUDI-6143] use the startoffset of each logfile when the downstream tasks read the logfile

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8598: URL: https://github.com/apache/hudi/pull/8598#issuecomment-1534021989 ## CI report: * 6f5685dd6a464ce37a213b80c5ded5151a2710e5 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8595: URL: https://github.com/apache/hudi/pull/8595#issuecomment-1534021944 ## CI report: * 38e644e9bd5d2dfb345dcbab6b6d4946f5124988 Azure:

[GitHub] [hudi] vinothchandar commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-03 Thread via GitHub
vinothchandar commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1534017088 @prashantwason @nbalajee @suryaprasanna would this break you all in any way? Do we need the record data anywhere for successful writes? cc @rmahindra123 as well. Same question.

[GitHub] [hudi] c-f-cooper commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
c-f-cooper commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184493448 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] hudi-bot commented on pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8595: URL: https://github.com/apache/hudi/pull/8595#issuecomment-1534013742 ## CI report: * 38e644e9bd5d2dfb345dcbab6b6d4946f5124988 Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184491794 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] boneanxs commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-03 Thread via GitHub
boneanxs commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1534009597 > Caused by: org.eclipse.aether.resolution.ArtifactDescriptorException: Failed to read artifact descriptor for org.apache.maven:maven-plugin-api:jar:3.8.6 @bvaradar Hey, it Looks an

[GitHub] [hudi] boneanxs commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-05-03 Thread via GitHub
boneanxs commented on code in PR #7627: URL: https://github.com/apache/hudi/pull/7627#discussion_r1184488167 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieStreamSource.scala: ## @@ -163,10 +178,7 @@ class HoodieStreamSource(

[GitHub] [hudi] xiarixiaoyao commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-05-03 Thread via GitHub
xiarixiaoyao commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1533999490 > BaseFileOnlyRelation.scala the reason why we hard-coded is that: The parent class of baseFileOnlyRelation is hard-coded to disable vectorization by default, resulting in the

[GitHub] [hudi] hudi-bot commented on pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8595: URL: https://github.com/apache/hudi/pull/8595#issuecomment-1533985956 ## CI report: * 38e644e9bd5d2dfb345dcbab6b6d4946f5124988 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1533985556 ## CI report: * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN * e4144fb95b764a96f71b125bd02fd62bac9f00ba Azure:

[GitHub] [hudi] PaddyMelody commented on a diff in pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
PaddyMelody commented on code in PR #8595: URL: https://github.com/apache/hudi/pull/8595#discussion_r1184474743 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java: ## @@ -66,15 +66,19 @@ public class FileIndex { private final RowType

[GitHub] [hudi] c-f-cooper commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
c-f-cooper commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184474302 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] danny0405 commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184472771 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] danny0405 commented on a diff in pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8595: URL: https://github.com/apache/hudi/pull/8595#discussion_r1184472506 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java: ## @@ -66,15 +66,19 @@ public class FileIndex { private final RowType

[GitHub] [hudi] PaddyMelody commented on a diff in pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
PaddyMelody commented on code in PR #8595: URL: https://github.com/apache/hudi/pull/8595#discussion_r1184472150 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java: ## @@ -66,15 +66,19 @@ public class FileIndex { private final RowType

[GitHub] [hudi] danny0405 commented on issue #8626: InferSchema not working with JSON

2023-05-03 Thread via GitHub
danny0405 commented on issue #8626: URL: https://github.com/apache/hudi/issues/8626#issuecomment-1533979912 What do you mean by `InferSchema`? Is it a built-in functionality of DeltaStreamer? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] danny0405 commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8190: URL: https://github.com/apache/hudi/pull/8190#discussion_r1184471426 ## hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java: ## @@ -106,9 +106,9 @@ private List getPartitionPathWithPathPrefix(String

[jira] [Commented] (HUDI-6111) Build hudi submodule cause checkstyle not found error

2023-05-03 Thread Ran Tao (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719114#comment-17719114 ] Ran Tao commented on HUDI-6111: --- Hi [~guoyihua], my cmd is "mvn clean package -DskipTests". it works in

[GitHub] [hudi] PaddyMelody commented on a diff in pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
PaddyMelody commented on code in PR #8595: URL: https://github.com/apache/hudi/pull/8595#discussion_r1184470886 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java: ## @@ -66,15 +66,19 @@ public class FileIndex { private final RowType

[GitHub] [hudi] danny0405 commented on issue #8617: [SUPPORT] MapType support in HUDI

2023-05-03 Thread via GitHub
danny0405 commented on issue #8617: URL: https://github.com/apache/hudi/issues/8617#issuecomment-1533976325 Because `HoodieRecord` generally serializes the inputs into Avro bytes, even though the output file is Parquet -- This is an automated message from the Apache Git Service. To

[GitHub] [hudi] danny0405 commented on a diff in pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8595: URL: https://github.com/apache/hudi/pull/8595#discussion_r1184469395 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java: ## @@ -66,15 +66,19 @@ public class FileIndex { private final RowType

[GitHub] [hudi] c-f-cooper commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
c-f-cooper commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184469314 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] PaddyMelody commented on a diff in pull request #8595: [MINOR] Fixed hadoop configuration not being applied by FileIndex

2023-05-03 Thread via GitHub
PaddyMelody commented on code in PR #8595: URL: https://github.com/apache/hudi/pull/8595#discussion_r1184454362 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/source/FileIndex.java: ## @@ -145,7 +149,7 @@ public FileStatus[] getFilesInPartitions() {

[GitHub] [hudi] danny0405 commented on a diff in pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
danny0405 commented on code in PR #8596: URL: https://github.com/apache/hudi/pull/8596#discussion_r1184466905 ## hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestRepairsCommand.java: ## @@ -234,6 +234,30 @@ public void testOverwriteHoodieProperties() throws IOException

[GitHub] [hudi] duc-dn commented on issue #7806: [SUPPORT] Copy On Write table when ingesting data using hudi-kafka-connector doesn't seem right

2023-05-03 Thread via GitHub
duc-dn commented on issue #7806: URL: https://github.com/apache/hudi/issues/7806#issuecomment-1533938209 @ad1happy2go Thanks a lot -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1533904674 ## CI report: * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN * 3ad5ae580928952bb601cf90f09abb53d1d436e4 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8618: [HUDI-4944] Don't decode URI twice in HoodieBootstrapRDD

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8618: URL: https://github.com/apache/hudi/pull/8618#issuecomment-1533900435 ## CI report: * 62a3bc4cd0e932895bcdb9eb8ae0936348066289 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8303: [HUDI-5998] Speed up reads from bootstrapped tables in spark

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8303: URL: https://github.com/apache/hudi/pull/8303#issuecomment-1533899798 ## CI report: * 3cfef7fc92a6c5ce9bb078a7186e04614c11647f UNKNOWN * 3ad5ae580928952bb601cf90f09abb53d1d436e4 Azure:

[GitHub] [hudi] soumilshah1995 commented on issue #8400: [SUPPORT] Hudi Offline Compaction in EMR Serverless 6.10 for YouTube Video

2023-05-03 Thread via GitHub
soumilshah1995 commented on issue #8400: URL: https://github.com/apache/hudi/issues/8400#issuecomment-1533872885 Any updates @ad1happy2go -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] nsivabalan merged pull request #8622: [HUDI-6163] Add PR size labeler

2023-05-03 Thread via GitHub
nsivabalan merged PR #8622: URL: https://github.com/apache/hudi/pull/8622 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[hudi] branch master updated (7f41e22eb3b -> 21c913d8264)

2023-05-03 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 7f41e22eb3b [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer (#8514) add

[GitHub] [hudi] soumilshah1995 commented on issue #7879: [Bug] Hudi AWS Connector Throws Error on Hive Sync with Glue

2023-05-03 Thread via GitHub
soumilshah1995 commented on issue #7879: URL: https://github.com/apache/hudi/issues/7879#issuecomment-1533835424 @juanAmayaRamirez let's hop on a call; here is the link: https://meet.google.com/gam-wsca-hxi -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [hudi] juanAmayaRamirez commented on issue #7879: [Bug] Hudi AWS Connector Throws Error on Hive Sync with Glue

2023-05-03 Thread via GitHub
juanAmayaRamirez commented on issue #7879: URL: https://github.com/apache/hudi/issues/7879#issuecomment-1533834470 Thanks for the quick response! (love your videos BTW) but sorry to tell that I am getting the same error. `An error occurred while calling o110.getDynamicFrame. Reads

[GitHub] [hudi] soumilshah1995 commented on issue #7879: [Bug] Hudi AWS Connector Throws Error on Hive Sync with Glue

2023-05-03 Thread via GitHub
soumilshah1995 commented on issue #7879: URL: https://github.com/apache/hudi/issues/7879#issuecomment-1533820177 Hey Buddy @juanAmayaRamirez just use glue 4.0 and pass these param it will be fixed ``` """ --additional-python-modules | faker==11.3.0 --conf |

[GitHub] [hudi] juanAmayaRamirez commented on issue #7879: [Bug] Hudi AWS Connector Throws Error on Hive Sync with Glue

2023-05-03 Thread via GitHub
juanAmayaRamirez commented on issue #7879: URL: https://github.com/apache/hudi/issues/7879#issuecomment-1533818465 Hi @soumilshah1995, just here to ask what the issue was. I am having a similar issue with Lake Formation that I can't figure out when trying to read a Hudi table from

[GitHub] [hudi] hudi-bot commented on pull request #8618: [HUDI-4944] Don't decode URI twice in HoodieBootstrapRDD

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8618: URL: https://github.com/apache/hudi/pull/8618#issuecomment-1533814379 ## CI report: * 17d27ed9986c621ceb8bd576931349d58d0269f8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1533813860 ## CI report: * 7575e66d6a48d702fe1e8d4670cb0890b370e94b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8618: [HUDI-4944] Don't decode URI twice in HoodieBootstrapRDD

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8618: URL: https://github.com/apache/hudi/pull/8618#issuecomment-1533806869 ## CI report: * 17d27ed9986c621ceb8bd576931349d58d0269f8 Azure:

[GitHub] [hudi] psendyk commented on pull request #8627: keep a single random record instance

2023-05-03 Thread via GitHub
psendyk commented on PR #8627: URL: https://github.com/apache/hudi/pull/8627#issuecomment-1533771953 my bad, meant to open a PR into my fork -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] psendyk closed pull request #8627: keep a single random record instance

2023-05-03 Thread via GitHub
psendyk closed pull request #8627: keep a single random record instance URL: https://github.com/apache/hudi/pull/8627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [hudi] hudi-bot commented on pull request #8627: keep a single random record instance

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8627: URL: https://github.com/apache/hudi/pull/8627#issuecomment-1533751232 ## CI report: * a3c973f33153bccbf78c4c0c7ecb60e6d852bd0f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] psendyk commented on a diff in pull request #8627: keep a single random record instance

2023-05-03 Thread via GitHub
psendyk commented on code in PR #8627: URL: https://github.com/apache/hudi/pull/8627#discussion_r1184277562 ## hudi-cli/src/main/scala/org/apache/hudi/cli/DedupeSparkJob.scala: ## @@ -100,81 +106,31 @@ class DedupeSparkJob(basePath: String, getDedupePlan(dupeMap) } -

[GitHub] [hudi] kazdy commented on pull request #7922: [HUDI-5578] Upgrade base docker image for java 8

2023-05-03 Thread via GitHub
kazdy commented on PR #7922: URL: https://github.com/apache/hudi/pull/7922#issuecomment-1533683312 > @kazdy : Is this PR still required ? yes it is, I had issues running integration tests on M1, I did not have time to run these on my amd box yet -- This is an automated message

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1533665739 ## CI report: * 3d7d1f6d3da030e8416a24a9e1e61f191ba40271 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1533654796 ## CI report: * 8ee4e9f6036cdaf1241665ef853a5297f422a59e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1533644900 ## CI report: * cfa118d8ae39e0cf4bb128dae0893f930c05b38c Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1533558714 ## CI report: * 8ee4e9f6036cdaf1241665ef853a5297f422a59e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8490: [HUDI-5968] Fix global index duplicate and handle custom payload when update partition

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8490: URL: https://github.com/apache/hudi/pull/8490#issuecomment-1533547804 ## CI report: * 8ee4e9f6036cdaf1241665ef853a5297f422a59e Azure:

[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-03 Thread via GitHub
sydneyhoran commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1533514539 I was getting an error on deleting records without a tombstone because we were testing by starting from a midpoint of a Kafka topic, so I suspect Deltastreamer didn't

[GitHub] [hudi] sydneyhoran commented on issue #8519: [SUPPORT] Deltastreamer AvroDeserializer failing with java.lang.NullPointerException

2023-05-03 Thread via GitHub
sydneyhoran commented on issue #8519: URL: https://github.com/apache/hudi/issues/8519#issuecomment-1533511853 Thanks to help from Aditya, @rmahindra123, and @nsivabalan, this was the fix that worked for us to filter out tombstones:

[GitHub] [hudi] willforevercn commented on issue #8617: [SUPPORT] MapType support in HUDI

2023-05-03 Thread via GitHub
willforevercn commented on issue #8617: URL: https://github.com/apache/hudi/issues/8617#issuecomment-1533471395 The sample code snippet uses a COW table, but I still see the error.

[GitHub] [hudi] hudi-bot commented on pull request #7359: [HUDI-3304] WIP - Allow selective partial update

2023-05-03 Thread via GitHub
hudi-bot commented on PR #7359: URL: https://github.com/apache/hudi/pull/7359#issuecomment-1533462978 ## CI report: * 2999c56d853134e8476908b79ce77737293ce867 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1533395428 ## CI report: * 2002f1535315a129bfd8b3985e0e5691ca75b2e9 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #8574: [HUDI-6139] Add support for Transformer schema validation in DeltaStreamer

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8574: URL: https://github.com/apache/hudi/pull/8574#issuecomment-1533376928 ## CI report: * 2002f1535315a129bfd8b3985e0e5691ca75b2e9 Azure:

[jira] [Commented] (HUDI-5493) Revisit the archival process wrt clustering

2023-05-03 Thread Lokesh Jain (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718975#comment-17718975 ] Lokesh Jain commented on HUDI-5493: --- All the known gaps related to clustering and archival are fixed

[jira] [Resolved] (HUDI-6113) Support multiple transformers using the same config keys in DeltaStreamer

2023-05-03 Thread Lokesh Jain (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain resolved HUDI-6113. --- > Support multiple transformers using the same config keys in DeltaStreamer >

[hudi] branch master updated: [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer (#8514)

2023-05-03 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 7f41e22eb3b [HUDI-6113] Support multiple

[GitHub] [hudi] nsivabalan merged pull request #8514: [HUDI-6113] Support multiple transformers using the same config keys in DeltaStreamer

2023-05-03 Thread via GitHub
nsivabalan merged PR #8514: URL: https://github.com/apache/hudi/pull/8514

[GitHub] [hudi] hudi-bot commented on pull request #8596: [BUG-FIX] use try with resource to close stream

2023-05-03 Thread via GitHub
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1533214279 ## CI report: * dbc08754cbe6334473bb72bdc1f0f6ceb39fecfe Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7359: [HUDI-3304] WIP - Allow selective partial update

2023-05-03 Thread via GitHub
hudi-bot commented on PR #7359: URL: https://github.com/apache/hudi/pull/7359#issuecomment-1533210204 ## CI report: * 6371776cabd9b1ba518eced2e3f8611e4a5bd641 Azure:

[GitHub] [hudi] ad1happy2go commented on issue #8623: [SUPPORT] Following Docker Demo Quickstart with OpenJDK 1.8 on Mac arm64

2023-05-03 Thread via GitHub
ad1happy2go commented on issue #8623: URL: https://github.com/apache/hudi/issues/8623#issuecomment-1533201402 Can you try checking out 0.13.0? I see a similar issue with respect to master in this ticket: https://github.com/apache/hudi/issues/8447

[GitHub] [hudi] hudi-bot commented on pull request #7359: [HUDI-3304] WIP - Allow selective partial update

2023-05-03 Thread via GitHub
hudi-bot commented on PR #7359: URL: https://github.com/apache/hudi/pull/7359#issuecomment-1533194573 ## CI report: * 6371776cabd9b1ba518eced2e3f8611e4a5bd641 Azure:

[GitHub] [hudi] alberttwong commented on issue #8623: [SUPPORT] Following Docker Demo Quickstart with OpenJDK 1.8 on Mac arm64

2023-05-03 Thread via GitHub
alberttwong commented on issue #8623: URL: https://github.com/apache/hudi/issues/8623#issuecomment-1533131272 Using instructions at https://hudi.apache.org/docs/docker_demo
