Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
danny0405 commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2051024781 @the-other-tim-brown Can you fix the Azure CI failure? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2051021308 ## CI report: * dbdefad652d5c51b19175ca70374b7737a004952 UNKNOWN * 8f1ba6d46d8777f39c522d8bcac545ba3d4fd544 Azure:

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2051021047 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN * 72970e41f06fb68466eba338ddfbd6553d2e96b1 Azure:

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2051013783 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN * 72970e41f06fb68466eba338ddfbd6553d2e96b1 Azure:

[I] Different system parse different time zone of timestamp type from the parquet file created by hudi [hudi]

2024-04-11 Thread via GitHub
AshinGau opened a new issue, #11003: URL: https://github.com/apache/hudi/issues/11003 **Describe the problem you faced** I am a committer of [Doris](https://github.com/apache/doris). When I use Doris to read the parquet file created by hudi, I find that the output of timestamp is

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050971324 ## CI report: * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure:

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2050965613 ## CI report: * fe5ed81020fb8d974c306f61a222f9583e2dab29 Azure:

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050965097 ## CI report: * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure:

Re: [PR] [HUDI-7290] Don't assume ReplaceCommits are always Clustering [hudi]

2024-04-11 Thread via GitHub
bvaradar merged PR #10479: URL: https://github.com/apache/hudi/pull/10479

(hudi) branch master updated (a41d7aeafed -> c9256e5e784)

2024-04-11 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from a41d7aeafed [HUDI-7605] Allow merger strategy to be set in spark sql writer (#10999) add c9256e5e784 [HUDI-7290]

Re: [PR] [HUDI-7577] Avoid MDT compaction instant time conflicts [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10992: URL: https://github.com/apache/hudi/pull/10992#issuecomment-2050913689 ## CI report: * 1f421909625781304a531ccadcbf6a37ca5185a4 UNKNOWN * c8423769cd6ef01b7afcaafd63f51b9f450ec7ea Azure:

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2050907684 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 11c19fa8fd39ed058a4e3487c99c793610b61564 UNKNOWN * c68fd47d3080f055eb4b688f7e75b261ff6803d6 Azure:

Re: [PR] [HUDI-7606] Unpersist RDDs after table services, mainly compaction [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11000: URL: https://github.com/apache/hudi/pull/11000#issuecomment-2050873544 ## CI report: * 12cf06d732847bf9ca925bf2bb4e2e0eb39b8855 Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050873413 ## CI report: * dbdefad652d5c51b19175ca70374b7737a004952 UNKNOWN * f6c5bebf97872d05f27137febbc727d5ad9f8e78 Azure:

[I] [SUPPORT] can't retrieve original partition column value when exacting date with CustomKeyGenerator [hudi]

2024-04-11 Thread via GitHub
liangchen-datanerd opened a new issue, #11002: URL: https://github.com/apache/hudi/issues/11002 **problem** the requirement was to extract date value as partition from event_time column. According to the hudi official doc the ingestion config for hoodie would be like this ```
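The report is truncated before its config, so the exact setup is unknown. As a hedged illustration only, assuming `event_time` is epoch milliseconds and the partition is a UTC `yyyy-MM-dd` day (both assumptions), the sketch below shows why a date-derived partition value alone cannot give back the original `event_time`: many timestamps map to one day. `DatePartitionSketch` and `partitionFor` are hypothetical names, not Hudi's `CustomKeyGenerator` API.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DatePartitionSketch {
  // Assumed output format and time zone; a real key generator config may differ.
  private static final DateTimeFormatter DAY_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

  /** Hypothetical helper: derives a day-level partition value from an epoch-millis event_time. */
  static String partitionFor(long eventTimeMillis) {
    return DAY_FORMAT.format(Instant.ofEpochMilli(eventTimeMillis));
  }

  public static void main(String[] args) {
    long midnight = 1712793600000L;               // 2024-04-11T00:00:00Z
    long afternoon = midnight + 13 * 3_600_000L;  // 2024-04-11T13:00:00Z
    // Both events land in the same partition, so the partition value alone
    // cannot tell which original event_time a row had.
    System.out.println(partitionFor(midnight));
    System.out.println(partitionFor(afternoon));
  }
}
```

Keeping the raw `event_time` as a regular column alongside the derived partition is one way to retain the original value; whether that fits the reporter's setup is not stated in the issue.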

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2050867333 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050867560 ## CI report: * dbdefad652d5c51b19175ca70374b7737a004952 UNKNOWN * f6c5bebf97872d05f27137febbc727d5ad9f8e78 Azure:

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2050862646 ## CI report: * 09e4971db9ad7d5677a5757ed0b718e24ca4fb0b Azure:

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2050862535 ## CI report: * fac97b368a646aeddcc7e6728d7228f75f30bd82 Azure:

[jira] [Updated] (HUDI-7595) Investigate and fix flaky tests in ITTestHoodieDataSource

2024-04-11 Thread Vova Kolmakov (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7595: Labels: test-stability (was: ) > Investigate and fix flaky tests in ITTestHoodieDataSource >

[jira] [Updated] (HUDI-7595) Investigate and fix flaky tests in ITTestHoodieDataSource

2024-04-11 Thread Vova Kolmakov (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7595: Component/s: tests-ci > Investigate and fix flaky tests in ITTestHoodieDataSource >

[jira] [Updated] (HUDI-7578) Avoid unnecessary rewriting when copy old data from old base to new base file to improve compaction performance

2024-04-11 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7578: Fix Version/s: 0.15.0 1.0.0 > Avoid unnecessary rewriting when copy old data from old

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
jonvex commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1561923113 ## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkHoodieParquetReader.java: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
jonvex commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1561922013 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkHoodieParquetReaderBase.scala: ## @@ -0,0 +1,99 @@ +/*

Re: [PR] [HUDI-7577] Avoid MDT compaction instant time conflicts [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10992: URL: https://github.com/apache/hudi/pull/10992#issuecomment-2050828266 ## CI report: * 1f421909625781304a531ccadcbf6a37ca5185a4 UNKNOWN * d13639a9823e827c45e3e619eebf9c93c8c2085c Azure:

[jira] [Updated] (HUDI-7607) Test with timestamp based key generator

2024-04-11 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7607: -- Fix Version/s: 1.0.0 > Test with timestamp based key generator >

[jira] [Created] (HUDI-7607) Test with timestamp based key generator

2024-04-11 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7607: - Summary: Test with timestamp based key generator Key: HUDI-7607 URL: https://issues.apache.org/jira/browse/HUDI-7607 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-7378) Fix Spark SQL DML with custom key generator

2024-04-11 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7378: Reviewers: Jonathan Vexler, Sagar Sumit > Fix Spark SQL DML with custom key generator >

Re: [PR] [HUDI-7577] Avoid MDT compaction instant time conflicts [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10992: URL: https://github.com/apache/hudi/pull/10992#issuecomment-2050822230 ## CI report: * 1f421909625781304a531ccadcbf6a37ca5185a4 UNKNOWN * d13639a9823e827c45e3e619eebf9c93c8c2085c Azure:

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2050821831 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure:

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2050822300 ## CI report: * 09e4971db9ad7d5677a5757ed0b718e24ca4fb0b Azure:

[jira] [Updated] (HUDI-7577) Avoid MDT compaction instant time conflicts

2024-04-11 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7577: - Status: Patch Available (was: In Progress) > Avoid MDT compaction instant time conflicts >

[jira] [Updated] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-11 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7580: -- Sprint: Sprint 2024-03-25 > Inserting rows into partitioned table leads to data sanity issues >

[jira] [Updated] (HUDI-7604) DataSourceWriteOptions.TABLE_NAME() does not work

2024-04-11 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7604: Epic Link: HUDI-7537 > DataSourceWriteOptions.TABLE_NAME() does not work >

[jira] [Updated] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-11 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7580: -- Status: In Progress (was: Open) > Inserting rows into partitioned table leads to data sanity issues >

[jira] [Updated] (HUDI-7604) DataSourceWriteOptions.TABLE_NAME() does not work

2024-04-11 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7604: Fix Version/s: 1.0.0 > DataSourceWriteOptions.TABLE_NAME() does not work >

[jira] [Updated] (HUDI-7605) Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY

2024-04-11 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7605: Epic Link: HUDI-7322 > Unable to set merger strategy with > DataSourceWriteOptions.RECORD_MERGER_STRATEGY

[jira] [Updated] (HUDI-7605) Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY

2024-04-11 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7605: Fix Version/s: 1.0.0 > Unable to set merger strategy with > DataSourceWriteOptions.RECORD_MERGER_STRATEGY

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2050815033 ## CI report: * 09e4971db9ad7d5677a5757ed0b718e24ca4fb0b Azure:

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2050814856 ## CI report: * e093cc8dec1a4aab10e29aad164569dbfd3a1667 Azure:

Re: [I] [SUPPORT] Issue with Repartition on Kafka Input DataFrame and Same Precombine Value Rows In One Batch [hudi]

2024-04-11 Thread via GitHub
brightwon commented on issue #10995: URL: https://github.com/apache/hudi/issues/10995#issuecomment-2050814895 @ad1happy2go Thank you for your reply. What I want is to speed up the tagging stage. Could you suggest a solution? I can achieve this by using repartition with a completely

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2050814560 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure:

[jira] [Updated] (HUDI-7580) Inserting rows into partitioned table leads to data sanity issues

2024-04-11 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7580: -- Labels: hudi-1.0.0-beta2 (was: ) > Inserting rows into partitioned table leads to data sanity issues >

(hudi) branch master updated (c870da2f375 -> a41d7aeafed)

2024-04-11 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from c870da2f375 [HUDI-6441] Passing custom Headers with Hudi Callback URL (#10970) add a41d7aeafed [HUDI-7605] Allow

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
yihua merged PR #10999: URL: https://github.com/apache/hudi/pull/10999

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2050777376 ## CI report: * 09e4971db9ad7d5677a5757ed0b718e24ca4fb0b Azure:

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2050776939 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure:

Re: [PR] [HUDI-7606] Unpersist RDDs after table services, mainly compaction [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11000: URL: https://github.com/apache/hudi/pull/11000#issuecomment-2050777340 ## CI report: * 12cf06d732847bf9ca925bf2bb4e2e0eb39b8855 Azure:

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050776752 ## CI report: * 989ffd5220e4f5ae666a05afdd0e7de3c6543972 Azure:

[jira] [Updated] (HUDI-7606) Ensure that rdds persisted by table services are released in SparkRDDWriteClient

2024-04-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7606: - Labels: pull-request-available (was: ) > Ensure that rdds persisted by table services are

Re: [PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11001: URL: https://github.com/apache/hudi/pull/11001#issuecomment-2050771916 ## CI report: * 09e4971db9ad7d5677a5757ed0b718e24ca4fb0b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7606] Unpersist RDDs after table services, mainly compaction [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #11000: URL: https://github.com/apache/hudi/pull/11000#issuecomment-2050771888 ## CI report: * 12cf06d732847bf9ca925bf2bb4e2e0eb39b8855 UNKNOWN

Re: [PR] [MINOR] Streamer test setup performance [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10806: URL: https://github.com/apache/hudi/pull/10806#issuecomment-2050771563 ## CI report: * e0414708ebbd734156c0383cb4e5dbfe5ff4151a UNKNOWN * 10f0484ea6b5b820c257711dc8cd4da9cfa366cd Azure:

Re: [I] RLI Spark Hudi Error occurs when executing map [hudi]

2024-04-11 Thread via GitHub
jayakasadev commented on issue #10609: URL: https://github.com/apache/hudi/issues/10609#issuecomment-2050767899 I hit the same error when I try to use record indexing: ``` hoodie.metadata.record.index.enable=true hoodie.index.type=RECORD_INDEX ``` Are there additional

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10999: URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050765903 ## CI report: * 15e59507262bb635269fc03c820b518558eb267a Azure:

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2050765806 ## CI report: * Unknown: [CANCELED](TBD) * e093cc8dec1a4aab10e29aad164569dbfd3a1667 Azure:

Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-11 Thread via GitHub
the-other-tim-brown closed pull request #10975: [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile URL: https://github.com/apache/hudi/pull/10975

Re: [PR] [HUDI-7576] add partitionPath as an instance variable to HoodieBaseFile and HoodieLogFile [hudi]

2024-04-11 Thread via GitHub
the-other-tim-brown commented on PR #10975: URL: https://github.com/apache/hudi/pull/10975#issuecomment-2050759641 @danny0405 I've made this PR instead to get the same performance win without taking on the big refactor: https://github.com/apache/hudi/pull/11001

[PR] [HUDI-7576] Improve efficiency of getRelativePartitionPath, reduce computation of partitionPath in AbstractTableFileSystemView [hudi]

2024-04-11 Thread via GitHub
the-other-tim-brown opened a new pull request, #11001: URL: https://github.com/apache/hudi/pull/11001 ### Change Logs - Improve the efficiency of `getRelativePartitionPath` by reducing the number of operations on the path object that are required to get the final result - Reduce
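The change log describes the optimization only in outline. As a minimal sketch, assuming both the table base path and the full partition path are plain `/`-separated strings, a relative partition path can be computed with one prefix check and one `substring`, without building and walking intermediate path objects. `RelativePartitionPathSketch.relativePartitionPath` is a hypothetical name; this is not the code in PR #11001.

```java
import java.util.Objects;

public class RelativePartitionPathSketch {

  /**
   * Hypothetical helper: returns fullPath relative to basePath using string
   * operations only. Throws if fullPath is not under basePath.
   */
  static String relativePartitionPath(String basePath, String fullPath) {
    Objects.requireNonNull(basePath, "basePath");
    Objects.requireNonNull(fullPath, "fullPath");
    // Drop one trailing slash so "s3://b/t/" and "s3://b/t" behave the same.
    String base = basePath.endsWith("/")
        ? basePath.substring(0, basePath.length() - 1)
        : basePath;
    if (fullPath.equals(base)) {
      return ""; // non-partitioned layout: the partition is the base path itself
    }
    if (!fullPath.startsWith(base + "/")) {
      throw new IllegalArgumentException(fullPath + " is not under base path " + basePath);
    }
    return fullPath.substring(base.length() + 1);
  }

  public static void main(String[] args) {
    System.out.println(relativePartitionPath("s3://bucket/table", "s3://bucket/table/2024/04/11"));
    System.out.println(relativePartitionPath("s3://bucket/table/", "s3://bucket/table/2024/04/11"));
  }
}
```

Checking for `base + "/"` rather than just `base` avoids treating a sibling such as `s3://bucket/table2/...` as a child of `s3://bucket/table`.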

[jira] [Updated] (HUDI-7576) Avoid recomputing partition path in AbstractFileSystemView

2024-04-11 Thread Timothy Brown (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown updated HUDI-7576: Summary: Avoid recomputing partition path in AbstractFileSystemView (was: Add partitionPath to the

[jira] [Updated] (HUDI-7576) Avoid recomputing partition path in AbstractFileSystemView

2024-04-11 Thread Timothy Brown (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown updated HUDI-7576: Description: We have observed a non-negligible amount of CPU spent simply computing the partition

[jira] [Created] (HUDI-7606) Ensure that rdds persisted by table services are released in SparkRDDWriteClient

2024-04-11 Thread Rajesh Mahindra (Jira)
Rajesh Mahindra created HUDI-7606: - Summary: Ensure that rdds persisted by table services are released in SparkRDDWriteClient Key: HUDI-7606 URL: https://issues.apache.org/jira/browse/HUDI-7606

[jira] [Assigned] (HUDI-7606) Ensure that rdds persisted by table services are released in SparkRDDWriteClient

2024-04-11 Thread Rajesh Mahindra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Mahindra reassigned HUDI-7606: - Assignee: Rajesh Mahindra > Ensure that rdds persisted by table services are released in

[PR] [] Unpersist RDDs after table services, mainly compaction [hudi]

2024-04-11 Thread via GitHub
rmahindra123 opened a new pull request, #11000: URL: https://github.com/apache/hudi/pull/11000 ### Change Logs Unpersist RDDs after table services. Currently, the releaseResources is called before running inline table services. Tests show that the RDDs persisted by compaction may

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2050729563 ## CI report: * Unknown: [CANCELED](TBD) * e093cc8dec1a4aab10e29aad164569dbfd3a1667 Azure:

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2050723861 ## CI report: * Unknown: [CANCELED](TBD) * e093cc8dec1a4aab10e29aad164569dbfd3a1667 UNKNOWN

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050723773 ## CI report: * dbdefad652d5c51b19175ca70374b7737a004952 UNKNOWN * f6c5bebf97872d05f27137febbc727d5ad9f8e78 Azure:

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050723314 ## CI report: * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 Azure:

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
pt657407064 commented on code in PR #10973: URL: https://github.com/apache/hudi/pull/10973#discussion_r1561855940 ## hudi-cli/src/main/resources/application.yml: ## @@ -20,4 +20,7 @@ spring: shell: history: enabled: true - name: hoodie-cmd.log \ No newline

Re: [PR] [MINOR] Hudi CLI 'version' command output empty string [hudi]

2024-04-11 Thread via GitHub
pt657407064 commented on PR #10973: URL: https://github.com/apache/hudi/pull/10973#issuecomment-2050718867 @hudi-bot run azure

Re: [PR] [HUDI-7378] Fix Spark SQL DML with custom key generator [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10615: URL: https://github.com/apache/hudi/pull/10615#issuecomment-2050717174 ## CI report: * 50b27846bf118909f3fd69f20cf5d7654d8a87c7 Azure:

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10999: URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050680031 ## CI report: * d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050679866 ## CI report: * 120226ac7bc6eeb735307745dfa47782a311470b Azure:

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10999: URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050673612 ## CI report: * d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050673466 ## CI report: * 120226ac7bc6eeb735307745dfa47782a311470b Azure:

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-205013 ## CI report: * 120226ac7bc6eeb735307745dfa47782a311470b Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10998: URL: https://github.com/apache/hudi/pull/10998#issuecomment-2050666813 ## CI report: * e7e51394cc39b914503b7e1e3608cdb3ff690a30 Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
jonvex commented on code in PR #10998: URL: https://github.com/apache/hudi/pull/10998#discussion_r1561821789 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -964,6 +964,11 @@ object DataSourceOptionsHelper { def

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050619920 ## CI report: * 815b6fd6af5676590079cf6f9e23b7a2fdb4ccd8 Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
yihua commented on code in PR #10998: URL: https://github.com/apache/hudi/pull/10998#discussion_r1561742623 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -964,6 +964,11 @@ object DataSourceOptionsHelper { def

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
yihua commented on code in PR #10998: URL: https://github.com/apache/hudi/pull/10998#discussion_r1561716214 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -964,6 +964,11 @@ object DataSourceOptionsHelper { def

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
yihua commented on code in PR #10999: URL: https://github.com/apache/hudi/pull/10999#discussion_r1561708412 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala: ## @@ -1405,4 +1405,24 @@ class TestMORDataSource extends

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10999: URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050521034 ## CI report: * d392ef9a33b9019a8fadb9c4117cdca48116b48f Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10998: URL: https://github.com/apache/hudi/pull/10998#issuecomment-2050520984 ## CI report: * ea501ae87f61ff965f558360bb703bfad595c2a0 Azure:

Re: [PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10999: URL: https://github.com/apache/hudi/pull/10999#issuecomment-2050508350 ## CI report: * d392ef9a33b9019a8fadb9c4117cdca48116b48f UNKNOWN

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10998: URL: https://github.com/apache/hudi/pull/10998#issuecomment-2050508252 ## CI report: * ea501ae87f61ff965f558360bb703bfad595c2a0 Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
nsivabalan commented on code in PR #10998: URL: https://github.com/apache/hudi/pull/10998#discussion_r1561644552 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -964,6 +964,11 @@ object DataSourceOptionsHelper { def

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
yihua commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1561624097 ## hudi-spark-datasource/hudi-spark3.2.x/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32HoodieParquetReader.scala: ## @@ -0,0 +1,267 @@ +/* + *

[jira] [Updated] (HUDI-7605) Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY

2024-04-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7605: - Labels: pull-request-available (was: ) > Unable to set merger strategy with >

[PR] [HUDI-7605] allow merger strategy to be set in spark sql writer [hudi]

2024-04-11 Thread via GitHub
jonvex opened a new pull request, #10999: URL: https://github.com/apache/hudi/pull/10999 ### Change Logs DataSourceWriteOptions.RECORD_MERGER_STRATEGY.key() should change the strategy set in the table configs but currently does not ### Impact make config work ###

[jira] [Created] (HUDI-7605) Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY

2024-04-11 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7605: - Summary: Unable to set merger strategy with DataSourceWriteOptions.RECORD_MERGER_STRATEGY Key: HUDI-7605 URL: https://issues.apache.org/jira/browse/HUDI-7605

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10998: URL: https://github.com/apache/hudi/pull/10998#issuecomment-2050413788 ## CI report: * ea501ae87f61ff965f558360bb703bfad595c2a0 Azure:

Re: [PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10998: URL: https://github.com/apache/hudi/pull/10998#issuecomment-2050403305 ## CI report: * ea501ae87f61ff965f558360bb703bfad595c2a0 UNKNOWN

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
yihua commented on code in PR #10954: URL: https://github.com/apache/hudi/pull/10954#discussion_r1561538422 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkHoodieParquetReader.scala: ## @@ -0,0 +1,46 @@ +/* + * Licensed to

[jira] [Updated] (HUDI-7604) DataSourceWriteOptions.TABLE_NAME() does not work

2024-04-11 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7604: -- Status: Patch Available (was: In Progress) > DataSourceWriteOptions.TABLE_NAME() does not work

[jira] [Updated] (HUDI-7604) DataSourceWriteOptions.TABLE_NAME() does not work

2024-04-11 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-7604: -- Status: In Progress (was: Open) > DataSourceWriteOptions.TABLE_NAME() does not work >

Re: [PR] [HUDI-7565] Create spark file readers to read a single file instead of an entire partition [hudi]

2024-04-11 Thread via GitHub
hudi-bot commented on PR #10954: URL: https://github.com/apache/hudi/pull/10954#issuecomment-2050389038 ## CI report: * 815b6fd6af5676590079cf6f9e23b7a2fdb4ccd8 Azure:

[jira] [Updated] (HUDI-7604) DataSourceWriteOptions.TABLE_NAME() does not work

2024-04-11 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7604: - Labels: pull-request-available (was: ) > DataSourceWriteOptions.TABLE_NAME() does not work >

[PR] [HUDI-7604] Make table name config work properly [hudi]

2024-04-11 Thread via GitHub
jonvex opened a new pull request, #10998: URL: https://github.com/apache/hudi/pull/10998 ### Change Logs DataSourceWriteOptions.TABLE_NAME() should function like "hoodie.table.name" ### Impact Fixes annoyance ### Risk level (write none, low medium or high below)
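The PR states only that `DataSourceWriteOptions.TABLE_NAME()` should behave like `"hoodie.table.name"`. As a minimal sketch of that kind of alias resolution, assuming the datasource key string is `hoodie.datasource.write.table.name` (verify against your Hudi version), the helper below copies the alias into `hoodie.table.name` when the latter is not set. `TableNameAliasSketch` and `resolveTableName` are hypothetical; the actual change lives in `DataSourceOptionsHelper` and may use different precedence.

```java
import java.util.HashMap;
import java.util.Map;

public class TableNameAliasSketch {
  // Assumed key strings; check DataSourceWriteOptions.TABLE_NAME in your Hudi version.
  static final String WRITE_TABLE_NAME_KEY = "hoodie.datasource.write.table.name";
  static final String TABLE_NAME_KEY = "hoodie.table.name";

  /** Returns a copy of opts in which the datasource alias fills in hoodie.table.name if absent. */
  static Map<String, String> resolveTableName(Map<String, String> opts) {
    Map<String, String> resolved = new HashMap<>(opts);
    String alias = opts.get(WRITE_TABLE_NAME_KEY);
    // In this sketch an explicitly set hoodie.table.name wins over the alias.
    if (alias != null && !resolved.containsKey(TABLE_NAME_KEY)) {
      resolved.put(TABLE_NAME_KEY, alias);
    }
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    opts.put(WRITE_TABLE_NAME_KEY, "trips");
    System.out.println(resolveTableName(opts).get(TABLE_NAME_KEY)); // prints "trips"
  }
}
```

Returning a copy rather than mutating the caller's map keeps the user-supplied options unchanged, which makes the resolution step easy to test in isolation.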