[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390300#comment-17390300 ]

ASF GitHub Bot commented on HUDI-2254:
--------------------------------------

hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791

## CI report:

* 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
* f044b142e6833a40681b84f0380a8a5af5ad7d33 UNKNOWN
* a565942ea36395c0b67c7c7827495a9ef5e6c0af UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Builtin sort operator for flink bulk insert
> -------------------------------------------
>
>                 Key: HUDI-2254
>                 URL: https://issues.apache.org/jira/browse/HUDI-2254
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390299#comment-17390299 ]

ASF GitHub Bot commented on HUDI-2254:
--------------------------------------

hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791

## CI report:

* 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
* f044b142e6833a40681b84f0380a8a5af5ad7d33 UNKNOWN
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390298#comment-17390298 ]

ASF GitHub Bot commented on HUDI-2164:
--------------------------------------

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249

## CI report:

* abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
* 0bb6768327f3a54bb25d4504043acfb94ecfa311 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1256)

> Build cluster plan and execute this plan at once for HoodieClusteringJob
> -------------------------------------------------------------------------
>
>                 Key: HUDI-2164
>                 URL: https://issues.apache.org/jira/browse/HUDI-2164
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Yue Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> Today, Hudi lets users submit a HoodieClusteringJob to either build a clustering plan or execute an existing plan, via the --schedule or --instant-time config. To trigger a clustering job, a user has to:
> # Submit a HoodieClusteringJob with the --schedule config to build a clustering plan.
> # Copy the created clustering instant time from the log output.
> # Submit the HoodieClusteringJob again with the --instant-time config to execute the created clustering plan.
> The pain point is that triggering a clustering takes too many steps, and the instant time has to be copied and pasted manually from the log file, so the process cannot be automated.
> I have raised a PR that offers a new config named --mode (or -m for short):
> ||--mode||remarks||
> |execute|Execute a clustering plan at the given instant, so --instant-time is required here. This is the default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a clustering plan first and execute that plan immediately.|
> Now users can pass --mode scheduleAndExecute to build a clustering plan and execute it at once using HoodieClusteringJob.
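The scheduleAndExecute flow described in the issue can be modeled with a small sketch. This is a hypothetical illustration of the mode dispatch only — the class and method names below are invented for the example and are not Hudi's actual HoodieClusteringJob API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the --mode dispatch proposed in HUDI-2164.
// Class and method names are illustrative, not Hudi's actual API.
class ClusteringModeSketch {
    enum Mode { SCHEDULE, EXECUTE, SCHEDULE_AND_EXECUTE }

    final List<String> log = new ArrayList<>();
    private int nextInstant = 0;

    // Stand-in for building a clustering plan; returns its instant time.
    String schedule() {
        String instant = "instant-" + nextInstant++;
        log.add("scheduled " + instant);
        return instant;
    }

    // Stand-in for executing the plan at the given instant.
    void execute(String instantTime) {
        log.add("executed " + instantTime);
    }

    void run(Mode mode, String instantTime) {
        switch (mode) {
            case SCHEDULE:
                schedule();
                break;
            case EXECUTE:
                // in this mode the user must supply --instant-time themselves
                execute(instantTime);
                break;
            case SCHEDULE_AND_EXECUTE:
                // the instant returned by schedule() is fed straight into
                // execute(), removing the manual copy from the logs
                execute(schedule());
                break;
        }
    }
}
```

With scheduleAndExecute the instant time never leaves the process, which is what makes the end-to-end clustering run automatable.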
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering
hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249

## CI report:

* abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
* 0bb6768327f3a54bb25d4504043acfb94ecfa311 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1256)
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390297#comment-17390297 ]

ASF GitHub Bot commented on HUDI-2164:
--------------------------------------

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249

## CI report:

* abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
* 0bb6768327f3a54bb25d4504043acfb94ecfa311 UNKNOWN
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390296#comment-17390296 ]

ASF GitHub Bot commented on HUDI-2164:
--------------------------------------

hudi-bot edited a comment on pull request #3259:
URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249

## CI report:

* abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)
[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390295#comment-17390295 ]

ASF GitHub Bot commented on HUDI-2254:
--------------------------------------

hudi-bot edited a comment on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791

## CI report:

* 23687df6305830a9382181d0e795c33f4c7d9f98 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1255)
[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390294#comment-17390294 ]

ASF GitHub Bot commented on HUDI-2254:
--------------------------------------

hudi-bot commented on pull request #3372:
URL: https://github.com/apache/hudi/pull/3372#issuecomment-889641791

## CI report:

* 23687df6305830a9382181d0e795c33f4c7d9f98 UNKNOWN
[jira] [Created] (HUDI-2255) Refactor DataSourceOptions
Wenning Ding created HUDI-2255:
-----------------------------------

             Summary: Refactor DataSourceOptions
                 Key: HUDI-2255
                 URL: https://issues.apache.org/jira/browse/HUDI-2255
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Wenning Ding

As discussed with Vinoth, we can rename the DataSourceOptions constants from xxx_OPT_KEY to xxx_OPT.
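As a sketch of what such a rename could look like — the issue does not prescribe the migration mechanics, and the constant shown is only an example — the old xxx_OPT_KEY name could be kept as a deprecated alias of the new xxx_OPT constant so existing user code keeps compiling:

```java
// Hypothetical sketch of the xxx_OPT_KEY -> xxx_OPT rename discussed in
// HUDI-2255; one example option, not Hudi's full DataSourceOptions class.
class DataSourceOptionsSketch {
    // new-style constant name
    static final String RECORDKEY_FIELD_OPT = "hoodie.datasource.write.recordkey.field";

    /** @deprecated use {@link #RECORDKEY_FIELD_OPT} instead. */
    @Deprecated
    static final String RECORDKEY_FIELD_OPT_KEY = RECORDKEY_FIELD_OPT;
}
```

Keeping the deprecated alias pointing at the same string value means both names resolve to the same config key, so the rename can roll out without breaking callers.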
[jira] [Updated] (HUDI-2254) Builtin sort operator for flink bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-2254:
---------------------------------
    Labels: pull-request-available  (was: )
[jira] [Commented] (HUDI-2254) Builtin sort operator for flink bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390291#comment-17390291 ]

ASF GitHub Bot commented on HUDI-2254:
--------------------------------------

danny0405 opened a new pull request #3372:
URL: https://github.com/apache/hudi/pull/3372

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Created] (HUDI-2254) Builtin sort operator for flink bulk insert
Danny Chen created HUDI-2254:
---------------------------------

             Summary: Builtin sort operator for flink bulk insert
                 Key: HUDI-2254
                 URL: https://issues.apache.org/jira/browse/HUDI-2254
             Project: Apache Hudi
          Issue Type: Improvement
          Components: Flink Integration
            Reporter: Danny Chen
            Assignee: Danny Chen
             Fix For: 0.9.0
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390290#comment-17390290 ]

ASF GitHub Bot commented on HUDI-2253:
--------------------------------------

vinothchandar merged pull request #3371:
URL: https://github.com/apache/hudi/pull/3371

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ----------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>          Components: Testing
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter
[hudi] branch master updated: [HUDI-2253] Refactoring few tests to reduce runningtime. DeltaStreamer and MultiDeltaStreamer tests. Bulk insert row writer tests (#3371)
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 7bdae69  [HUDI-2253] Refactoring few tests to reduce runningtime. DeltaStreamer and MultiDeltaStreamer tests. Bulk insert row writer tests (#3371)
7bdae69 is described below

commit 7bdae69053afc5ef604a15806d78317cb976f2ce
Author: Sivabalan Narayanan
AuthorDate: Fri Jul 30 01:22:26 2021 -0400

    [HUDI-2253] Refactoring few tests to reduce runningtime. DeltaStreamer and MultiDeltaStreamer tests. Bulk insert row writer tests (#3371)

    Co-authored-by: Sivabalan Narayanan
---
 .../TestHoodieBulkInsertDataInternalWriter.java |   4 +-
 .../TestHoodieDataSourceInternalWriter.java     |   9 +-
 .../TestHoodieBulkInsertDataInternalWriter.java |   4 +-
 .../TestHoodieDataSourceInternalBatchWrite.java |   7 +-
 .../functional/TestHoodieDeltaStreamer.java     | 228 +--
 .../functional/TestHoodieDeltaStreamerBase.java | 245 +
 .../TestHoodieMultiTableDeltaStreamer.java      |   4 +-
 7 files changed, 267 insertions(+), 234 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
index 9735379..fd943b7 100644
--- a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
+++ b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieBulkInsertDataInternalWriter.java
@@ -74,7 +74,7 @@ public class TestHoodieBulkInsertDataInternalWriter extends
     HoodieWriteConfig cfg = getWriteConfig(populateMetaFields);
     HoodieTable table = HoodieSparkTable.create(cfg, context, metaClient);
     // execute N rounds
-    for (int i = 0; i < 3; i++) {
+    for (int i = 0; i < 2; i++) {
       String instantTime = "00" + i;
       // init writer
       HoodieBulkInsertDataInternalWriter writer = new HoodieBulkInsertDataInternalWriter(table, cfg, instantTime, RANDOM.nextInt(10), RANDOM.nextLong(), RANDOM.nextLong(),
@@ -82,7 +82,7 @@ public class TestHoodieBulkInsertDataInternalWriter extends
       int size = 10 + RANDOM.nextInt(1000);
       // write N rows to partition1, N rows to partition2 and N rows to partition3 ... Each batch should create a new RowCreateHandle and a new file
-      int batches = 5;
+      int batches = 3;
       Dataset totalInputRows = null;
       for (int j = 0; j < batches; j++) {
diff --git a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
index 342e2ae..eea49e6 100644
--- a/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
+++ b/hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
@@ -30,6 +30,7 @@ import org.apache.spark.sql.Row;
 import org.apache.spark.sql.catalyst.InternalRow;
 import org.apache.spark.sql.sources.v2.DataSourceOptions;
 import org.apache.spark.sql.sources.v2.writer.DataWriter;
+import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.params.ParameterizedTest;
 import org.junit.jupiter.params.provider.Arguments;
@@ -87,7 +88,7 @@ public class TestHoodieDataSourceInternalWriter extends
     }
     int size = 10 + RANDOM.nextInt(1000);
-    int batches = 5;
+    int batches = 2;
     Dataset totalInputRows = null;
     for (int j = 0; j < batches; j++) {
       String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[j % 3];
@@ -158,7 +159,7 @@ public class TestHoodieDataSourceInternalWriter extends
     int partitionCounter = 0;
     // execute N rounds
-    for (int i = 0; i < 5; i++) {
+    for (int i = 0; i < 2; i++) {
       String instantTime = "00" + i;
       // init writer
       HoodieDataSourceInternalWriter dataSourceInternalWriter =
@@ -168,7 +169,7 @@ public class TestHoodieDataSourceInternalWriter extends
       DataWriter writer = dataSourceInternalWriter.createWriterFactory().createDataWriter(partitionCounter++, RANDOM.nextLong(), RANDOM.nextLong());
       int size = 10 + RANDOM.nextInt(1000);
-      int batches = 5; // one batch per partition
+      int batches = 2; // one batch per partition
       for (int j = 0; j < batches; j++) {
         String partitionPath = HoodieTestDataGenerator.DEFAULT_PARTITION_PATHS[j % 3];
@@ -195,6 +196,8 @@ public class TestHoodieDataSourceInternalWriter extends
       }
     }
+  // takes up lot of running time with CI.
+  @Disabled
[GitHub] [hudi] vinothchandar merged pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
vinothchandar merged pull request #3371: URL: https://github.com/apache/hudi/pull/3371 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390289#comment-17390289 ] ASF GitHub Bot commented on HUDI-2253: -- vinothchandar commented on a change in pull request #3371: URL: https://github.com/apache/hudi/pull/3371#discussion_r679655658

## File path: hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
## @@ -195,6 +196,8 @@ public void testMultipleDataSourceWrites(boolean populateMetaFields) throws Exce
     }
   }

+  // takes up lot of running time with CI.
+  @Disabled
Review comment: but do we need this re-enabled at some point?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> -
>
> Key: HUDI-2253
> URL: https://issues.apache.org/jira/browse/HUDI-2253
> Project: Apache Hudi
> Issue Type: Test
> Components: Testing
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vinothchandar commented on a change in pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
vinothchandar commented on a change in pull request #3371: URL: https://github.com/apache/hudi/pull/3371#discussion_r679655658

## File path: hudi-spark-datasource/hudi-spark2/src/test/java/org/apache/hudi/internal/TestHoodieDataSourceInternalWriter.java
## @@ -195,6 +196,8 @@ public void testMultipleDataSourceWrites(boolean populateMetaFields) throws Exce
     }
   }

+  // takes up lot of running time with CI.
+  @Disabled
Review comment: but do we need this re-enabled at some point?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] ziudu commented on issue #3344: [SUPPORT]Best way to ingest a large number of tables
ziudu commented on issue #3344: URL: https://github.com/apache/hudi/issues/3344#issuecomment-889636377

We made a few POCs and found:
1. The java-client didn't support MOR either, so we would not use it.
2. MultiTableDeltaStreamer did not work correctly in continuous mode. It did work somehow for MOR in single-run mode, but we preferred not to use it as the doc said MOR was not supported. Plus, MultiTableDeltaStreamer runs ingestions serially, not in parallel.

For the moment, we will stick to delta streamers and launch them regularly (every 5-10 minutes) to process change data from Debezium or Golden Gate. Let's see what will happen for 1000 tables. However, I don't think it's optimized, as:
- 1 delta streamer needs at least 1 Spark executor, usually with 2GB memory. Most of our tables have only a very small amount of change data (<1MB) during a 5-10 minute period. We might need a large Hadoop cluster with enough memory for data ingestion, transformation and PrestoSQL.
- We use Spark on YARN, so it takes 10 seconds to create a YARN delta streamer application, which we think is not optimized either.

Our final thought: is it possible to write a long-running Spark application, which listens to multiple data change topics and writes change data in parallel to Hadoop hoodie tables via PySpark data frames?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
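The fan-out idea in the closing question can be sketched independently of Spark: one long-running driver holds a shared worker pool and dispatches one ingestion task per table, so many small tables share resources instead of each paying for a dedicated application. A minimal sketch, assuming a stubbed per-table step; the `ingestOnce` method, table names, and pool size below are all hypothetical stand-ins, and a real job would replace the stub with a read-from-topic-and-write-to-Hudi step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiTableIngestSketch {

    // Hypothetical per-table step: a real job would read the table's change
    // topic and write the batch to the corresponding Hudi table path.
    static String ingestOnce(String table) {
        return "ingested:" + table;
    }

    // Dispatch all tables onto a shared fixed-size pool and wait for the
    // whole round to finish before the next polling cycle starts.
    static List<String> ingestAll(List<String> tables, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String t : tables) {
                futures.add(pool.submit(() -> ingestOnce(t)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(ingestAll(List.of("orders", "users", "payments"), 2));
        // [ingested:orders, ingested:users, ingested:payments]
    }
}
```

The key trade-off versus one delta streamer per table is that the pool size, not the table count, determines the resource footprint.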
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390285#comment-17390285 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249

## CI report:

* abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254)

Bot commands @hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Yue Zhang
> Priority: Major
> Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a clustering plan or execute a clustering plan through the --schedule or --instant-time config.
> If users want to trigger a clustering job, they have to
> # Submit a HoodieClusteringJob to build a clustering plan through the --schedule config
> # Copy the created clustering instant time from the log info.
> # Submit the HoodieClusteringJob again to execute this created clustering plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, and we need to copy and paste the instant time from the log file manually, so the process can't be automated.
>
> I just raised a PR to offer a new config named --mode, or -m for short:
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute this plan at once using HoodieClusteringJob.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
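The three modes described in the issue can be pictured as a small dispatch: schedule produces an instant time, execute consumes one, and scheduleAndExecute pipes the freshly scheduled instant straight into execution, so nothing has to be copied out of the logs. A minimal sketch only; the `schedule` and `execute` methods below are hypothetical stand-ins for the real clustering client calls, and the instant value is made up.

```java
public class ClusteringModeSketch {

    // Hypothetical stand-in: the real call would create a clustering plan
    // and return the new instant time from the timeline.
    static String schedule() {
        return "20210730041234";
    }

    // Hypothetical stand-in: the real call would execute the plan that was
    // scheduled at the given instant.
    static String execute(String instantTime) {
        return "executed@" + instantTime;
    }

    // Dispatch on the proposed --mode values.
    static String run(String mode, String instantTime) {
        switch (mode) {
            case "schedule":
                return "scheduled@" + schedule();
            case "execute":
                return execute(instantTime); // --instant-time is required here
            case "scheduleAndExecute":
                return execute(schedule()); // no manual copy of the instant
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("scheduleAndExecute", null)); // executed@20210730041234
    }
}
```

The scheduleAndExecute branch is exactly steps 1-3 of the issue collapsed into one call chain.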
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1254) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] zhangyue19921010 commented on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering
zhangyue19921010 commented on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-889626943 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] mkk1490 commented on issue #3313: [SUPPORT] CoW: Hudi Upsert not working when there is a timestamp field in the composite key
mkk1490 commented on issue #3313: URL: https://github.com/apache/hudi/issues/3313#issuecomment-889627136

@nsivabalan I set the row_writer property to False and ingested the data. Now the timestamps get converted to their respective epoch seconds (a long datatype) in the hoodie key:

![image](https://user-images.githubusercontent.com/16716227/127602181-65d0075d-4757-4280-aa51-75592fb03fa8.png)

This actually solves my issue, since during upsert the key stays in sync with the IDL key. But bulk_insert with row.writer set to False is very slow; it actually takes double the time for the same data ingestion.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
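For reference, the epoch conversion described in the comment above is easy to reproduce: an ISO-8601 instant maps to a single long of epoch seconds, which is why a key built from the converted field no longer matches a key built from the formatted string. This only illustrates the conversion itself, not Hudi's key generator code; the key layout `"id:ts"` below is a made-up example.

```java
import java.time.Instant;

public class EpochKeyDemo {

    // Convert an ISO-8601 instant to epoch seconds, mirroring the reported
    // behavior where the timestamp field lands in the key as a long.
    static long toEpochSeconds(String isoInstant) {
        return Instant.parse(isoInstant).getEpochSecond();
    }

    public static void main(String[] args) {
        // The same logical key differs depending on the representation:
        String asString = "1:2021-07-29T00:00:00Z";
        String asEpoch = "1:" + toEpochSeconds("2021-07-29T00:00:00Z");
        System.out.println(asString); // 1:2021-07-29T00:00:00Z
        System.out.println(asEpoch);  // 1:1627516800
    }
}
```

Two writers that disagree on this representation will produce different `_hoodie_record_key` values for the same row, which is the mismatch discussed in the issue.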
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390284#comment-17390284 ] ASF GitHub Bot commented on HUDI-2164: -- zhangyue19921010 commented on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-889626943 @hudi-bot run azure

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Build cluster plan and execute this plan at once for HoodieClusteringJob
> 
>
> Key: HUDI-2164
> URL: https://issues.apache.org/jira/browse/HUDI-2164
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Yue Zhang
> Priority: Major
> Labels: pull-request-available
>
> For now, Hudi lets users submit a HoodieClusteringJob to build a clustering plan or execute a clustering plan through the --schedule or --instant-time config.
> If users want to trigger a clustering job, they have to
> # Submit a HoodieClusteringJob to build a clustering plan through the --schedule config
> # Copy the created clustering instant time from the log info.
> # Submit the HoodieClusteringJob again to execute this created clustering plan through the --instant-time config.
> The pain point is that there are too many steps when triggering a clustering, and we need to copy and paste the instant time from the log file manually, so the process can't be automated.
>
> I just raised a PR to offer a new config named --mode, or -m for short:
> ||--mode||remarks||
> |execute|Execute a cluster plan at a given instant, which means --instant-time is needed here. Default value.|
> |schedule|Make a clustering plan.|
> |*scheduleAndExecute*|Make a cluster plan first and execute that plan immediately.|
> Now users can use --mode scheduleAndExecute to build a cluster plan and execute this plan at once using HoodieClusteringJob.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e2f53b1  Travis CI build asf-site
e2f53b1 is described below

commit e2f53b137d78d0f68cc1aaf3a191d9a6679d9d53
Author: CI
AuthorDate: Fri Jul 30 04:32:18 2021 +

    Travis CI build asf-site
---
 content/docs/powered_by.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/powered_by.html b/content/docs/powered_by.html
index f6b2312..b89cb7b 100644
--- a/content/docs/powered_by.html
+++ b/content/docs/powered_by.html
@@ -566,7 +566,7 @@ Data Summit Connect, May, 2021
 https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-available-in-amazon-emr/ “New features from Apache hudi in Amazon EMR”
 https://aws.amazon.com/blogs/big-data/build-a-data-lake-using-amazon-kinesis-data-streams-for-amazon-dynamodb-and-apache-hudi/ “Build a data lake using amazon kinesis data stream for amazon dynamodb and apache hudi” - Amazon AWS
 https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-athena-expands-apache-hudi-support/ “Amazon Athena expands Apache Hudi support” - Amazon AWS
-“Part1: Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read optimized queries”
+https://aws.amazon.com/blogs/big-data/part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/ “Part1: Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read optimized queries” - Amazon AWS

 Powered by
[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table
[ https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390277#comment-17390277 ] ASF GitHub Bot commented on HUDI-2243: -- codope commented on a change in pull request #3360: URL: https://github.com/apache/hudi/pull/3360#discussion_r679638563 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession = _
+  val commonOpts = Map(
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+    HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+    initPath()
+    initSparkContexts()
+    spark = sqlContext.sparkSession
+    initTestDataGenerator()
+    initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+    initMetaClient(tableType)
+    val _spark = spark
+    import _spark.implicits._
+
+    // First write
+    val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+    df1.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+      .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+      .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+      .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+      .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val firstCommit = metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+    // Second write
+    val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+    df2.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
+      .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+      .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+      .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+      .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
+      .mode(SaveMode.Append)
+      .save(basePath)
+    metaClient.reloadActiveTimeline()
+    val secondCommit = metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+    // Third write
Review comment: Wondering what happens when clean commits are interleaved in between, say as.of.instant is 1002 and there are couple of clean commits before that. I believe the behavior would be same as we have today when latest instant is passed?
-- This is an automated message from the Apache Git Service. To
[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table
codope commented on a change in pull request #3360: URL: https://github.com/apache/hudi/pull/3360#discussion_r679638563 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession = _
+  val commonOpts = Map(
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+    HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+    initPath()
+    initSparkContexts()
+    spark = sqlContext.sparkSession
+    initTestDataGenerator()
+    initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+    initMetaClient(tableType)
+    val _spark = spark
+    import _spark.implicits._
+
+    // First write
+    val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+    df1.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+      .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+      .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+      .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+      .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val firstCommit = metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+    // Second write
+    val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+    df2.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
+      .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+      .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+      .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+      .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
+      .mode(SaveMode.Append)
+      .save(basePath)
+    metaClient.reloadActiveTimeline()
+    val secondCommit = metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+    // Third write
Review comment: Wondering what happens when clean commits are interleaved in between, say as.of.instant is 1002 and there are couple of clean commits before that. I believe the behavior would be same as we have today when latest instant is passed?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at:
[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table
[ https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390276#comment-17390276 ] ASF GitHub Bot commented on HUDI-2243: -- codope commented on a change in pull request #3360: URL: https://github.com/apache/hudi/pull/3360#discussion_r679637630

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java
## @@ -58,6 +58,11 @@
    */
   Stream getLatestBaseFiles();

+  /**
Review comment: nit:
```
/**
 * Stream all the latest version data files across partitions with precondition that commitTime(file) before
 * maxCommitTime.
 */
```
More in line with the existing doc. What do you think?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Support Time Travel Query For Hoodie Table
> --
>
> Key: HUDI-2243
> URL: https://issues.apache.org/jira/browse/HUDI-2243
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: pengzhiwei
> Assignee: pengzhiwei
> Priority: Major
> Labels: pull-request-available
>
> Support time travel query for hoodie table for both COW and MOR table.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table
codope commented on a change in pull request #3360: URL: https://github.com/apache/hudi/pull/3360#discussion_r679637630

## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/TableFileSystemView.java
## @@ -58,6 +58,11 @@
    */
   Stream getLatestBaseFiles();

+  /**
Review comment: nit:
```
/**
 * Stream all the latest version data files across partitions with precondition that commitTime(file) before
 * maxCommitTime.
 */
```
More in line with the existing doc. What do you think?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
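The precondition discussed in this review, that commitTime(file) is before (or on) maxCommitTime, amounts to picking the newest file version whose commit instant does not exceed the as-of instant; Hudi instant times are fixed-width digit strings, so lexicographic order matches chronological order. The snippet below is a simplified, hypothetical model of that selection over one file group's commit times, not the actual file-system view code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LatestBeforeOrOnSketch {

    // Pick the latest file version whose commit time is <= maxCommitTime.
    // Instant times compare lexicographically because they are fixed-width
    // digit strings (e.g. "20210730041234").
    static Optional<String> latestBeforeOrOn(List<String> commitTimes, String maxCommitTime) {
        return commitTimes.stream()
            .filter(c -> c.compareTo(maxCommitTime) <= 0)
            .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<String> versions = List.of("001", "002", "005");
        System.out.println(latestBeforeOrOn(versions, "004")); // Optional[002]
        System.out.println(latestBeforeOrOn(versions, "000")); // Optional.empty
    }
}
```

With maxCommitTime set to the latest instant, this degenerates to the plain latest-base-files case, which is the equivalence the reviewer asks about for interleaved clean commits.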
[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table
[ https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390275#comment-17390275 ] ASF GitHub Bot commented on HUDI-2243: -- codope commented on a change in pull request #3360: URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala ## @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession = _
+  val commonOpts = Map(
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+    HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+    initPath()
+    initSparkContexts()
+    spark = sqlContext.sparkSession
+    initTestDataGenerator()
+    initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+    initMetaClient(tableType)
+    val _spark = spark
+    import _spark.implicits._
+
+    // First write
+    val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+    df1.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+      .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+      .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+      .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+      .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val firstCommit = metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+    // Second write
+    val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+    df2.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)
Review comment: Shouldn't we set this (and other instances below) to `tableType` just like on line 70?
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Support Time Travel Query For Hoodie Table
> --
>
> Key: HUDI-2243
> URL: https://issues.apache.org/jira/browse/HUDI-2243
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: pengzhiwei
> Assignee: pengzhiwei
>
[GitHub] [hudi] codope commented on a change in pull request #3360: [HUDI-2243] Support Time Travel Query For Hoodie Table
codope commented on a change in pull request #3360:
URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926

## File path:
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala

## @@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.functional
+
+import org.apache.hudi.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.hudi.DataSourceWriteOptions.{KEYGENERATOR_CLASS_OPT_KEY, PARTITIONPATH_FIELD_OPT_KEY, PRECOMBINE_FIELD_OPT_KEY, RECORDKEY_FIELD_OPT_KEY}
+import org.apache.hudi.common.model.HoodieTableType
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.keygen.{ComplexKeyGenerator, NonpartitionedKeyGenerator}
+import org.apache.hudi.testutils.HoodieClientTestBase
+import org.apache.spark.sql.{Row, SaveMode, SparkSession}
+import org.junit.jupiter.api.Assertions.assertEquals
+import org.junit.jupiter.api.{AfterEach, BeforeEach}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.EnumSource
+
+class TestTimeTravelQuery extends HoodieClientTestBase {
+  var spark: SparkSession = _
+  val commonOpts = Map(
+    "hoodie.insert.shuffle.parallelism" -> "4",
+    "hoodie.upsert.shuffle.parallelism" -> "4",
+    "hoodie.bulkinsert.shuffle.parallelism" -> "2",
+    "hoodie.delete.shuffle.parallelism" -> "1",
+    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key -> "_row_key",
+    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key -> "partition",
+    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY.key -> "timestamp",
+    HoodieWriteConfig.TABLE_NAME.key -> "hoodie_test"
+  )
+
+  @BeforeEach override def setUp() {
+    initPath()
+    initSparkContexts()
+    spark = sqlContext.sparkSession
+    initTestDataGenerator()
+    initFileSystem()
+  }
+
+  @AfterEach override def tearDown() = {
+    cleanupSparkContexts()
+    cleanupTestDataGenerator()
+    cleanupFileSystem()
+  }
+
+  @ParameterizedTest
+  @EnumSource(value = classOf[HoodieTableType])
+  def testTimeTravelQuery(tableType: HoodieTableType): Unit = {
+    initMetaClient(tableType)
+    val _spark = spark
+    import _spark.implicits._
+
+    // First write
+    val df1 = Seq((1, "a1", 10, 1000)).toDF("id", "name", "value", "version")
+    df1.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, tableType.name())
+      .option(RECORDKEY_FIELD_OPT_KEY.key, "id")
+      .option(PRECOMBINE_FIELD_OPT_KEY.key, "version")
+      .option(PARTITIONPATH_FIELD_OPT_KEY.key, "")
+      .option(KEYGENERATOR_CLASS_OPT_KEY.key, classOf[NonpartitionedKeyGenerator].getName)
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    val firstCommit = metaClient.getActiveTimeline.filterCompletedInstants().lastInstant().get().getTimestamp
+
+    // Second write
+    val df2 = Seq((1, "a1", 12, 1001)).toDF("id", "name", "value", "version")
+    df2.write.format("org.apache.hudi")
+      .options(commonOpts)
+      .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY.key, DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL)

Review comment:
       Shouldn't we set this (and other instances below) to `tableType` just like on line 70?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
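For context, the commit timestamp captured in the test above (`firstCommit`) is what the time-travel read path consumes. A read as of that commit could look roughly like the following sketch; the read option key is an assumption here (shown as "as.of.instant"), since the exact name is defined by this PR:

```scala
// Sketch: query the table as of the first commit. Records from later
// writes (e.g. value = 12 from df2) should not be visible at this instant.
// "as.of.instant" is an assumed option key, not confirmed from this thread.
val asOfFirst = spark.read.format("org.apache.hudi")
  .option("as.of.instant", firstCommit)
  .load(basePath)
// Expect the state of the first write, i.e. value = 10 for id 1.
```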
[jira] [Commented] (HUDI-2243) Support Time Travel Query For Hoodie Table
[ https://issues.apache.org/jira/browse/HUDI-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390273#comment-17390273 ] ASF GitHub Bot commented on HUDI-2243: -- codope commented on a change in pull request #3360: URL: https://github.com/apache/hudi/pull/3360#discussion_r679633926 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestTimeTravelQuery.scala Review comment: Shouldn't we set this to `tableType` just like on line 70? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Support Time Travel Query For Hoodie Table > -- > > Key: HUDI-2243 > URL: https://issues.apache.org/jira/browse/HUDI-2243 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration > Reporter: pengzhiwei > Assignee: pengzhiwei > Priority: Major >
[jira] [Commented] (HUDI-2101) support z-order for hudi
[ https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390272#comment-17390272 ] ASF GitHub Bot commented on HUDI-2101: -- hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > support z-order for hudi > > > Key: HUDI-2101 > URL: https://issues.apache.org/jira/browse/HUDI-2101 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration > Reporter: tao meng > Assignee: tao meng > Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > support z-order for hudi to optimize queries -- This message was sent by Atlassian Jira (v8.3.4#803005)
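The z-ordering that HUDI-2101 proposes rests on interleaving the bits of several column values into one sort key, so that rows close in the multi-dimensional key space land close together in files and can be skipped efficiently at query time. As a rough illustration of the underlying idea only (this is not the Hudi implementation; the function name and bit width are invented for the sketch):

```python
def z_value(coords, bits=16):
    """Interleave the bits of several non-negative ints into one Morton code.

    Sorting rows by this value clusters rows whose coordinates are close
    in every dimension, which is what makes z-ordering useful for data
    skipping across multiple columns.
    """
    z = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            # Place bit `bit` of dimension `dim` at interleaved position.
            z |= ((c >> bit) & 1) << (bit * len(coords) + dim)
    return z

# Sorting two columns by their interleaved value instead of
# lexicographically by (x, y) keeps nearby points adjacent:
rows = [(0, 3), (3, 0), (1, 1), (2, 2)]
rows.sort(key=z_value)
```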
[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi
hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390268#comment-17390268 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task > Reporter: Yue Zhang > Priority: Major > Labels: pull-request-available > > For now, Hudi lets users submit a HoodieClusteringJob to build a > clustering plan or to execute a clustering plan, through the --schedule or > --instant-time config. > To trigger a clustering job, a user has to: > # Submit a HoodieClusteringJob to build a clustering plan through the --schedule > config. > # Copy the created clustering instant time from the log output. > # Submit the HoodieClusteringJob again to execute this created clustering > plan through the --instant-time config. > The pain point is that there are too many steps when triggering a clustering job, > and the instant time has to be copied and pasted from the log file manually, so the > process cannot be automated.
> > This PR offers a new config named --mode (or -m for short): > ||--mode||remarks|| > |execute|Execute a clustering plan at a given instant, which means --instant-time > is required here. Default value.| > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a clustering plan first and execute that plan > immediately.| > Now users can use --mode scheduleAndExecute to build a clustering plan and execute > that plan at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
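Based on the modes described in the issue, a single-step run could look roughly like the sketch below. Only the --mode/--schedule/--instant-time flags and the HoodieClusteringJob class come from this thread; the jar name, table path, and remaining spark-submit arguments are placeholders:

```shell
# Single-step clustering: build the plan and execute it immediately
# (jar name and base path below are placeholders, not confirmed values)
spark-submit \
  --class org.apache.hudi.utilities.HoodieClusteringJob \
  hudi-utilities-bundle.jar \
  --base-path /path/to/hoodie_table \
  --mode scheduleAndExecute

# The two-step flow this replaces:
#   1. run with --mode schedule, copy the instant time from the logs
#   2. run again with --mode execute --instant-time <copied-instant>
```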
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390261#comment-17390261 ] ASF GitHub Bot commented on HUDI-2253: -- hudi-bot edited a comment on pull request #3371: URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692 ## CI report: * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reduce CI run time for deltastreamer and bulk insert row writer tests > - > > Key: HUDI-2253 > URL: https://issues.apache.org/jira/browse/HUDI-2253 > Project: Apache Hudi > Issue Type: Test > Components: Testing >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Reduce CI run time for deltastreamer and bulk insert row writer tests > > org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
hudi-bot edited a comment on pull request #3371: URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692 ## CI report: * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-2234) MERGE INTO works only ON primary key
[ https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengzhiwei reassigned HUDI-2234: Assignee: pengzhiwei > MERGE INTO works only ON primary key > > > Key: HUDI-2234 > URL: https://issues.apache.org/jira/browse/HUDI-2234 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Sagar Sumit >Assignee: pengzhiwei >Priority: Major > > {code:sql} > drop table if exists hudi_gh_ext_fixed; > create table hudi_gh_ext_fixed (id int, name string, price double, ts long) > using hudi options(primaryKey = 'id', precombineField = 'ts') location > 'file:///tmp/hudi-h4-fixed'; > insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120); > insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120); > insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120); > update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER'; > drop table if exists hudi_fixed; > create table hudi_fixed (id int, name string, price double, ts long) using > hudi options(primaryKey = 'id', precombineField = 'ts') partitioned by (ts) > location 'file:///tmp/hudi-h4-part-fixed'; > insert into hudi_fixed values(2, 'UBER', 200, 120); > MERGE INTO hudi_fixed > USING (select id, name, price, ts from hudi_gh_ext_fixed) updates > ON hudi_fixed.name = updates.name > WHEN MATCHED THEN > UPDATE SET * > WHEN NOT MATCHED > THEN INSERT *; > -- java.lang.IllegalArgumentException: Merge Key[name] is not Equal to the > defined primary key[id] in table hudi_fixed > --at > org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425) > --at > org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:146) > --at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > --at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > --at > 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > --at > org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
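The exception in the reproduction above is thrown because the merge-key check in MergeIntoHoodieTableCommand requires the ON condition to reference the table's declared primary key. Until merging on arbitrary columns is supported, a rewrite consistent with that check is to join on the primary key instead (a sketch reusing the tables from the reproduction; behavior differs from the original statement when ids and names do not correspond one-to-one):

```sql
-- Works under the current check: the merge key matches the primary key `id`
MERGE INTO hudi_fixed
USING (SELECT id, name, price, ts FROM hudi_gh_ext_fixed) updates
ON hudi_fixed.id = updates.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```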
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390257#comment-17390257 ] ASF GitHub Bot commented on HUDI-2164: -- zhangyue19921010 commented on a change in pull request #3259: URL: https://github.com/apache/hudi/pull/3259#discussion_r679615918 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -1115,6 +1125,47 @@ public void testAsyncClusteringServiceWithCompaction() throws Exception { }); } + @Test Review comment: nice idea, changed ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java ## @@ -118,20 +134,41 @@ public static void main(String[] args) { jsc.stop(); } + private static void validateRunningMode(Config cfg) { +// --mode has a higher priority than --schedule +// If we remove --schedule option in the future we need to change runningMode default value to EXECUTE +if (StringUtils.isNullOrEmpty(cfg.runningMode)) { Review comment: done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390255#comment-17390255 ] ASF GitHub Bot commented on HUDI-2164: -- zhangyue19921010 commented on a change in pull request #3259: URL: https://github.com/apache/hudi/pull/3259#discussion_r679615761 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -449,6 +451,14 @@ static void assertAtleastNDeltaCommits(int minExpected, String tablePath, FileSy assertTrue(minExpected <= numDeltaCommits, "Got=" + numDeltaCommits + ", exp >=" + minExpected); } +static void assertAtLeastNCompletedReplaceCommits(int minExpected, String tablePath, DistributedFileSystem fs) { Review comment: Sure, changed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390256#comment-17390256 ] ASF GitHub Bot commented on HUDI-2164: -- zhangyue19921010 commented on a change in pull request #3259: URL: https://github.com/apache/hudi/pull/3259#discussion_r679615842 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java ## @@ -1115,6 +1125,47 @@ public void testAsyncClusteringServiceWithCompaction() throws Exception { }); } + @Test + public void testHoodieAsyncClusteringJobWithScheduleAndExecute() throws Exception { Review comment: nice idea, changed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HUDI-2232) MERGE INTO fails with table having nested struct and partitioned by
[ https://issues.apache.org/jira/browse/HUDI-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pengzhiwei reassigned HUDI-2232: Assignee: pengzhiwei > MERGE INTO fails with table having nested struct and partitioned by > - > > Key: HUDI-2232 > URL: https://issues.apache.org/jira/browse/HUDI-2232 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Sagar Sumit >Assignee: pengzhiwei >Priority: Blocker > Fix For: 0.9.0 > > > {code:java} > // To reproduce > drop table if exists hudi_gh_ext_fixed; > create table hudi_gh_ext_fixed ( id int, name string, price double, ts > long, repo struct<id: int, name: string>) using hudi options(primaryKey = > 'id', precombineField = 'ts') location 'file:///tmp/hudi-h5-fixed'; > insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120, > struct(234273476,"onnet/onnet-portal")); > insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120, > struct(234273476,"onnet/onnet-portal")); > insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120, > struct(234273476,"onnet/onnet-portal")); > update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER'; > drop table if exists hudi_fixed; > create table hudi_fixed ( id int, name string, price double, ts long, > repo struct<id: int, name: string>) using hudi options(primaryKey = 'id', > precombineField = 'ts') partitioned by (ts) location > 'file:///tmp/hudi-h5-part-fixed'; > insert into hudi_fixed values(2, 'UBER', 200, > struct(234273476,"onnet/onnet-portal"), 130); > select * from hudi_gh_ext_fixed; > 20210727145240 20210727145240_0_6442266 id:3 > 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1472-72063_20210727145240.parquet 3 > AMZN 300.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145301 > 20210727145301_0_6442269 id:2 > 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1565-77094_20210727145301.parquet 2 > UBER 150.0 120 {"id":234273476,"name":"onnet/onnet-portal"}20210727145254 > 20210727145254_0_6442268 id:4 > 77fc2e3e-add9-4f08-a5e1-9671d66add26-0_0-1534-75283_20210727145254.parquet 4 > GOOG 300.0 120 
{"id":234273476,"name":"onnet/onnet-portal"} > select * from hudi_fixed; > 20210727145325 20210727145325_0_6442270 id:2 ts=130 > ba148271-68b4-40aa-816a-158170446e41-0_0-1595-78703_20210727145325.parquet 2 > UBER 200.0 {"id":234273476,"name":"onnet/onnet-portal"} 130 > MERGE INTO hudi_fixed USING (select id, name, price, repo, ts from > hudi_gh_ext_fixed) updates ON hudi_fixed.id = updates.id WHEN MATCHED THEN > UPDATE SET * WHEN NOT MATCHED THEN INSERT *; > -- java.lang.IllegalArgumentException: UnSupport StructType yet -- at > org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.convert(SqlTypedRecord.scala:122) -- > at > org.apache.spark.sql.hudi.command.payload.SqlTypedRecord.get(SqlTypedRecord.scala:56) -- > at > org.apache.hudi.sql.payload.ExpressionPayloadEvaluator_b695b02a_99b5_479e_8299_507da9b206fd.eval(Unknown > Source) -- at > org.apache.spark.sql.hudi.command.payload.ExpressionPayload$AvroTypeConvertEvaluator.eval(ExpressionPayload.scala:333) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2101) support z-order for hudi
[ https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390253#comment-17390253 ] ASF GitHub Bot commented on HUDI-2101: -- hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 6912152293bb9336c060d41715dbae14527287a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207) * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > support z-order for hudi > > > Key: HUDI-2101 > URL: https://issues.apache.org/jira/browse/HUDI-2101 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > support z-order for hudi to optimize the query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi
hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 6912152293bb9336c060d41715dbae14527287a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207) * 4112d163dffa737e6bd8761796746d33f7e896cb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1253) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1985) Website re-design implementation
[ https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390251#comment-17390251 ] ASF GitHub Bot commented on HUDI-1985: -- vingov commented on pull request #3366: URL: https://github.com/apache/hudi/pull/3366#issuecomment-889591087 > @vingov @nsivabalan Shall we move the schema evolution subsection to a new page under documentation? It's gonna be a story of its own. Yes, we can do all structural changes after we land this version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Website re-design implementation > > > Key: HUDI-1985 > URL: https://issues.apache.org/jira/browse/HUDI-1985 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Raymond Xu >Assignee: Vinoth Govindarajan >Priority: Blocker > Labels: documentation, pull-request-available > Fix For: 0.9.0 > > > To provide better navigation and organization of Hudi website's info, we have > done a re-design of the web pages. > Previous discussion > [https://github.com/apache/hudi/issues/2905] > > See the wireframe and final design in > [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6] > (login Figma to comment) > The design is ready for implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2101) support z-order for hudi
[ https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390250#comment-17390250 ] ASF GitHub Bot commented on HUDI-2101: -- hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 6912152293bb9336c060d41715dbae14527287a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207) * 4112d163dffa737e6bd8761796746d33f7e896cb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > support z-order for hudi > > > Key: HUDI-2101 > URL: https://issues.apache.org/jira/browse/HUDI-2101 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > support z-order for hudi to optimize the query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] vingov commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)
vingov commented on pull request #3366: URL: https://github.com/apache/hudi/pull/3366#issuecomment-889591087 > @vingov @nsivabalan Shall we move the schema evolution subsection to a new page under documentation? It's gonna be a story of its own. Yes, we can do all structural changes after we land this version. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi
hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 6912152293bb9336c060d41715dbae14527287a2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1207) * 4112d163dffa737e6bd8761796746d33f7e896cb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2101) support z-order for hudi
[ https://issues.apache.org/jira/browse/HUDI-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390249#comment-17390249 ] ASF GitHub Bot commented on HUDI-2101: -- xiarixiaoyao commented on a change in pull request #3330: URL: https://github.com/apache/hudi/pull/3330#discussion_r679612078 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java ## @@ -229,6 +229,11 @@ public HoodieWriteMetadata<List<WriteStatus>> deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions) { throw new HoodieNotSupportedException("DeletePartitions is not supported yet"); } + @Override + public HoodieWriteMetadata<List<WriteStatus>> optimize(HoodieEngineContext context, String instantTime, List<HoodieRecord<T>> records) { + throw new HoodieNotSupportedException("optimize data layouy is not supported yet"); Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > support z-order for hudi > > > Key: HUDI-2101 > URL: https://issues.apache.org/jira/browse/HUDI-2101 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > support z-order for hudi to optimize the query -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi
xiarixiaoyao commented on a change in pull request #3330: URL: https://github.com/apache/hudi/pull/3330#discussion_r679612078 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/HoodieFlinkCopyOnWriteTable.java ## @@ -229,6 +229,11 @@ public HoodieWriteMetadata<List<WriteStatus>> deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions) { throw new HoodieNotSupportedException("DeletePartitions is not supported yet"); } + @Override + public HoodieWriteMetadata<List<WriteStatus>> optimize(HoodieEngineContext context, String instantTime, List<HoodieRecord<T>> records) { + throw new HoodieNotSupportedException("optimize data layouy is not supported yet"); Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390247#comment-17390247 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132) * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > > For now, Hudi can let users submit a HoodieClusteringJob to build a > clustering plan or execute a clustering plan through --schedule or > --instant-time config. > If users want to trigger a clustering job, they have to > # Submit a HoodieClusteringJob to build a clustering plan through --schedule > config > # Copy the created clustering instant time from the log output. 
> # Submit the HoodieClusteringJob again to execute this created clustering > plan through --instant-time config. > The pain point is that there are too many steps when triggering a clustering, and > the instant time needs to be copied and pasted from the log file manually, so > the process can't be automated. > > I just raised a PR to offer a new config named --mode, or -m for short > ||--mode||remarks|| > |execute|Execute a cluster plan at a given instant, which means --instant-time > is needed here. This is the default value. | > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a cluster plan first and execute that plan > immediately| > Now users can use --mode scheduleAndExecute to build a cluster plan and execute > it at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132) * abefb17f2c42c06e9c81ec26c6561172fedf4add Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1252) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3259: [HUDI-2164] Let users build cluster plan and execute this plan at once using HoodieClusteringJob for async clustering
hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132) * abefb17f2c42c06e9c81ec26c6561172fedf4add UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2164) Build cluster plan and execute this plan at once for HoodieClusteringJob
[ https://issues.apache.org/jira/browse/HUDI-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390246#comment-17390246 ] ASF GitHub Bot commented on HUDI-2164: -- hudi-bot edited a comment on pull request #3259: URL: https://github.com/apache/hudi/pull/3259#issuecomment-878086249 ## CI report: * ec01ad1f162813a5fafb7d14da7b65eea64d06ea Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1132) * abefb17f2c42c06e9c81ec26c6561172fedf4add UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Build cluster plan and execute this plan at once for HoodieClusteringJob > > > Key: HUDI-2164 > URL: https://issues.apache.org/jira/browse/HUDI-2164 > Project: Apache Hudi > Issue Type: Task >Reporter: Yue Zhang >Priority: Major > Labels: pull-request-available > > For now, Hudi can let users submit a HoodieClusteringJob to build a > clustering plan or execute a clustering plan through --schedule or > --instant-time config. > If users want to trigger a clustering job, they have to > # Submit a HoodieClusteringJob to build a clustering plan through --schedule > config > # Copy the created clustering instant time from the log output. > # Submit the HoodieClusteringJob again to execute this created clustering > plan through --instant-time config. 
> The pain point is that there are too many steps when triggering a clustering, and > the instant time needs to be copied and pasted from the log file manually, so > the process can't be automated. > > I just raised a PR to offer a new config named --mode, or -m for short > ||--mode||remarks|| > |execute|Execute a cluster plan at a given instant, which means --instant-time > is needed here. This is the default value. | > |schedule|Make a clustering plan.| > |*scheduleAndExecute*|Make a cluster plan first and execute that plan > immediately| > Now users can use --mode scheduleAndExecute to build a cluster plan and execute > it at once using HoodieClusteringJob. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-1985) Website re-design implementation
[ https://issues.apache.org/jira/browse/HUDI-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390245#comment-17390245 ] ASF GitHub Bot commented on HUDI-1985: -- codope commented on pull request #3366: URL: https://github.com/apache/hudi/pull/3366#issuecomment-889586829 @vingov @nsivabalan Shall we move the schema evolution subsection to a new page under documentation? It's gonna be a story of its own. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Website re-design implementation > > > Key: HUDI-1985 > URL: https://issues.apache.org/jira/browse/HUDI-1985 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs >Reporter: Raymond Xu >Assignee: Vinoth Govindarajan >Priority: Blocker > Labels: documentation, pull-request-available > Fix For: 0.9.0 > > > To provide better navigation and organization of Hudi website's info, we have > done a re-design of the web pages. > Previous discussion > [https://github.com/apache/hudi/issues/2905] > > See the wireframe and final design in > [https://www.figma.com/file/tipod1JZRw7anZRWBI6sZh/Hudi.Apache?node-id=32%3A6] > (login Figma to comment) > The design is ready for implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] codope commented on pull request #3366: [HUDI-1985] Migrate the hudi site to docusaurus platform (website complete re-design)
codope commented on pull request #3366: URL: https://github.com/apache/hudi/pull/3366#issuecomment-889586829 @vingov @nsivabalan Shall we move the schema evolution subsection to a new page under documentation? It's gonna be a story of its own. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch asf-site updated: Fixing article link (#3370)
This is an automated email from the ASF dual-hosted git repository. garyli pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 7945698 Fixing article link (#3370) 7945698 is described below commit 794569814773d2bc777132d0e9e4d6553a56b443 Author: Sivabalan Narayanan AuthorDate: Thu Jul 29 22:37:24 2021 -0400 Fixing article link (#3370) --- docs/_docs/1_4_powered_by.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/_docs/1_4_powered_by.md b/docs/_docs/1_4_powered_by.md index 1c52f15..3ae92a9 100644 --- a/docs/_docs/1_4_powered_by.md +++ b/docs/_docs/1_4_powered_by.md @@ -194,7 +194,7 @@ You can check out [our blog pages](https://hudi.apache.org/blog.html) for conten 23. ["New features from Apache hudi in Amazon EMR"](https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-available-in-amazon-emr/) 24. ["Build a data lake using amazon kinesis data stream for amazon dynamodb and apache hudi"](https://aws.amazon.com/blogs/big-data/build-a-data-lake-using-amazon-kinesis-data-streams-for-amazon-dynamodb-and-apache-hudi/) - Amazon AWS 25. ["Amazon Athena expands Apache Hudi support"](https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-athena-expands-apache-hudi-support/) - Amazon AWS -26. ["Part1: Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read optimized queries"](part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/) +26. ["Part1: Query apache hudi dataset in an amazon S3 data lake with amazon athena : Read optimized queries"](https://aws.amazon.com/blogs/big-data/part-1-query-an-apache-hudi-dataset-in-an-amazon-s3-data-lake-with-amazon-athena-part-1-read-optimized-queries/) - Amazon AWS ## Powered by
[GitHub] [hudi] garyli1019 merged pull request #3370: [MINOR] Fixing an article Hyperlink
garyli1019 merged pull request #3370: URL: https://github.com/apache/hudi/pull/3370 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2253: -- Component/s: Testing > Reduce CI run time for deltastreamer and bulk insert row writer tests > - > > Key: HUDI-2253 > URL: https://issues.apache.org/jira/browse/HUDI-2253 > Project: Apache Hudi > Issue Type: Test > Components: Testing >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Reduce CI run time for deltastreamer and bulk insert row writer tests > > org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390239#comment-17390239 ] ASF GitHub Bot commented on HUDI-2253: -- hudi-bot edited a comment on pull request #3371: URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692 ## CI report: * 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reduce CI run time for deltastreamer and bulk insert row writer tests > - > > Key: HUDI-2253 > URL: https://issues.apache.org/jira/browse/HUDI-2253 > Project: Apache Hudi > Issue Type: Test >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Reduce CI run time for deltastreamer and bulk insert row writer tests > > org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390238#comment-17390238 ] ASF GitHub Bot commented on HUDI-2253: -- danny0405 commented on a change in pull request #3371: URL: https://github.com/apache/hudi/pull/3371#discussion_r679603117 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java ## @@ -38,16 +38,14 @@ import java.io.IOException; import java.util.Arrays; import java.util.List; -import java.util.Random; import static org.junit.jupiter.api.Assertions.assertEquals; import static org.junit.jupiter.api.Assertions.assertThrows; import static org.junit.jupiter.api.Assertions.assertTrue; -public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer { +public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase { Review comment: Nice catch ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reduce CI run time for deltastreamer and bulk insert row writer tests > - > > Key: HUDI-2253 > URL: https://issues.apache.org/jira/browse/HUDI-2253 > Project: Apache Hudi > Issue Type: Test >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Reduce CI run time for deltastreamer and bulk insert row writer tests > > org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2252) Replace read full data with read latest commit data in flink stream read
[ https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390240#comment-17390240 ]

ASF GitHub Bot commented on HUDI-2252:
--------------------------------------

hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348

## CI report:

* fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Replace read full data with read latest commit data in flink stream read
> -------------------------------------------------------------------------
>
>                 Key: HUDI-2252
>                 URL: https://issues.apache.org/jira/browse/HUDI-2252
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: Zheng yunhong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3368: [HUDI-2252] Default consumes from the latest instant for flink streaming reader
hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348

## CI report:

* fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
hudi-bot edited a comment on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692

## CI report:

* 723eb6da23126ad85bbc7f62a182e025026462e7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1251)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
danny0405 commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679603117

## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java

## @@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;

 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;

-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
Nice catch ~

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390237#comment-17390237 ]

ASF GitHub Bot commented on HUDI-2253:
--------------------------------------

hudi-bot commented on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692

## CI report:

* 723eb6da23126ad85bbc7f62a182e025026462e7 UNKNOWN

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] hudi-bot commented on pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
hudi-bot commented on pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#issuecomment-889579692

## CI report:

* 723eb6da23126ad85bbc7f62a182e025026462e7 UNKNOWN

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390236#comment-17390236 ]

ASF GitHub Bot commented on HUDI-2253:
--------------------------------------

nsivabalan commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679602005

## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java

## @@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;

 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;

-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
Previously TestHoodieMultiTableDeltaStreamer extended from TestHoodieDeltaStreamer and so tests in TestHoodieDeltaStreamer were running twice. This refactoring will ensure that TestHoodieDeltaStreamer tests run only once.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] nsivabalan commented on a change in pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
nsivabalan commented on a change in pull request #3371:
URL: https://github.com/apache/hudi/pull/3371#discussion_r679602005

## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java

## @@ -38,16 +38,14 @@
 import java.io.IOException;
 import java.util.Arrays;
 import java.util.List;
-import java.util.Random;

 import static org.junit.jupiter.api.Assertions.assertEquals;
 import static org.junit.jupiter.api.Assertions.assertThrows;
 import static org.junit.jupiter.api.Assertions.assertTrue;

-public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamer {
+public class TestHoodieMultiTableDeltaStreamer extends TestHoodieDeltaStreamerBase {

Review comment:
Previously TestHoodieMultiTableDeltaStreamer extended from TestHoodieDeltaStreamer and so tests in TestHoodieDeltaStreamer were running twice. This refactoring will ensure that TestHoodieDeltaStreamer tests run only once.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
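The double execution described in the review comment follows from how test runners discover tests: `@Test` methods declared in a superclass are inherited by every subclass and run again there. The sketch below illustrates why moving shared fixtures into a test-free base class fixes this. It is a minimal stand-alone model, not the actual Hudi test code: the `@Test` annotation is a stand-in for JUnit's, and the class names are shortened stand-ins for `TestHoodieDeltaStreamer` and friends.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Stand-in for JUnit's @Test; RUNTIME retention so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@interface Test {}

// Shared fixtures only — declares no test methods itself.
class TestDeltaStreamerBase {}

// Concrete test class with its own tests.
class TestDeltaStreamer extends TestDeltaStreamerBase {
    @Test void testIngest() {}
    @Test void testCompaction() {}
}

// Before the refactor: extending the concrete class inherits its @Test
// methods, so a discovery pass finds (and re-runs) them here as well.
class TestMultiTableOld extends TestDeltaStreamer {
    @Test void testMultiTableSync() {}
}

// After the refactor: extending only the base inherits fixtures, not tests.
class TestMultiTableNew extends TestDeltaStreamerBase {
    @Test void testMultiTableSync() {}
}

public class DiscoveryDemo {
    // Count @Test methods visible on a class, walking up the hierarchy the
    // way a runner's discovery does (overrides are not deduplicated here,
    // but none occur in this example).
    static long countTests(Class<?> c) {
        long n = 0;
        for (Class<?> k = c; k != null && k != Object.class; k = k.getSuperclass()) {
            for (Method m : k.getDeclaredMethods()) {
                if (m.isAnnotationPresent(Test.class)) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countTests(TestMultiTableOld.class)); // 3 — both inherited tests would run again
        System.out.println(countTests(TestMultiTableNew.class)); // 1 — only its own test
    }
}
```

Under this model, the old hierarchy pays for the inherited suite twice per CI run, while the refactored hierarchy runs each test exactly once.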
[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2253: - Labels: pull-request-available (was: ) > Reduce CI run time for deltastreamer and bulk insert row writer tests > - > > Key: HUDI-2253 > URL: https://issues.apache.org/jira/browse/HUDI-2253 > Project: Apache Hudi > Issue Type: Test >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > > Reduce CI run time for deltastreamer and bulk insert row writer tests > > org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite > org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer > org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390235#comment-17390235 ]

ASF GitHub Bot commented on HUDI-2253:
--------------------------------------

nsivabalan opened a new pull request #3371:
URL: https://github.com/apache/hudi/pull/3371

- DeltaStreamer and MultiTableDeltaStreamer tests.
- Bulk insert row writer tests

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request
*(For example: This pull request adds quick-start document.)*

## Brief change log
*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request
*(Please pick either of the following options)*
This pull request is a trivial rework / code cleanup without any test coverage.
*(or)*
This pull request is already covered by existing tests, such as *(please describe tests)*.
(or)
This change added tests and can be verified as follows:
*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] nsivabalan opened a new pull request #3371: [HUDI-2253] Refactoring few tests to reduce run time.
nsivabalan opened a new pull request #3371:
URL: https://github.com/apache/hudi/pull/3371

- DeltaStreamer and MultiTableDeltaStreamer tests.
- Bulk insert row writer tests

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

## What is the purpose of the pull request
*(For example: This pull request adds quick-start document.)*

## Brief change log
*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request
*(Please pick either of the following options)*
This pull request is a trivial rework / code cleanup without any test coverage.
*(or)*
This pull request is already covered by existing tests, such as *(please describe tests)*.
(or)
This change added tests and can be verified as follows:
*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2253:
--------------------------------------
    Fix Version/s: 0.9.0

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
sivabalan narayanan created HUDI-2253:
--------------------------------------

             Summary: Reduce CI run time for deltastreamer and bulk insert row writer tests
                 Key: HUDI-2253
                 URL: https://issues.apache.org/jira/browse/HUDI-2253
             Project: Apache Hudi
          Issue Type: Test
            Reporter: sivabalan narayanan

Reduce CI run time for deltastreamer and bulk insert row writer tests:

org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-2253:
--------------------------------------
    Status: In Progress  (was: Open)

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (HUDI-2253) Reduce CI run time for deltastreamer and bulk insert row writer tests
[ https://issues.apache.org/jira/browse/HUDI-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-2253:
-----------------------------------------
    Assignee: sivabalan narayanan

> Reduce CI run time for deltastreamer and bulk insert row writer tests
> ---------------------------------------------------------------------
>
>                 Key: HUDI-2253
>                 URL: https://issues.apache.org/jira/browse/HUDI-2253
>             Project: Apache Hudi
>          Issue Type: Test
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>             Fix For: 0.9.0
>
>
> Reduce CI run time for deltastreamer and bulk insert row writer tests:
>
> org.apache.hudi.utilities.functional.TestHoodieMultiTableDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieDataSourceInternalBatchWrite
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer
> org.apache.hudi.spark3.internal.TestHoodieBulkInsertDataInternalWriter

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (HUDI-1848) Add support for HMS in Hive-sync-tool
[ https://issues.apache.org/jira/browse/HUDI-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390234#comment-17390234 ]

ASF GitHub Bot commented on HUDI-1848:
--------------------------------------

stym06 commented on pull request #2879:
URL: https://github.com/apache/hudi/pull/2879#issuecomment-889577462

What parameters are required to be passed to sync with HMS? Can we use the thrift URL?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add support for HMS in Hive-sync-tool
> -------------------------------------
>
>                 Key: HUDI-1848
>                 URL: https://issues.apache.org/jira/browse/HUDI-1848
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: Jagmeet Bali
>            Priority: Minor
>              Labels: pull-request-available, sev:normal
>
>
> Add support for HMS in Hive-sync-tool.
> Currently there are two ways to run DDL queries in hive-sync-tool. This work adds on top of [https://github.com/apache/hudi/pull/2532|https://github.com/apache/hudi/pull/2532/files] and adds a pluggable way to run DDL queries using HMS.
>
> Different DDL executors can be selected via different syncConfig options:
> useJDBC true -> JDBCExecutor will be used
> useJDBC false -> QlHiveQueryExecutor will be used
> useHMS true -> HMSDDLExecutor will be used.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] stym06 commented on pull request #2879: [HUDI-1848] Adding support for HMS for running DDL queries in hive-sy…
stym06 commented on pull request #2879:
URL: https://github.com/apache/hudi/pull/2879#issuecomment-889577462

What parameters are required to be passed to sync with HMS? Can we use the thrift URL?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
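The flag-to-executor mapping listed in the HUDI-1848 description (useJDBC/useHMS selecting JDBCExecutor, QlHiveQueryExecutor, or HMSDDLExecutor) amounts to a small factory. The sketch below is an illustrative reconstruction, not the actual hive-sync-tool code: only the three executor class names come from the issue, while the `DDLExecutor` interface, the `select` helper, and the useHMS-takes-precedence ordering are assumptions made for the example.

```java
// Marker interface standing in for whatever contract the real executors share.
interface DDLExecutor {}

// Empty stand-ins named after the executors mentioned in the JIRA.
class JDBCExecutor implements DDLExecutor {}
class QlHiveQueryExecutor implements DDLExecutor {}
class HMSDDLExecutor implements DDLExecutor {}

public class DdlExecutorFactory {
    // Pick an executor from the two boolean sync-config flags described in
    // the issue. The precedence (useHMS wins over useJDBC) is an assumption.
    static DDLExecutor select(boolean useJdbc, boolean useHms) {
        if (useHms) {
            return new HMSDDLExecutor();       // useHMS true -> HMSDDLExecutor
        }
        return useJdbc
            ? new JDBCExecutor()               // useJDBC true -> JDBCExecutor
            : new QlHiveQueryExecutor();       // useJDBC false -> QlHiveQueryExecutor
    }

    public static void main(String[] args) {
        System.out.println(select(true, false).getClass().getSimpleName());
        System.out.println(select(false, true).getClass().getSimpleName());
    }
}
```

Making the choice a single factory call keeps the three code paths pluggable, which matches the issue's stated goal of a "pluggable way" to run DDL queries.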
[GitHub] [hudi] fengjian428 commented on issue #3327: [SUPPORT] ingetst avro nested array field error occur
fengjian428 commented on issue #3327:
URL: https://github.com/apache/hudi/issues/3327#issuecomment-889576271

> @fengjian428 I will try to reproduce this. Please take a look at https://hudi.apache.org/docs/writing_data.html#schema-evolution
> The exception says that the 'array' field was not found. Was the schema changed in between, and are there partitions with the older schema as well?

No, I can reproduce it with a new table.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2252) Replace read full data with read latest commit data in flink stream read
[ https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390229#comment-17390229 ]

ASF GitHub Bot commented on HUDI-2252:
--------------------------------------

hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348

## CI report:

* cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
* fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Replace read full data with read latest commit data in flink stream read
> -------------------------------------------------------------------------
>
>                 Key: HUDI-2252
>                 URL: https://issues.apache.org/jira/browse/HUDI-2252
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: Zheng yunhong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3368: [HUDI-2252] Default consumes from the latest instant for flink streaming reader
hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348

## CI report:

* cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
* fa0e1ada155715e783630f0fb2ff3120b0ca683c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1250)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-2252) Replace read full data with read latest commit data in flink stream read
[ https://issues.apache.org/jira/browse/HUDI-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390227#comment-17390227 ]

ASF GitHub Bot commented on HUDI-2252:
--------------------------------------

hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348

## CI report:

* cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
* fa0e1ada155715e783630f0fb2ff3120b0ca683c UNKNOWN

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Replace read full data with read latest commit data in flink stream read
> -------------------------------------------------------------------------
>
>                 Key: HUDI-2252
>                 URL: https://issues.apache.org/jira/browse/HUDI-2252
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: Zheng yunhong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Replace read full data with read latest commit data in flink stream read.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3368: [HUDI-2252] Default consumes from the latest instant for flink streaming reader
hudi-bot edited a comment on pull request #3368:
URL: https://github.com/apache/hudi/pull/3368#issuecomment-889033348

## CI report:

* cb38a4b68f19377f309e228fe99a49bd3e4f6265 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1243)
* fa0e1ada155715e783630f0fb2ff3120b0ca683c UNKNOWN

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (HUDI-1987) Fix non partition table hive meta sync for flink writer
[ https://issues.apache.org/jira/browse/HUDI-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra resolved HUDI-1987.
---------------------------------
    Resolution: Fixed

> Fix non partition table hive meta sync for flink writer
> -------------------------------------------------------
>
>                 Key: HUDI-1987
>                 URL: https://issues.apache.org/jira/browse/HUDI-1987
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1987) Fix non partition table hive meta sync for flink writer
[ https://issues.apache.org/jira/browse/HUDI-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra updated HUDI-1987:
--------------------------------
    Status: In Progress  (was: Open)

> Fix non partition table hive meta sync for flink writer
> -------------------------------------------------------
>
>                 Key: HUDI-1987
>                 URL: https://issues.apache.org/jira/browse/HUDI-1987
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1986) Skip creating marker files for flink merge handle
[ https://issues.apache.org/jira/browse/HUDI-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra updated HUDI-1986:
--------------------------------
    Status: In Progress  (was: Open)

> Skip creating marker files for flink merge handle
> -------------------------------------------------
>
>                 Key: HUDI-1986
>                 URL: https://issues.apache.org/jira/browse/HUDI-1986
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>
>
> Skip creating the marker files for flink merge handle to make it more robust.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (HUDI-1986) Skip creating marker files for flink merge handle
[ https://issues.apache.org/jira/browse/HUDI-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra resolved HUDI-1986.
---------------------------------
    Fix Version/s: 0.9.0
       Resolution: Fixed

> Skip creating marker files for flink merge handle
> -------------------------------------------------
>
>                 Key: HUDI-1986
>                 URL: https://issues.apache.org/jira/browse/HUDI-1986
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Flink Integration
>            Reporter: Danny Chen
>            Assignee: Danny Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Skip creating the marker files for flink merge handle to make it more robust.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (HUDI-1980) Optimize the code to prevent other exceptions from causing resources not to be closed
[ https://issues.apache.org/jira/browse/HUDI-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra resolved HUDI-1980.
---------------------------------
    Fix Version/s: 0.9.0
         Assignee: Wei
       Resolution: Fixed

> Optimize the code to prevent other exceptions from causing resources not to be closed
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-1980
>                 URL: https://issues.apache.org/jira/browse/HUDI-1980
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Hive Integration
>            Reporter: Wei
>            Assignee: Wei
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> When *HoodieHiveClient* is initializing resources, some exceptions may cause resources to fail to close.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (HUDI-1980) Optimize the code to prevent other exceptions from causing resources not to be closed
[ https://issues.apache.org/jira/browse/HUDI-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udit Mehrotra updated HUDI-1980:
--------------------------------
    Status: In Progress  (was: Open)

> Optimize the code to prevent other exceptions from causing resources not to be closed
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-1980
>                 URL: https://issues.apache.org/jira/browse/HUDI-1980
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Hive Integration
>            Reporter: Wei
>            Priority: Critical
>              Labels: pull-request-available
>
>
> When *HoodieHiveClient* is initializing resources, some exceptions may cause resources to fail to close.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)