[GitHub] [hudi] xushiyan commented on issue #3554: [SUPPORT] Support Apache Spark 3.1
xushiyan commented on issue #3554: URL: https://github.com/apache/hudi/issues/3554#issuecomment-917554170 Moved to JIRA https://issues.apache.org/jira/browse/HUDI-1869 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan closed issue #3554: [SUPPORT] Support Apache Spark 3.1
xushiyan closed issue #3554: URL: https://github.com/apache/hudi/issues/3554
[jira] [Closed] (HUDI-2190) Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute
[ https://issues.apache.org/jira/browse/HUDI-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu closed HUDI-2190.
Resolution: Won't Do

> Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute
> ---------------------------------------------------------------------------------
>
> Key: HUDI-2190
> URL: https://issues.apache.org/jira/browse/HUDI-2190
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Reporter: zhangminglei
> Priority: Major
> Labels: pull-request-available
>
> SparkBulkInsertPreppedCommitActionExecutor#execute wraps its body in a try/catch (as do some other executor classes), but the catch is unnecessary.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
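For context on the pattern this ticket flags: a catch block that only re-wraps and rethrows, without adding information or recovery logic, can usually be dropped. A minimal illustration follows — `CatchSketch`, `executeWrapped`, and `doWrite` are hypothetical stand-ins, not Hudi's actual executor code:

```java
public class CatchSketch {

  static class HoodieUpsertException extends RuntimeException {
    HoodieUpsertException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  // The pattern the ticket calls unnecessary: the try/catch adds no
  // context or handling, so the wrapping layer could be removed and
  // doWrite()'s own exceptions allowed to propagate.
  static int executeWrapped() {
    try {
      return doWrite();
    } catch (RuntimeException e) {
      throw new HoodieUpsertException("Failed to bulk insert", e);
    }
  }

  // Stand-in for the actual commit action body.
  static int doWrite() {
    return 42;
  }

  public static void main(String[] args) {
    System.out.println(executeWrapped()); // prints 42
  }
}
```

The issue was ultimately closed as Won't Do, so the try/catch remains in place.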
[jira] [Commented] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413648#comment-17413648 ]

Raymond Xu commented on HUDI-864:
[~vinoth] the parquet version is upgraded in spark 3.2.0-rc2: https://github.com/apache/spark/blob/03f5d23e96374670c7ea3525f871393432f0e538/pom.xml#L139
The issue may stay with Spark 2 but go away when we force upgrade to build Hudi with Spark 3.2.

> parquet schema conflict: optional binary (UTF8) is not a group
> --------------------------------------------------------------
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
> Issue Type: Bug
> Components: Common Core
> Affects Versions: 0.5.2
> Reporter: Roland Johann
> Priority: Major
> Labels: sev:high, user-support-issues
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
>     {
>       "name": "categoryResults",
>       "type": {
>         "type": "array",
>         "elementType": {
>           "type": "struct",
>           "fields": [
>             {
>               "name": "categoryId",
>               "type": "string",
>               "nullable": true,
>               "metadata": {}
>             }
>           ]
>         },
>         "containsNull": true
>       },
>       "nullable": true,
>       "metadata": {}
>     }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
>   at org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at
[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-864:
Component/s: Common Core
[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-864:
Labels: sev:high user-support-issues (was: sevv:high user-support-issues)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
hudi-bot edited a comment on pull request #2210: URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641 ## CI report: * 67dabdd51934a7141a299114de2b836b1f016fd5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2166) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] QingFengZhou commented on issue #143: Tracking ticket for folks to be added to slack group
QingFengZhou commented on issue #143: URL: https://github.com/apache/hudi/issues/143#issuecomment-917547988 Please add me to the slack group. Email: zhouqf2...@163.com Thanks!!
[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
hudi-bot edited a comment on pull request #2210: URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641 ## CI report: * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668) * 67dabdd51934a7141a299114de2b836b1f016fd5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2166)
[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
hudi-bot edited a comment on pull request #2210: URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641 ## CI report: * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668) * 67dabdd51934a7141a299114de2b836b1f016fd5 UNKNOWN
[GitHub] [hudi] codecov-commenter removed a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources
codecov-commenter removed a comment on pull request #2210: URL: https://github.com/apache/hudi/pull/2210#issuecomment-862054362

# [Codecov](https://codecov.io/gh/apache/hudi/pull/2210) Report

> Merging [#2210](https://codecov.io/gh/apache/hudi/pull/2210) (b845e34) into [master](https://codecov.io/gh/apache/hudi/commit/673d62f3c3ab07abb3fcd319607e657339bc0682) (673d62f) will **increase** coverage by `44.07%`.
> The diff coverage is `n/a`.

```diff
@@             Coverage Diff              @@
##           master    #2210       +/-   ##
============================================
+ Coverage    8.43%   52.51%   +44.07%
- Complexity     62     3664     +3602
============================================
  Files          70      474      +404
  Lines        2880    23997    +21117
  Branches      359     2741     +2382
============================================
+ Hits          243    12601    +12358
- Misses       2616    10137     +7521
- Partials       21     1259     +1238
```

| Flag | Coverage Δ | |
|---|---|---|
| hudicli | `39.95% <ø> (?)` | |
| hudiclient | `∅ <ø> (∅)` | |
| hudicommon | `48.20% <ø> (?)` | |
| hudiflink | `60.73% <ø> (?)` | |
| hudihadoopmr | `51.34% <ø> (?)` | |
| hudisparkdatasource | `66.47% <ø> (?)` | |
| hudisync | `46.79% <ø> (+40.00%)` | :arrow_up: |
| huditimelineservice | `64.36% <ø> (?)` | |
| hudiutilities | `?` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...ache/hudi/hive/HiveMetastoreBasedLockProvider.java | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
| ...va/org/apache/hudi/utilities/schema/SchemaSet.java | | |
| ...va/org/apache/hudi/utilities/IdentitySplitter.java | | |
| ...a/org/apache/hudi/utilities/sources/SqlSource.java | | |
| ...s/exception/HoodieIncrementalPullSQLException.java | | |
| ...che/hudi/utilities/schema/SchemaPostProcessor.java | | |
[jira] [Resolved] (HUDI-2398) Event Time not getting updated for inserts
[ https://issues.apache.org/jira/browse/HUDI-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu resolved HUDI-2398. -- Resolution: Fixed > Event Time not getting updated for inserts > -- > > Key: HUDI-2398 > URL: https://issues.apache.org/jira/browse/HUDI-2398 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core, metrics >Reporter: Ankush Kanungo >Assignee: Ankush Kanungo >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > When using DefaultHoodieRecordPayload class, event time (for latency > calculations) is not being updated for inserts and stays null.
[jira] [Closed] (HUDI-2398) Event Time not getting updated for inserts
[ https://issues.apache.org/jira/browse/HUDI-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-2398.
[hudi] branch master updated: [HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload (#3602)
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 4f991ee  [HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload (#3602)

4f991ee is described below

commit 4f991ee3525c6225c7bf3b46e272f7d5b919196e
Author: Ankush Kanungo <40214578+akanun...@users.noreply.github.com>
AuthorDate: Sat Sep 11 20:27:40 2021 -0700

    [HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload (#3602)
---
 .../apache/hudi/io/HoodieSortedMergeHandle.java    |  8 ++++----
 .../common/model/DefaultHoodieRecordPayload.java   | 23 +++++++++++---------
 .../model/TestDefaultHoodieRecordPayload.java      | 24 ++++++++++++++++++
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
index 763178d..606e63a 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
@@ -90,9 +90,9 @@ public class HoodieSortedMergeHandle ext
       }
       try {
         if (useWriterSchema) {
-          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields));
+          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields, config.getProps()));
         } else {
-          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchema));
+          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchema, config.getProps()));
         }
         insertRecordsWritten++;
         writtenRecordKeys.add(keyToPreWrite);
@@ -112,9 +112,9 @@ public class HoodieSortedMergeHandle ext
       HoodieRecord hoodieRecord = keyToNewRecords.get(key);
       if (!writtenRecordKeys.contains(hoodieRecord.getRecordKey())) {
         if (useWriterSchema) {
-          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields));
+          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields, config.getProps()));
         } else {
-          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchema));
+          writeRecord(hoodieRecord, hoodieRecord.getData().getInsertValue(tableSchema, config.getProps()));
         }
         insertRecordsWritten++;
       }

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java b/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
index 86ccf67..76474fd 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
@@ -18,7 +18,6 @@
 package org.apache.hudi.common.model;

-import org.apache.hudi.common.config.HoodieConfig;
 import org.apache.hudi.common.util.Option;

 import org.apache.avro.Schema;
@@ -56,7 +55,7 @@ public class DefaultHoodieRecordPayload extends OverwriteWithLatestAvroPayload {
     if (recordBytes.length == 0) {
       return Option.empty();
     }
-    HoodieConfig hoodieConfig = new HoodieConfig(properties);
+
     GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
     // Null check is needed here to support schema evolution. The record in storage may be from old schema where
@@ -68,17 +67,27 @@ public class DefaultHoodieRecordPayload extends OverwriteWithLatestAvroPayload {
     /*
      * We reached a point where the value is disk is older than the incoming record.
      */
-    eventTime = Option.ofNullable(getNestedFieldVal(incomingRecord, hoodieConfig
-        .getString(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY), true));
+    eventTime = updateEventTime(incomingRecord, properties);
     /*
      * Now check if the incoming record is a delete record.
      */
-    if (isDeleteRecord(incomingRecord)) {
+    return isDeleteRecord(incomingRecord) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  @Override
+  public Option getInsertValue(Schema schema, Properties properties) throws IOException {
+    if (recordBytes.length == 0) {
       return Option.empty();
-    } else {
-      return Option.of(incomingRecord);
     }
+    GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
+    eventTime = updateEventTime(incomingRecord, properties);
+
+    return isDeleteRecord(incomingRecord) ? Option.empty() : Option.of(incomingRecord);
+  }
+
+  private static Option updateEventTime(GenericRecord record, Properties properties) {
+    return
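For readers skimming the commit above: both the update path and the new `getInsertValue` path now resolve an event-time field named in the payload properties and record it for latency metrics. The core lookup can be sketched in isolation as below. This is a simplified stand-in, not Hudi's implementation: `getNestedFieldVal` here walks a dot-separated path through plain `Map`s rather than Avro records, and the property key is written out literally (assumed to mirror `HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class EventTimeSketch {

  // Simplified stand-in for Hudi's nested-field helper: walks a
  // dot-separated field path (e.g. "meta.ts") through nested Maps,
  // returning null when any segment is missing.
  static Object getNestedFieldVal(Object record, String fieldPath) {
    Object current = record;
    for (String part : fieldPath.split("\\.")) {
      if (!(current instanceof Map)) {
        return null;
      }
      current = ((Map<?, ?>) current).get(part);
    }
    return current;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Assumed property key for illustration.
    props.setProperty("hoodie.payload.event.time.field", "meta.ts");

    Map<String, Object> meta = new HashMap<>();
    meta.put("ts", 1631000000L);
    Map<String, Object> record = new HashMap<>();
    record.put("meta", meta);

    Object eventTime =
        getNestedFieldVal(record, props.getProperty("hoodie.payload.event.time.field"));
    System.out.println("event time = " + eventTime); // prints "event time = 1631000000"
  }
}
```

The commit's fix amounts to running this same lookup on the insert path too, where it previously only ran when merging against an existing record on disk.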
[GitHub] [hudi] xushiyan merged pull request #3602: [HUDI-2398] Update event time for inserts
xushiyan merged pull request #3602: URL: https://github.com/apache/hudi/pull/3602
[jira] [Resolved] (HUDI-2415) Add more info log for flink streaming reader
[ https://issues.apache.org/jira/browse/HUDI-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-2415. -- Resolution: Fixed Fixed via master branch: 9d5c3e5cb92a4247bb1fc9a4a0e2eb3d2fbce1d6 > Add more info log for flink streaming reader > > > Key: HUDI-2415 > URL: https://issues.apache.org/jira/browse/HUDI-2415 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 >
[hudi] branch master updated: [HUDI-2415] Add more info log for flink streaming reader (#3642)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 9d5c3e5  [HUDI-2415] Add more info log for flink streaming reader (#3642)

9d5c3e5 is described below

commit 9d5c3e5cb92a4247bb1fc9a4a0e2eb3d2fbce1d6
Author: Danny Chan
AuthorDate: Sun Sep 12 10:00:17 2021 +0800

    [HUDI-2415] Add more info log for flink streaming reader (#3642)
---
 .../org/apache/hudi/source/StreamReadMonitoringFunction.java | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java b/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
index ec56903..c5610d2 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
@@ -248,6 +248,13 @@ public class StreamReadMonitoringFunction
     List activeMetadataList = instants.stream()
         .map(instant -> WriteProfiles.getCommitMetadata(tableName, path, instant, commitTimeline)).collect(Collectors.toList());
     List archivedMetadataList = getArchivedMetadata(instantRange, commitTimeline, tableName);
+    if (archivedMetadataList.size() > 0) {
+      LOG.warn(""
+          + "\n"
+          + "-- caution: the reader has fall behind too much from the writer,\n"
+          + "-- tweak 'read.tasks' option to add parallelism of read tasks.\n"
+          + "");
+    }
     List metadataList = archivedMetadataList.size() > 0
         ? mergeList(activeMetadataList, archivedMetadataList)
         : activeMetadataList;
@@ -288,6 +295,11 @@ public class StreamReadMonitoringFunction
     }
     // update the issued instant time
     this.issuedInstant = commitToIssue;
+    LOG.info(""
+        + "\n"
+        + "-- consumed to instant: {}\n"
+        + "",
+        commitToIssue);
   }

   @Override
[GitHub] [hudi] danny0405 merged pull request #3642: [HUDI-2415] Add more info log for flink streaming reader
danny0405 merged pull request #3642: URL: https://github.com/apache/hudi/pull/3642 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS
[ https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu resolved HUDI-2357. -- Resolution: Fixed > MERGE INTO doesn't work for tables created using CTAS > - > > Key: HUDI-2357 > URL: https://issues.apache.org/jira/browse/HUDI-2357 > Project: Apache Hudi > Issue Type: Sub-task > Components: Spark Integration >Reporter: Vinoth Govindarajan >Assignee: pengzhiwei >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > MERGE INTO command doesn't select the correct primary key for tables created > using CTAS, whereas it works for tables created using CREATE TABLE command. > I guess we are hitting this issue because the key generator class is set to > SqlKeyGenerator for tables created using CTAS: > working use-case: > {code:java} > create table h5 (id bigint, name string, ts bigint) using hudi > options (type = "cow" , primaryKey="id" , preCombineField="ts" ); > merge into h5 as t0 > using ( > select 5 as s_id, 'vinoth' as s_name, current_timestamp() as s_ts > ) t1 > on t1.s_id = t0.id > when matched then update set * > when not matched then insert *; > {code} > hoodie.properties for working use-case: > {code:java} > ➜ analytics.db git:(apache_hudi_support) cat h5/.hoodie/hoodie.properties > #Properties saved on Wed Aug 25 04:10:33 UTC 2021 > #Wed Aug 25 04:10:33 UTC 2021 > hoodie.table.name=h5 > hoodie.table.recordkey.fields=id > hoodie.table.type=COPY_ON_WRITE > hoodie.table.precombine.field=ts > hoodie.table.partition.fields= > hoodie.archivelog.folder=archived > 
hoodie.table.create.schema={"type"\:"record","name"\:"topLevelRecord","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"id","type"\:["long","null"]},{"name"\:"name","type"\:["string","null"]},{"name"\:"ts","type"\:["long","null"]}]} > hoodie.timeline.layout.version=1 > hoodie.table.version=1{code} > > Whereas this doesn't work: > {code:java} > create table h4 using hudi options (type = "cow" , primaryKey="id" , > preCombineField="ts" ) as select 5 as id, cast(rand() as string) as name, > current_timestamp(); > merge into h3 as t0 using (select '5' as s_id, 'vinoth' as s_name, > current_timestamp() as s_ts) t1 on t1.s_id = t0.id when matched then update > set * when not matched then insert *; > ERROR LOG > 544702 [main] ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver - > Failed in [merge into analytics.h3 as t0 using ( select '5' as s_id, > 'vinoth' as s_name, current_timestamp() as s_ts) t1 on t1.s_id = t0.id when > matched then update set * when not matched then insert > *]java.lang.IllegalArgumentException: Merge Key[id] is not Equal to the > defined primary key[] in table h3 at > org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425) > at > org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:147) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229) at > 
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618) at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616) at > org.apache.spark.sql.Dataset.(Dataset.scala:229) at > org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at > org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97) at > org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607) at > org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at > org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602) at >
[jira] [Closed] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS
[ https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-2357.
[jira] [Commented] (HUDI-2387) Too many HEAD requests from Hudi to S3
[ https://issues.apache.org/jira/browse/HUDI-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17413595#comment-17413595 ] Raymond Xu commented on HUDI-2387: -- [~uditme] would you raise this to the AWS team please? > Too many HEAD requests from Hudi to S3 > --- > > Key: HUDI-2387 > URL: https://issues.apache.org/jira/browse/HUDI-2387 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core, Spark Integration >Affects Versions: 0.8.0 > Environment: AWS Glue with PySpark >Reporter: Sourav T >Priority: Major > > We are using Apache Hudi from AWS Glue (with a PySpark runtime) to store data > on an S3 bucket. We are observing a very high number of S3 HEAD requests > originating from what we believe is Hudi. > Often, due to this high number of requests, S3 throws "Status Code: 503; > Error Code: SlowDown", causing data losses. > Is there any out-of-the-box feature to debug this further and confirm which > Hudi feature is causing this?
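A common mitigation for listing-heavy S3 workloads (a general suggestion, not something confirmed in this thread) is Hudi's metadata table, which serves file listings from an internal table instead of repeated S3 calls. A minimal sketch of the Spark datasource write options, assuming Hudi 0.7.0 or later; the table name is hypothetical:

```python
# Assumed option keys; verify against the Hudi version in use.
hudi_options = {
    "hoodie.table.name": "my_table",           # hypothetical table name
    "hoodie.metadata.enable": "true",          # serve file listings from the metadata table
    "hoodie.datasource.write.operation": "upsert",
}

# Typical usage with a Spark DataFrame `df` (sketch):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

Whether this reduces the specific HEAD-request pattern reported above would need to be confirmed with S3 access logs.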
[jira] [Closed] (HUDI-1702) TestHoodieMergeOnReadTable.init fails randomly on Travis CI
[ https://issues.apache.org/jira/browse/HUDI-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-1702. > TestHoodieMergeOnReadTable.init fails randomly on Travis CI > --- > > Key: HUDI-1702 > URL: https://issues.apache.org/jira/browse/HUDI-1702 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Danny Chen >Priority: Major > Labels: sev:triage > Fix For: 0.10.0 > > > The test case fails randomly from time to time, which is annoying; take this > for an example: > https://travis-ci.com/github/apache/hudi/jobs/491671521
[jira] [Resolved] (HUDI-1702) TestHoodieMergeOnReadTable.init fails randomly on Travis CI
[ https://issues.apache.org/jira/browse/HUDI-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu resolved HUDI-1702. -- Fix Version/s: 0.10.0 Resolution: Fixed Fixed in https://issues.apache.org/jira/browse/HUDI-1989 > TestHoodieMergeOnReadTable.init fails randomly on Travis CI > --- > > Key: HUDI-1702 > URL: https://issues.apache.org/jira/browse/HUDI-1702 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Reporter: Danny Chen >Priority: Major > Labels: sev:triage > Fix For: 0.10.0 > > > The test case fails randomly from time to time, which is annoying; take this > for an example: > https://travis-ci.com/github/apache/hudi/jobs/491671521
[GitHub] [hudi] fengjian428 closed pull request #3645: [HUDI-2413] fix Sql source's checkpoint
fengjian428 closed pull request #3645: URL: https://github.com/apache/hudi/pull/3645
[GitHub] [hudi] hudi-bot edited a comment on pull request #3645: [HUDI-2413] fix Sql source's checkpoint
hudi-bot edited a comment on pull request #3645: URL: https://github.com/apache/hudi/pull/3645#issuecomment-917441426 ## CI report: * be4aeaec24d12c0af19d8497dafcb6c60de0dfba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2164) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis` re-run the last Travis build - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-2418) add HiveSchemaProvider
Jian Feng created HUDI-2418: --- Summary: add HiveSchemaProvider Key: HUDI-2418 URL: https://issues.apache.org/jira/browse/HUDI-2418 Project: Apache Hudi Issue Type: Improvement Components: DeltaStreamer Reporter: Jian Feng When using DeltaStreamer to migrate an existing Hive table, it would be better to have a HiveSchemaProvider instead of an Avro schema file.
[GitHub] [hudi] hudi-bot commented on pull request #3645: [HUDI-2413] fix Sql source's checkpoint
hudi-bot commented on pull request #3645: URL: https://github.com/apache/hudi/pull/3645#issuecomment-917441426 ## CI report: * be4aeaec24d12c0af19d8497dafcb6c60de0dfba UNKNOWN
[jira] [Updated] (HUDI-2413) Sql source in delta streamer does not work
[ https://issues.apache.org/jira/browse/HUDI-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2413: - Labels: pull-request-available (was: ) > Sql source in delta streamer does not work > -- > > Key: HUDI-2413 > URL: https://issues.apache.org/jira/browse/HUDI-2413 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Jian Feng >Assignee: Jian Feng >Priority: Major > Labels: pull-request-available > > The SQL source returns a null checkpoint; in DeltaSync a null checkpoint is > judged as "no new data", so the source should return an empty string instead.
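The failure mode described above boils down to how a missing checkpoint is interpreted. A tiny stand-in for the decision, in plain Python (a hypothetical helper illustrating the reported behavior, not DeltaSync's actual code):

```python
def has_new_data(checkpoint):
    """Mimic the reported DeltaSync check: a None checkpoint is treated
    as 'the source produced nothing', so the sync round is skipped."""
    return checkpoint is not None

# The bug: a SQL source that did read rows but reports None gets skipped.
assert has_new_data(None) is False
# The fix: returning an empty-string checkpoint lets the sync proceed.
assert has_new_data("") is True
```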
[GitHub] [hudi] fengjian428 opened a new pull request #3645: [HUDI-2413] fix Sql source's checkpoint
fengjian428 opened a new pull request #3645: URL: https://github.com/apache/hudi/pull/3645 https://issues.apache.org/jira/browse/HUDI-2413 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)* ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] nsivabalan commented on issue #3571: Caused by: java.lang.NoSuchFieldError: NULL_VALUE[SUPPORT]
nsivabalan commented on issue #3571: URL: https://github.com/apache/hudi/issues/3571#issuecomment-917433769 May I know what's the Spark bundle you are using? Hudi has three different bundles for Spark and Scala version variants.
[GitHub] [hudi] nsivabalan commented on issue #3582: [SUPPORT] Upsert to hudi table fails that got bootstrapped (w/ metadata only)
nsivabalan commented on issue #3582: URL: https://github.com/apache/hudi/issues/3582#issuecomment-917433235 @yanghua : Can you please help here.
[GitHub] [hudi] nsivabalan commented on issue #3554: [SUPPORT] Support Apache Spark 3.1
nsivabalan commented on issue #3554: URL: https://github.com/apache/hudi/issues/3554#issuecomment-917432982 @pengzhiwei2018 : feel free to close out this issue if we have a tracking jira.
[GitHub] [hudi] nsivabalan commented on issue #3533: [SUPPORT]How to use MOR Table to Merge small file?
nsivabalan commented on issue #3533: URL: https://github.com/apache/hudi/issues/3533#issuecomment-917432581 yes, can you list files along w/ sizes. Based on the logs you have provided, likely we are interested in files w/ commit time 20210826162904 and 20210826162935.
[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
hudi-bot edited a comment on pull request #3644: URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782 ## CI report: * 7ba55821ed9caedaaafa68afe9471d074d1a4cba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2163)
[GitHub] [hudi] nsivabalan commented on issue #3555: [SUPPORT] support show/drop partitions tablename sql
nsivabalan commented on issue #3555: URL: https://github.com/apache/hudi/issues/3555#issuecomment-917431583 @pengzhiwei2018 : Can you take this up. If you have a tracking jira, please link here and close this issue out.
[GitHub] [hudi] nsivabalan commented on issue #3431: [SUPPORT] Failed to upsert for commit time
nsivabalan commented on issue #3431: URL: https://github.com/apache/hudi/issues/3431#issuecomment-917431404 awesome, thnx for the update.
[GitHub] [hudi] nsivabalan commented on issue #3418: [SUPPORT] Hudi Upsert Very Slow/ Failed With No Space Left on Device
nsivabalan commented on issue #3418: URL: https://github.com/apache/hudi/issues/3418#issuecomment-917431256 If you wish to dedup with bulk_insert, you also need to set "hoodie.combine.before.insert" to true. Just to clarify, bulk_insert will not look into any records in storage at all, so setting this config will ensure the incoming batch is deduped and written to Hudi. In other words, if you do two bulk_inserts, one followed by another, each batch will write unique records to Hudi, but if there are records overlapping between batch 1 and batch 2, bulk_insert may not update them. Hope that clarifies.
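The semantics described in that comment can be illustrated with a small simulation (plain Python, not Hudi code): each batch is deduped internally before writing, but a later batch never updates records already in storage.

```python
def combine_before_insert(batch):
    """Dedupe a batch by record key, keeping the last value per key -
    a stand-in for what hoodie.combine.before.insert enables."""
    deduped = {}
    for key, value in batch:
        deduped[key] = value
    return deduped

def bulk_insert(storage, batch):
    """bulk_insert appends the deduped batch without consulting storage,
    so keys already present end up duplicated rather than updated."""
    storage.extend(combine_before_insert(batch).items())
    return storage

storage = bulk_insert([], [("k1", "a"), ("k1", "b"), ("k2", "c")])
# Batch 1 was deduped internally: k1 appears once.
assert sorted(storage) == [("k1", "b"), ("k2", "c")]

storage = bulk_insert(storage, [("k2", "d")])
# Batch 2 overlaps batch 1 on k2 -> duplicate rows, not an update.
assert sorted(storage) == [("k1", "b"), ("k2", "c"), ("k2", "d")]
```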
[GitHub] [hudi] nsivabalan commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.
nsivabalan commented on issue #3605: URL: https://github.com/apache/hudi/issues/3605#issuecomment-917429823 Hey hi @Ambarish-Giri : For initial bulk loading of data into Hudi, you can try the "bulk_insert" operation; it is expected to be faster compared to regular operations. Ensure you set the right value for the [avg record size config](https://hudi.apache.org/docs/configurations/#hoodiecopyonwriterecordsizeestimate). For subsequent operations, Hudi will infer the record size from older commits, but for the first commit (bulk import/bulk_insert), Hudi relies on this config to pack records into right-sized files. Couple of questions before we dive into perf in detail: 1. May I know what are your upsert characteristics? Are they spread across all partitions, or just very few recent partitions? 2. Does your record key have any timestamp affinity or characteristics? If record keys are completely random, we can try SIMPLE index, since bloom may not be very effective for completely random keys.
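To see why the record-size estimate matters for the first commit, here is the rough file-packing arithmetic. The numbers below are assumptions for illustration (check the configuration docs for your Hudi version), but the default record-size estimate is 1024 bytes and base files target roughly 120 MB:

```python
# Assumed defaults for illustration:
max_file_size = 120 * 1024 * 1024   # ~120 MB parquet base file target
record_size_estimate = 1024         # hoodie.copyonwrite.record.size.estimate

# On the first commit, Hudi packs roughly this many records per base file:
records_per_file = max_file_size // record_size_estimate
assert records_per_file == 122880

# If real records are only ~100 bytes, the 1024-byte estimate under-packs
# files by roughly 10x, producing many small files:
actual_record_size = 100
actual_capacity = max_file_size // actual_record_size
assert actual_capacity // records_per_file == 10
```

This is why the estimate only matters for the very first write; later commits derive the real average size from commit metadata.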
[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
hudi-bot edited a comment on pull request #3644: URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782 ## CI report: * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162) * 7ba55821ed9caedaaafa68afe9471d074d1a4cba Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2163)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
hudi-bot edited a comment on pull request #3644: URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782 ## CI report: * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162) * 7ba55821ed9caedaaafa68afe9471d074d1a4cba UNKNOWN
[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
hudi-bot edited a comment on pull request #3644: URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782 ## CI report: * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162) * 7ba55821ed9caedaaafa68afe9471d074d1a4cba UNKNOWN
[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
hudi-bot edited a comment on pull request #3644: URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782 ## CI report: * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162)
[GitHub] [hudi] SteNicholas commented on a change in pull request #3633: [HUDI-2410] Fix getDefaultBootstrapIndexClass logical error
SteNicholas commented on a change in pull request #3633: URL: https://github.com/apache/hudi/pull/3633#discussion_r706619802 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -136,10 +136,10 @@ .defaultValue("archived") .withDocumentation("path under the meta folder, to store archived timeline instants at."); - public static final ConfigProperty BOOTSTRAP_INDEX_ENABLE = ConfigProperty + public static final ConfigProperty BOOTSTRAP_INDEX_ENABLE = ConfigProperty .key("hoodie.bootstrap.index.enable") - .noDefaultValue() - .withDocumentation("Whether or not, this is a bootstrapped table, with bootstrap base data and an mapping index defined."); + .defaultValue(false) Review comment: IMO, the default value of `hoodie.bootstrap.index.enable` should be true. ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -298,8 +298,9 @@ public String getBootstrapIndexClass() { } public static String getDefaultBootstrapIndexClass(Properties props) { +HoodieConfig hoodieConfig = new HoodieConfig(props); String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue(); -if ("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key( { +if (!hoodieConfig.getBooleanOrDefault(BOOTSTRAP_INDEX_ENABLE)) { Review comment: The option `hoodie.bootstrap.index.class` could not have the default value. If the default value of the `hoodie.bootstrap.index.class` is `HFileBootstrapIndex.class.getName()`, the method `getDefaultBootstrapIndexClass` should be renamed.
[GitHub] [hudi] hudi-bot commented on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
hudi-bot commented on pull request #3644: URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782 ## CI report: * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c UNKNOWN
[jira] [Updated] (HUDI-2417) Add support allowDuplicateInserts in HoodieJavaClient
[ https://issues.apache.org/jira/browse/HUDI-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2417: - Labels: pull-request-available (was: ) > Add support allowDuplicateInserts in HoodieJavaClient > -- > > Key: HUDI-2417 > URL: https://issues.apache.org/jira/browse/HUDI-2417 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > Add support allowDuplicateInserts in HoodieJavaClient
[GitHub] [hudi] dongkelun opened a new pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient
dongkelun opened a new pull request #3644: URL: https://github.com/apache/hudi/pull/3644 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *Add support allowDuplicateInserts in HoodieJavaClient* ## Brief change log *(for example:)* - *Add support allowDuplicateInserts in HoodieJavaClient* ## Verify this pull request This change added tests and can be verified as follows: - *Added testHoodieConcatHandle in TestJavaCopyOnWriteActionExecutor* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-2417) Add support allowDuplicateInserts in HoodieJavaClient
董可伦 created HUDI-2417: - Summary: Add support allowDuplicateInserts in HoodieJavaClient Key: HUDI-2417 URL: https://issues.apache.org/jira/browse/HUDI-2417 Project: Apache Hudi Issue Type: Improvement Components: Writer Core Reporter: 董可伦 Assignee: 董可伦 Fix For: 0.10.0 Add support allowDuplicateInserts in HoodieJavaClient -- This message was sent by Atlassian Jira (v8.3.4#803005)
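As a rough illustration of what this issue asks for (illustrative names, not the actual HoodieJavaClient API), the flag toggles whether incoming records are combined by key before write or simply concatenated as-is:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the behavioral switch: with allowDuplicateInserts on, incoming
// records skip the combine-by-key step and are appended unchanged (the faster
// "concat" path); with it off, records sharing a key are merged before write.
public class AllowDuplicateInsertsSketch {
  static List<String[]> prepareForWrite(List<String[]> incoming, boolean allowDuplicateInserts) {
    if (allowDuplicateInserts) {
      return incoming; // concat path: no dedup, duplicate keys possible
    }
    Map<String, String[]> byKey = new LinkedHashMap<>();
    for (String[] rec : incoming) {
      byKey.put(rec[0], rec); // keep the latest record per key
    }
    return new ArrayList<>(byKey.values());
  }

  public static void main(String[] args) {
    List<String[]> batch = List.of(
        new String[] {"k1", "v1"}, new String[] {"k1", "v2"}, new String[] {"k2", "v3"});
    System.out.println(prepareForWrite(batch, true).size());  // 3
    System.out.println(prepareForWrite(batch, false).size()); // 2
  }
}
```

The trade-off is throughput versus uniqueness: skipping dedup avoids a shuffle/merge pass but can leave duplicate keys in the table.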
[jira] [Updated] (HUDI-2416) Move FAQs to website
[ https://issues.apache.org/jira/browse/HUDI-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2416: - Labels: pull-request-available (was: ) > Move FAQs to website > > > Key: HUDI-2416 > URL: https://issues.apache.org/jira/browse/HUDI-2416 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs, Usability >Reporter: Pratyaksh Sharma >Assignee: Pratyaksh Sharma >Priority: Major > Labels: pull-request-available > > We intend to move all the docs from cWiki to website. FAQs is a good starting > point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] pratyakshsharma commented on a change in pull request #3496: [HUDI-2416] Move content from cwiki to website (FAQ movement)
pratyakshsharma commented on a change in pull request #3496: URL: https://github.com/apache/hudi/pull/3496#discussion_r706600974 ## File path: website/learn/faq.md ## @@ -0,0 +1,440 @@ +--- +title: FAQs +keywords: [hudi, writing, reading] +last_modified_at: 2021-08-18T15:59:57-04:00 +--- +# FAQs + +## General + +### When is Hudi useful for me or my organization? + +If you are looking to quickly ingest data onto HDFS or cloud storage, Hudi can provide you tools to [help](https://hudi.apache.org/docs/writing_data/). Also, if you have ETL/hive/spark jobs which are slow/taking up a lot of resources, Hudi can potentially help by providing an incremental approach to reading and writing data. + +As an organization, Hudi can help you build an [efficient data lake](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit#slide=id.p), solving some of the most complex, low-level storage management problems, while putting data into the hands of your data analysts, engineers and scientists much quicker. + +### What are some non-goals for Hudi? + +Hudi is not designed for any OLTP use-cases, where typically you are using existing NoSQL/RDBMS data stores. Hudi cannot replace your in-memory analytical database (at least not yet!). Hudi supports near-real-time ingestion on the order of a few minutes, trading off latency for efficient batching. If you truly desire sub-minute processing delays, then stick with your favorite stream processing solution. + +### What is incremental processing? Why do Hudi docs/talks keep talking about it? + +Incremental processing was first introduced by Vinoth Chandar in the O'Reilly [blog](https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/) that set off most of this effort. In purely technical terms, incremental processing merely refers to writing mini-batch programs in streaming processing style. Typical batch jobs consume **all input** and recompute **all output**, every few hours.
Typical stream processing jobs consume some **new input** and recompute **new/changes to output**, continuously/every few seconds. While recomputing all output in batch fashion can be simpler, it's wasteful and resource expensive. Hudi brings the ability to author the same batch pipelines in streaming fashion, run every few minutes. + +While we can merely refer to this as stream processing, we call it *incremental processing*, to distinguish it from purely stream processing pipelines built using Apache Flink, Apache Apex or Apache Kafka Streams. + +### What is the difference between copy-on-write (COW) vs merge-on-read (MOR) storage types? + +**Copy On Write** - This storage type enables clients to ingest data on columnar file formats, currently parquet. Any new data that is written to the Hudi dataset using COW storage type will write new parquet files. Updating an existing set of rows will result in a rewrite of the entire parquet files that collectively contain the affected rows being updated. Hence, all writes to such datasets are limited by parquet writing performance; the larger the parquet file, the higher the time taken to ingest the data. + +**Merge On Read** - This storage type enables clients to ingest data quickly onto row-based data formats such as avro. Any new data that is written to the Hudi dataset using MOR table type will write new log/delta files that internally store the data as avro-encoded bytes. A compaction process (configured as inline or asynchronous) will convert the log file format to the columnar file format (parquet). Two different InputFormats expose two different views of this data: the Read Optimized view exposes columnar parquet reading performance, while the Realtime view exposes columnar and/or log reading performance.
Updating an existing set of rows will result in either a) a companion log/delta file for an existing base parquet file generated from a previous compaction or b) an update written to a log/delta file in case no compaction ever happened for it. Hence, all writes to such datasets are limited by avro/log file writing performance, much faster than parquet, although there is a higher cost to pay to read log/delta files vs columnar (parquet) files. + +More details can be found [here](https://hudi.apache.org/docs/concepts/) and also [Design And Architecture](https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture). + +### How do I choose a storage type for my workload? + +A key goal of Hudi is to provide **upsert functionality** that is orders of magnitude faster than rewriting entire tables or partitions. + +Choose Copy-on-write storage if: + + - You are looking for a simple alternative that replaces your existing parquet tables without any need for real-time data. + - Your current job is rewriting entire table/partition to deal with updates, while only a few files actually change in each partition. + - You are happy
[jira] [Updated] (HUDI-2416) Move FAQs to website
[ https://issues.apache.org/jira/browse/HUDI-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyaksh Sharma updated HUDI-2416: --- Status: Patch Available (was: In Progress) > Move FAQs to website > > > Key: HUDI-2416 > URL: https://issues.apache.org/jira/browse/HUDI-2416 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs, Usability >Reporter: Pratyaksh Sharma >Assignee: Pratyaksh Sharma >Priority: Major > > We intend to move all the docs from cWiki to website. FAQs is a good starting > point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HUDI-2416) Move FAQs to website
[ https://issues.apache.org/jira/browse/HUDI-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyaksh Sharma updated HUDI-2416: --- Status: In Progress (was: Open) > Move FAQs to website > > > Key: HUDI-2416 > URL: https://issues.apache.org/jira/browse/HUDI-2416 > Project: Apache Hudi > Issue Type: Improvement > Components: Docs, Usability >Reporter: Pratyaksh Sharma >Assignee: Pratyaksh Sharma >Priority: Major > > We intend to move all the docs from cWiki to website. FAQs is a good starting > point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HUDI-2416) Move FAQs to website
Pratyaksh Sharma created HUDI-2416: -- Summary: Move FAQs to website Key: HUDI-2416 URL: https://issues.apache.org/jira/browse/HUDI-2416 Project: Apache Hudi Issue Type: Improvement Components: Docs, Usability Reporter: Pratyaksh Sharma Assignee: Pratyaksh Sharma We intend to move all the docs from cWiki to website. FAQs is a good starting point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] pratyakshsharma commented on pull request #3496: Move content from cwiki to website (FAQ movement)
pratyakshsharma commented on pull request #3496: URL: https://github.com/apache/hudi/pull/3496#issuecomment-917389381 @vinothchandar Please take a look. Fixed all the broken links now.
[GitHub] [hudi] pratyakshsharma commented on pull request #3416: [HUDI-2362] Add external config file support
pratyakshsharma commented on pull request #3416: URL: https://github.com/apache/hudi/pull/3416#issuecomment-917378727 Got it, please resolve the conflicts and we can start review then. :)
[GitHub] [hudi] pratyakshsharma commented on a change in pull request #3608: [HUDI-2397] Add `--enable-sync` parameter
pratyakshsharma commented on a change in pull request #3608: URL: https://github.com/apache/hudi/pull/3608#discussion_r706587965 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java ## @@ -187,7 +188,7 @@ public void testMultiTableExecutionWithParquetSource() throws IOException { // add only common props. later we can add per table props String parquetPropsFile = populateCommonPropsAndWriteToFile(); -HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(parquetPropsFile, dfsBasePath + "/config", ParquetDFSSource.class.getName(), false, +HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(parquetPropsFile, dfsBasePath + "/config", ParquetDFSSource.class.getName(), false, true, Review comment: ditto ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java ## @@ -218,7 +219,7 @@ public void testMultiTableExecutionWithParquetSource() throws IOException { @Test public void testTableLevelProperties() throws IOException { -HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", TestDataSource.class.getName(), false); +HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", TestDataSource.class.getName(), false, true); Review comment: ditto ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java ## @@ -138,7 +139,7 @@ public void testMultiTableExecutionWithKafkaSource() throws IOException { testUtils.sendMessages(topicName1, Helpers.jsonifyRecords(dataGenerator.generateInsertsAsPerSchema("000", 5, HoodieTestDataGenerator.TRIP_SCHEMA))); testUtils.sendMessages(topicName2, Helpers.jsonifyRecords(dataGenerator.generateInsertsAsPerSchema("000", 10, HoodieTestDataGenerator.SHORT_TRIP_SCHEMA))); -HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", JsonKafkaSource.class.getName(), false); +HoodieMultiTableDeltaStreamer.Config cfg = TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", JsonKafkaSource.class.getName(), false, true); Review comment: Let us keep enableMetaSync as false where enableHiveSync is also false? Otherwise it might lead to confusion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] Ambarish-Giri edited a comment on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.
Ambarish-Giri edited a comment on issue #3605: URL: https://github.com/apache/hudi/issues/3605#issuecomment-917361121 Hi @danny0405 can you explain a bit more on "if the BloomFilter got false positive"? In my case the record key is concat(uuid4,segmentId). SegmentId is an integer value, i.e. it can be the same for multiple records, and uuid4 is a standard unique random value (note: "-" are being removed from the uuid4 values though), but a combination of both identifies a record uniquely, and the partition key is again segmentId as it has low cardinality.
[jira] [Assigned] (HUDI-2413) Sql source in delta streamer does not work
[ https://issues.apache.org/jira/browse/HUDI-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-2413: --- Assignee: Jian Feng > Sql source in delta streamer does not work > -- > > Key: HUDI-2413 > URL: https://issues.apache.org/jira/browse/HUDI-2413 > Project: Apache Hudi > Issue Type: Bug > Components: DeltaStreamer >Reporter: Jian Feng >Assignee: Jian Feng >Priority: Major > > The sql source returns a null checkpoint; in DeltaSync a null checkpoint is > judged as no new data, so it should return an empty string instead -- This message was sent by Atlassian Jira (v8.3.4#803005)
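A hedged sketch (hypothetical names, not DeltaSync's actual code) of the checkpoint contract this issue describes: a null checkpoint is interpreted as "the source produced nothing" and the round is skipped, so a source with no real checkpoint notion should hand back an empty string to keep ingestion going.

```java
public class CheckpointContractSketch {
  // null => sync loop assumes the source produced nothing and skips the commit
  // ""   => "no checkpoint to resume from, but do process what was fetched"
  static boolean shouldCommit(String checkpoint, long fetchedRows) {
    if (checkpoint == null) {
      return false; // the misjudgment the issue reports for the SQL source
    }
    return fetchedRows > 0;
  }

  public static void main(String[] args) {
    System.out.println(shouldCommit(null, 500)); // false: data silently dropped
    System.out.println(shouldCommit("", 500));   // true: the proposed fix
  }
}
```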
[GitHub] [hudi] leesf merged pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'
leesf merged pull request #3643: URL: https://github.com/apache/hudi/pull/3643
[hudi] branch master updated: [MINOR] Fix typo, 'requried' corrected to 'required' (#3643)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6228b17 [MINOR] Fix typo, 'requried' corrected to 'required' (#3643) 6228b17 is described below commit 6228b17a3ddb4c336b30e5b8c650e003e38b5e3e Author: 董可伦 AuthorDate: Sat Sep 11 15:46:24 2021 +0800 [MINOR] Fix typo, 'requried' corrected to 'required' (#3643) --- .../src/main/java/org/apache/hudi/io/HoodieMergeHandle.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java index 3e20141..b01d62f 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java @@ -77,7 +77,7 @@ import java.util.Set; * Existing data: * rec1_1, rec2_1, rec3_1, rec4_1 * - * For every existing record, merge w/ incoming if requried and write to storage. + * For every existing record, merge w/ incoming if required and write to storage. *=> rec1_1 and rec1_2 is merged to write rec1_2 to storage *=> rec2_1 is written as is *=> rec3_1 is written as is
[hudi] branch master updated: [MINOR] fix typo (#3640)
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new dbcf60f [MINOR] fix typo (#3640) dbcf60f is described below commit dbcf60f370e93ab490cf82e677387a07ea743cda Author: 董可伦 AuthorDate: Sat Sep 11 15:45:49 2021 +0800 [MINOR] fix typo (#3640) --- .../org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java | 6 +++--- .../java/org/apache/hudi/table/action/commit/UpsertPartitioner.java | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java index 6b5cb29..33f59f4 100644 --- a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java +++ b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java @@ -189,13 +189,13 @@ public class JavaUpsertPartitioner> implements // Go over all such buckets, and assign weights as per amount of incoming inserts. 
List insertBuckets = new ArrayList<>(); -double curentCumulativeWeight = 0; +double currentCumulativeWeight = 0; for (int i = 0; i < bucketNumbers.size(); i++) { InsertBucket bkt = new InsertBucket(); bkt.bucketNumber = bucketNumbers.get(i); bkt.weight = (1.0 * recordsPerBucket.get(i)) / pStat.getNumInserts(); - curentCumulativeWeight += bkt.weight; - insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, curentCumulativeWeight)); + currentCumulativeWeight += bkt.weight; + insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, currentCumulativeWeight)); } LOG.info("Total insert buckets for partition path " + partitionPath + " => " + insertBuckets); partitionPathToInsertBucketInfos.put(partitionPath, insertBuckets); diff --git a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java index 3c0a511..35a8bdd 100644 --- a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java +++ b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java @@ -232,13 +232,13 @@ public class UpsertPartitioner> extends Partiti // Go over all such buckets, and assign weights as per amount of incoming inserts. 
List insertBuckets = new ArrayList<>(); -double curentCumulativeWeight = 0; +double currentCumulativeWeight = 0; for (int i = 0; i < bucketNumbers.size(); i++) { InsertBucket bkt = new InsertBucket(); bkt.bucketNumber = bucketNumbers.get(i); bkt.weight = (1.0 * recordsPerBucket.get(i)) / pStat.getNumInserts(); - curentCumulativeWeight += bkt.weight; - insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, curentCumulativeWeight)); + currentCumulativeWeight += bkt.weight; + insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, currentCumulativeWeight)); } LOG.info("Total insert buckets for partition path " + partitionPath + " => " + insertBuckets); partitionPathToInsertBucketInfos.put(partitionPath, insertBuckets);
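The `currentCumulativeWeight` variable renamed in the diff above implements proportional bucket assignment: each insert bucket's weight is its share of the incoming records, and the running sum lets a uniform draw in [0, 1) select a bucket proportionally to its weight. A standalone sketch (illustrative names, not Hudi's classes):

```java
import java.util.ArrayList;
import java.util.List;

public class InsertBucketWeights {
  // Build {bucketNumber, cumulativeWeight} pairs, mirroring the loop in the diff.
  static List<double[]> cumulativeWeights(long[] recordsPerBucket) {
    long total = 0;
    for (long r : recordsPerBucket) total += r;
    List<double[]> buckets = new ArrayList<>();
    double currentCumulativeWeight = 0;
    for (int i = 0; i < recordsPerBucket.length; i++) {
      currentCumulativeWeight += (1.0 * recordsPerBucket[i]) / total;
      buckets.add(new double[] {i, currentCumulativeWeight});
    }
    return buckets;
  }

  // Pick the first bucket whose cumulative weight covers the draw r in [0, 1).
  static int pick(List<double[]> buckets, double r) {
    for (double[] b : buckets) {
      if (r < b[1]) return (int) b[0];
    }
    return (int) buckets.get(buckets.size() - 1)[0];
  }

  public static void main(String[] args) {
    // Buckets expecting 10, 30, and 60 records => cumulative weights 0.1, 0.4, 1.0.
    List<double[]> b = cumulativeWeights(new long[] {10, 30, 60});
    System.out.println(pick(b, 0.05)); // 0
    System.out.println(pick(b, 0.25)); // 1
    System.out.println(pick(b, 0.95)); // 2
  }
}
```

In Hudi the lookup over the cumulative-weight pairs is what routes each incoming insert record to a bucket, so buckets receive records in proportion to the space planned for them.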
[GitHub] [hudi] leesf merged pull request #3640: [MINOR] Fix typo
leesf merged pull request #3640: URL: https://github.com/apache/hudi/pull/3640
[GitHub] [hudi] Ambarish-Giri commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.
Ambarish-Giri commented on issue #3605: URL: https://github.com/apache/hudi/issues/3605#issuecomment-917361121 Hi @danny0405 can you explain a bit more on "if the BloomFilter got false positive"? In my case the record key is concat(uuid4,segmentId). SegmentId is an integer value, i.e. it can be the same for multiple records, and uuid4 is a standard unique random value, but a combination of both identifies a record uniquely, and the partition key is again segmentId as it has low cardinality.
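For context on the question above: a Bloom filter can answer "maybe present" for a key that was never written, because the probed bits may have been set by other keys. A toy filter (not Hudi's implementation; the hashing here is deliberately simplistic) makes the effect visible when the filter is undersized:

```java
import java.util.BitSet;

public class BloomFalsePositiveDemo {
  final BitSet bits;
  final int size, hashes;

  BloomFalsePositiveDemo(int size, int hashes) {
    this.bits = new BitSet(size);
    this.size = size;
    this.hashes = hashes;
  }

  // i-th bit position for a key (toy hash, for illustration only).
  int bit(String key, int i) {
    return Math.floorMod(key.hashCode() * 31 + i * 0x9E3779B9, size);
  }

  void add(String key) {
    for (int i = 0; i < hashes; i++) bits.set(bit(key, i));
  }

  boolean mightContain(String key) {
    for (int i = 0; i < hashes; i++) {
      if (!bits.get(bit(key, i))) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Deliberately undersized: 64 bits for 100 uuid+segmentId-style keys.
    BloomFalsePositiveDemo filter = new BloomFalsePositiveDemo(64, 3);
    for (int i = 0; i < 100; i++) {
      filter.add("a1b2c3uuidpart" + i + "_segment" + (i % 4));
    }
    // Guaranteed: no false negatives for keys that were added.
    System.out.println(filter.mightContain("a1b2c3uuidpart0_segment0")); // true
    // Likely with an undersized filter: absent keys still report "maybe present".
    int falsePositives = 0;
    for (int i = 0; i < 100; i++) {
      if (filter.mightContain("absent-key-" + i)) falsePositives++;
    }
    System.out.println("false positives out of 100 absent keys: " + falsePositives);
  }
}
```

In Hudi's bloom index a false positive only costs an extra file read to confirm the key is absent; it never produces wrong results, but when false positives are frequent the key-lookup phase can dominate upsert time.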
[GitHub] [hudi] hudi-bot edited a comment on pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'
hudi-bot edited a comment on pull request #3643: URL: https://github.com/apache/hudi/pull/3643#issuecomment-917351059 ## CI report: * 2151667bdd2cc7fafd47462a5f7e13726b4edbf9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2160)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3642: [HUDI-2415] Add more info log for flink streaming reader
hudi-bot edited a comment on pull request #3642: URL: https://github.com/apache/hudi/pull/3642#issuecomment-917342840 ## CI report: * e6d28ea164871213cc28b922160bad95d13a94de Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2159)
[GitHub] [hudi] sbernauer closed pull request #1844: [WIP] Added test to reproduce a problem with schema evolution
sbernauer closed pull request #1844: URL: https://github.com/apache/hudi/pull/1844
[GitHub] [hudi] sbernauer commented on pull request #1844: [WIP] Added test to reproduce a problem with schema evolution
sbernauer commented on pull request #1844: URL: https://github.com/apache/hudi/pull/1844#issuecomment-917354441 Thanks @codope for your work! The most relevant part of the tests got included with #2927. Thanks for adding the other part!
[jira] [Assigned] (HUDI-2414) enable Hot and cold data separate when ingest data
[ https://issues.apache.org/jira/browse/HUDI-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Feng reassigned HUDI-2414: --- Assignee: Jian Feng > enable Hot and cold data separate when ingest data > -- > > Key: HUDI-2414 > URL: https://issues.apache.org/jira/browse/HUDI-2414 > Project: Apache Hudi > Issue Type: Improvement > Components: Writer Core >Reporter: Jian Feng >Assignee: Jian Feng >Priority: Major > > When using Hudi to ingest an e-commerce company's item data, there is massive > update traffic into old partitions; if one record needs an update, the whole > file it belongs to must be rewritten, so nearly every commit rewrites the > whole table. > I'm thinking Hudi could provide a hot/cold data separation tool that works > with specific columns (such as create time and update time) to distinguish > hot data from cold data, then rebuilds the table to separate them into > different file groups; after recreating the table, performance will be much > better -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hudi] hudi-bot edited a comment on pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'
hudi-bot edited a comment on pull request #3643: URL: https://github.com/apache/hudi/pull/3643#issuecomment-917351059 ## CI report: * 2151667bdd2cc7fafd47462a5f7e13726b4edbf9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2160)
[GitHub] [hudi] hudi-bot commented on pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'
hudi-bot commented on pull request #3643: URL: https://github.com/apache/hudi/pull/3643#issuecomment-917351059 ## CI report: * 2151667bdd2cc7fafd47462a5f7e13726b4edbf9 UNKNOWN
[GitHub] [hudi] dongkelun opened a new pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'
dongkelun opened a new pull request #3643: URL: https://github.com/apache/hudi/pull/3643 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpose of the pull request *Fix typo, 'requried' corrected to 'required'* ## Brief change log *(for example:)* - *Fix typo, 'requried' corrected to 'required'* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot edited a comment on pull request #3642: [HUDI-2415] Add more info log for flink streaming reader
hudi-bot edited a comment on pull request #3642: URL: https://github.com/apache/hudi/pull/3642#issuecomment-917342840 ## CI report: * f16a50a6de42940ea2794b06ef079472dd480875 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2157) * e6d28ea164871213cc28b922160bad95d13a94de Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2159)