[GitHub] [hudi] xushiyan commented on issue #3554: [SUPPORT] Support Apache Spark 3.1

2021-09-11 Thread GitBox


xushiyan commented on issue #3554:
URL: https://github.com/apache/hudi/issues/3554#issuecomment-917554170


   Moved to JIRA
   https://issues.apache.org/jira/browse/HUDI-1869


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan closed issue #3554: [SUPPORT] Support Apache Spark 3.1

2021-09-11 Thread GitBox


xushiyan closed issue #3554:
URL: https://github.com/apache/hudi/issues/3554


   






[jira] [Closed] (HUDI-2190) Unnecessary exception catch in SparkBulkInsertPreppedCommitActionExecutor#execute

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2190.

Resolution: Won't Do

> Unnecessary exception catch in 
> SparkBulkInsertPreppedCommitActionExecutor#execute
> -
>
> Key: HUDI-2190
> URL: https://issues.apache.org/jira/browse/HUDI-2190
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: zhangminglei
>Priority: Major
>  Labels: pull-request-available
>
> SparkBulkInsertPreppedCommitActionExecutor#execute has a try/catch block, as do 
> some other classes, but it is unnecessary.
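
Schematically (an illustrative sketch only, not the actual Hudi code), the pattern being called out is a catch that merely rewraps and rethrows, which can be dropped without changing behavior:

{code:scala}
// Schematic sketch; names are hypothetical, not Hudi's.
def doWrite(): Unit = () // stand-in for the actual write step

def execute(): Unit =
  try {
    doWrite()
  } catch {
    // Rewrapping and rethrowing adds nothing over letting the
    // exception propagate, so this try/catch is redundant.
    case e: Exception => throw new RuntimeException(e)
  }
{code}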



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-09-11 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413648#comment-17413648
 ] 

Raymond Xu commented on HUDI-864:
-

[~vinoth]

The Parquet version is upgraded in Spark 3.2.0-rc2:

[https://github.com/apache/spark/blob/03f5d23e96374670c7ea3525f871393432f0e538/pom.xml#L139]

 

The issue may persist with Spark 2 but should go away once we force an upgrade and 
build Hudi with Spark 3.2.
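
(A quick way to confirm which Parquet version a given Spark classpath actually ships, e.g. from spark-shell — a hedged sketch: it reads the jar manifest, so it may print null if the manifest carries no version:)

{code:scala}
// Prints the Parquet version on the current classpath, e.g. "1.12.1".
// Relies on the jar manifest's Implementation-Version attribute.
println(classOf[org.apache.parquet.schema.MessageType].getPackage.getImplementationVersion)
{code}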

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.5.2
>Reporter: Roland Johann
>Priority: Major
>  Labels: sev:high, user-support-issues
>
> When dealing with struct types like this
> {code:json}
> {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryResults",
>   "type": {
> "type": "array",
> "elementType": {
>   "type": "struct",
>   "fields": [
> {
>   "name": "categoryId",
>   "type": "string",
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> },
> "containsNull": true
>   },
>   "nullable": true,
>   "metadata": {}
> }
>   ]
> }
> {code}
> The second ingest batch throws that exception:
> {code}
> ERROR [Executor task launch worker for task 15] 
> commit.BaseCommitActionExecutor (BaseCommitActionExecutor.java:264) - Error 
> upserting bucketType UPDATE for partition :0
> org.apache.hudi.exception.HoodieException: 
> org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdateInternal(CommitActionExecutor.java:100)
>   at 
> org.apache.hudi.table.action.commit.CommitActionExecutor.handleUpdate(CommitActionExecutor.java:76)
>   at 
> org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:73)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:258)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleInsertPartition(BaseCommitActionExecutor.java:271)
>   at 
> org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:104)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
>   at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
>   at 
> org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
>   at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
>   at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
>   at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> 
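
(For anyone trying to reproduce the report above, a minimal sketch using the nested array-of-struct shape from the description; the Spark session setup, table name, path, and key fields are hypothetical:)

{code:scala}
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-864-repro").getOrCreate()
import spark.implicits._

case class CategoryResult(categoryId: Option[String])
case class Event(id: String, ts: Long, categoryResults: Seq[CategoryResult])

def ingest(batch: Seq[Event]): Unit =
  batch.toDF().write.format("hudi")
    .option("hoodie.table.name", "hudi_864_repro")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode(SaveMode.Append)
    .save("/tmp/hudi/hudi_864_repro")

ingest(Seq(Event("1", 1L, Seq(CategoryResult(Some("a")))))) // first batch succeeds
ingest(Seq(Event("1", 2L, Seq(CategoryResult(Some("b")))))) // second batch hits the reported conflict
{code}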

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-864:

Component/s: Common Core

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Affects Versions: 0.5.2
>Reporter: Roland Johann
>Priority: Major
>  Labels: sev:high, user-support-issues
>

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-864:

Labels: sev:high user-support-issues  (was: sevv:high user-support-issues)

> parquet schema conflict: optional binary  (UTF8) is not a group
> ---
>
> Key: HUDI-864
> URL: https://issues.apache.org/jira/browse/HUDI-864
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.5.2
>Reporter: Roland Johann
>Priority: Major
>  Labels: sev:high, user-support-issues
>

[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641


   
   ## CI report:
   
   * 67dabdd51934a7141a299114de2b836b1f016fd5 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2166)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] QingFengZhou commented on issue #143: Tracking ticket for folks to be added to slack group

2021-09-11 Thread GitBox


QingFengZhou commented on issue #143:
URL: https://github.com/apache/hudi/issues/143#issuecomment-917547988


   Please add me to slack group
   Email:  zhouqf2...@163.com
   Thanks!!






[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641


   
   ## CI report:
   
   * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668)
 
   * 67dabdd51934a7141a299114de2b836b1f016fd5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2166)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862028641


   
   ## CI report:
   
   * a174c4ed2b4c13a032a38afdb0a21b58a7b6cf25 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1668)
 
   * 67dabdd51934a7141a299114de2b836b1f016fd5 UNKNOWN
   
   
   






[GitHub] [hudi] codecov-commenter removed a comment on pull request #2210: [HUDI-1348] Provide option to clean up DFS sources

2021-09-11 Thread GitBox


codecov-commenter removed a comment on pull request #2210:
URL: https://github.com/apache/hudi/pull/2210#issuecomment-862054362


   # 
[Codecov](https://codecov.io/gh/apache/hudi/pull/2210?src=pr=h1_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 Report
   > Merging 
[#2210](https://codecov.io/gh/apache/hudi/pull/2210?src=pr=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (b845e34) into 
[master](https://codecov.io/gh/apache/hudi/commit/673d62f3c3ab07abb3fcd319607e657339bc0682?el=desc_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 (673d62f) will **increase** coverage by `44.07%`.
   > The diff coverage is `n/a`.
   
   [![Impacted file tree 
graph](https://codecov.io/gh/apache/hudi/pull/2210/graphs/tree.svg?width=650=150=pr=VTTXabwbs2_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/hudi/pull/2210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #2210       +/-   ##
   =============================================
   + Coverage      8.43%   52.51%   +44.07%
   - Complexity       62     3664     +3602
   =============================================
     Files            70      474      +404
     Lines          2880    23997    +21117
     Branches        359     2741     +2382
   =============================================
   + Hits            243    12601    +12358
   - Misses         2616    10137     +7521
   - Partials         21     1259     +1238
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hudicli | `39.95% <ø> (?)` | |
   | hudiclient | `∅ <ø> (∅)` | |
   | hudicommon | `48.20% <ø> (?)` | |
   | hudiflink | `60.73% <ø> (?)` | |
   | hudihadoopmr | `51.34% <ø> (?)` | |
   | hudisparkdatasource | `66.47% <ø> (?)` | |
   | hudisync | `46.79% <ø> (+40.00%)` | :arrow_up: |
   | huditimelineservice | `64.36% <ø> (?)` | |
   | hudiutilities | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Impacted 
Files](https://codecov.io/gh/apache/hudi/pull/2210?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ache/hudi/hive/HiveMetastoreBasedLockProvider.java](https://codecov.io/gh/apache/hudi/pull/2210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS1zeW5jL2h1ZGktaGl2ZS1zeW5jL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9odWRpL2hpdmUvSGl2ZU1ldGFzdG9yZUJhc2VkTG9ja1Byb3ZpZGVyLmphdmE=)
 | `0.00% <0.00%> (-60.22%)` | :arrow_down: |
   | 
[...va/org/apache/hudi/utilities/schema/SchemaSet.java](https://codecov.io/gh/apache/hudi/pull/2210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFTZXQuamF2YQ==)
 | | |
   | 
[...va/org/apache/hudi/utilities/IdentitySplitter.java](https://codecov.io/gh/apache/hudi/pull/2210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL0lkZW50aXR5U3BsaXR0ZXIuamF2YQ==)
 | | |
   | 
[...a/org/apache/hudi/utilities/sources/SqlSource.java](https://codecov.io/gh/apache/hudi/pull/2210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NvdXJjZXMvU3FsU291cmNlLmphdmE=)
 | | |
   | 
[...s/exception/HoodieIncrementalPullSQLException.java](https://codecov.io/gh/apache/hudi/pull/2210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2V4Y2VwdGlvbi9Ib29kaWVJbmNyZW1lbnRhbFB1bGxTUUxFeGNlcHRpb24uamF2YQ==)
 | | |
   | 
[...che/hudi/utilities/schema/SchemaPostProcessor.java](https://codecov.io/gh/apache/hudi/pull/2210/diff?src=pr=tree_medium=referral_source=github_content=comment_campaign=pr+comments_term=The+Apache+Software+Foundation#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9TY2hlbWFQb3N0UHJvY2Vzc29yLmphdmE=)
 | | |
   | 

[jira] [Resolved] (HUDI-2398) Event Time not getting updated for inserts

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-2398.
--
Resolution: Fixed

> Event Time not getting updated for inserts
> --
>
> Key: HUDI-2398
> URL: https://issues.apache.org/jira/browse/HUDI-2398
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core, metrics
>Reporter: Ankush Kanungo
>Assignee: Ankush Kanungo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When using the DefaultHoodieRecordPayload class, the event time (used for latency 
> calculations) is not being updated for inserts and stays null.
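
(For context, a minimal sketch of how this payload is typically wired in from a Spark writer; the option keys are standard Hudi configs, while the table name, path, and field names are hypothetical:)

{code:scala}
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("event-time-demo").getOrCreate()
val df = spark.read.json("/tmp/input") // hypothetical input

df.write.format("hudi")
  .option("hoodie.table.name", "demo_tbl")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  // the payload class this issue is about
  .option("hoodie.datasource.write.payload.class",
    "org.apache.hudi.common.model.DefaultHoodieRecordPayload")
  // the field whose value should be tracked as event time for latency metrics
  .option("hoodie.payload.event.time.field", "ts")
  .mode(SaveMode.Append)
  .save("/tmp/hudi/demo_tbl")
{code}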





[jira] [Closed] (HUDI-2398) Event Time not getting updated for inserts

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2398.


> Event Time not getting updated for inserts
> --
>
> Key: HUDI-2398
> URL: https://issues.apache.org/jira/browse/HUDI-2398
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core, metrics
>Reporter: Ankush Kanungo
>Assignee: Ankush Kanungo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When using the DefaultHoodieRecordPayload class, the event time (used for latency 
> calculations) is not being updated for inserts and stays null.





[hudi] branch master updated: [HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload (#3602)

2021-09-11 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 4f991ee  [HUDI-2398] Collect event time for inserts in 
DefaultHoodieRecordPayload (#3602)
4f991ee is described below

commit 4f991ee3525c6225c7bf3b46e272f7d5b919196e
Author: Ankush Kanungo <40214578+akanun...@users.noreply.github.com>
AuthorDate: Sat Sep 11 20:27:40 2021 -0700

[HUDI-2398] Collect event time for inserts in DefaultHoodieRecordPayload 
(#3602)
---
 .../apache/hudi/io/HoodieSortedMergeHandle.java|  8 
 .../common/model/DefaultHoodieRecordPayload.java   | 23 ++---
 .../model/TestDefaultHoodieRecordPayload.java  | 24 ++
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
index 763178d..606e63a 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieSortedMergeHandle.java
@@ -90,9 +90,9 @@ public class HoodieSortedMergeHandle ext
   }
   try {
 if (useWriterSchema) {
-  writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields));
+  writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields, 
config.getProps()));
 } else {
-  writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchema));
+  writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchema, config.getProps()));
 }
 insertRecordsWritten++;
 writtenRecordKeys.add(keyToPreWrite);
@@ -112,9 +112,9 @@ public class HoodieSortedMergeHandle ext
 HoodieRecord hoodieRecord = keyToNewRecords.get(key);
 if (!writtenRecordKeys.contains(hoodieRecord.getRecordKey())) {
   if (useWriterSchema) {
-writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields));
+writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchemaWithMetaFields, 
config.getProps()));
   } else {
-writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchema));
+writeRecord(hoodieRecord, 
hoodieRecord.getData().getInsertValue(tableSchema, config.getProps()));
   }
   insertRecordsWritten++;
 }
diff --git 
a/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
 
b/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
index 86ccf67..76474fd 100644
--- 
a/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
+++ 
b/hudi-common/src/main/java/org/apache/hudi/common/model/DefaultHoodieRecordPayload.java
@@ -18,7 +18,6 @@
 
 package org.apache.hudi.common.model;
 
-import org.apache.hudi.common.config.HoodieConfig;
 import org.apache.hudi.common.util.Option;
 
 import org.apache.avro.Schema;
@@ -56,7 +55,7 @@ public class DefaultHoodieRecordPayload extends 
OverwriteWithLatestAvroPayload {
 if (recordBytes.length == 0) {
   return Option.empty();
 }
-HoodieConfig hoodieConfig = new HoodieConfig(properties);
+
 GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
 
 // Null check is needed here to support schema evolution. The record in 
storage may be from old schema where
@@ -68,17 +67,27 @@ public class DefaultHoodieRecordPayload extends 
OverwriteWithLatestAvroPayload {
 /*
  * We reached a point where the value is disk is older than the incoming 
record.
  */
-eventTime = Option.ofNullable(getNestedFieldVal(incomingRecord, 
hoodieConfig
-.getString(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY), 
true));
+eventTime = updateEventTime(incomingRecord, properties);
 
 /*
  * Now check if the incoming record is a delete record.
  */
-if (isDeleteRecord(incomingRecord)) {
+return isDeleteRecord(incomingRecord) ? Option.empty() : 
Option.of(incomingRecord);
+  }
+
+  @Override
+  public Option<IndexedRecord> getInsertValue(Schema schema, Properties 
properties) throws IOException {
+if (recordBytes.length == 0) {
   return Option.empty();
-} else {
-  return Option.of(incomingRecord);
 }
+GenericRecord incomingRecord = bytesToAvro(recordBytes, schema);
+eventTime = updateEventTime(incomingRecord, properties);
+
+return isDeleteRecord(incomingRecord) ? Option.empty() : 
Option.of(incomingRecord);
+  }
+
+  private static Option<Object> updateEventTime(GenericRecord record, 
Properties properties) {
+return 

[GitHub] [hudi] xushiyan merged pull request #3602: [HUDI-2398] Update event time for inserts

2021-09-11 Thread GitBox


xushiyan merged pull request #3602:
URL: https://github.com/apache/hudi/pull/3602


   






[jira] [Resolved] (HUDI-2415) Add more info log for flink streaming reader

2021-09-11 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2415.
--
Resolution: Fixed

Fixed via master branch: 9d5c3e5cb92a4247bb1fc9a4a0e2eb3d2fbce1d6

> Add more info log for flink streaming reader
> 
>
> Key: HUDI-2415
> URL: https://issues.apache.org/jira/browse/HUDI-2415
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[hudi] branch master updated: [HUDI-2415] Add more info log for flink streaming reader (#3642)

2021-09-11 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 9d5c3e5  [HUDI-2415]  Add more info log for flink streaming reader 
(#3642)
9d5c3e5 is described below

commit 9d5c3e5cb92a4247bb1fc9a4a0e2eb3d2fbce1d6
Author: Danny Chan 
AuthorDate: Sun Sep 12 10:00:17 2021 +0800

[HUDI-2415]  Add more info log for flink streaming reader (#3642)
---
 .../org/apache/hudi/source/StreamReadMonitoringFunction.java | 12 
 1 file changed, 12 insertions(+)

diff --git 
a/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
 
b/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
index ec56903..c5610d2 100644
--- 
a/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
+++ 
b/hudi-flink/src/main/java/org/apache/hudi/source/StreamReadMonitoringFunction.java
@@ -248,6 +248,13 @@ public class StreamReadMonitoringFunction
    List<HoodieCommitMetadata> activeMetadataList = instants.stream()
 .map(instant -> WriteProfiles.getCommitMetadata(tableName, path, 
instant, commitTimeline)).collect(Collectors.toList());
    List<HoodieCommitMetadata> archivedMetadataList = 
getArchivedMetadata(instantRange, commitTimeline, tableName);
+    if (archivedMetadataList.size() > 0) {
+      LOG.warn(""
+          + "----------------------------------------------------------------------\n"
+          + "-- caution: the reader has fall behind too much from the writer,\n"
+          + "-- tweak 'read.tasks' option to add parallelism of read tasks.\n"
+          + "----------------------------------------------------------------------");
+    }
    List<HoodieCommitMetadata> metadataList = archivedMetadataList.size() > 0
 ? mergeList(activeMetadataList, archivedMetadataList)
 : activeMetadataList;
@@ -288,6 +295,11 @@ public class StreamReadMonitoringFunction
 }
 // update the issues instant time
 this.issuedInstant = commitToIssue;
+LOG.info(""
++ "\n"
++ "-- consumed to instant: {}\n"
++ "",
+commitToIssue);
   }
 
   @Override


[GitHub] [hudi] danny0405 merged pull request #3642: [HUDI-2415] Add more info log for flink streaming reader

2021-09-11 Thread GitBox


danny0405 merged pull request #3642:
URL: https://github.com/apache/hudi/pull/3642


   






[jira] [Resolved] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-2357.
--
Resolution: Fixed

> MERGE INTO doesn't work for tables created using CTAS
> -
>
> Key: HUDI-2357
> URL: https://issues.apache.org/jira/browse/HUDI-2357
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Vinoth Govindarajan
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> MERGE INTO command doesn't select the correct primary key for tables created 
> using CTAS, whereas it works for tables created using CREATE TABLE command.
> I guess we are hitting this issue because the key generator class is set to 
> SqlKeyGenerator for tables created using CTAS:
> working use-case:
> {code:java}
> create table h5 (id bigint, name string, ts bigint) using hudi
> options (type = "cow" , primaryKey="id" , preCombineField="ts" );
> merge into h5 as t0
> using (
> select 5 as s_id, 'vinoth' as s_name, current_timestamp() as s_ts
> ) t1
> on t1.s_id = t0.id
> when matched then update set * 
> when not matched then insert *;
> {code}
> hoodie.properties for working use-case:
> {code:java}
> ➜  analytics.db git:(apache_hudi_support) cat h5/.hoodie/hoodie.properties
> #Properties saved on Wed Aug 25 04:10:33 UTC 2021
> #Wed Aug 25 04:10:33 UTC 2021
> hoodie.table.name=h5
> hoodie.table.recordkey.fields=id
> hoodie.table.type=COPY_ON_WRITE
> hoodie.table.precombine.field=ts
> hoodie.table.partition.fields=
> hoodie.archivelog.folder=archived
> hoodie.table.create.schema={"type"\:"record","name"\:"topLevelRecord","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"id","type"\:["long","null"]},{"name"\:"name","type"\:["string","null"]},{"name"\:"ts","type"\:["long","null"]}]}
> hoodie.timeline.layout.version=1
> hoodie.table.version=1{code}
>  
> Whereas this doesn't work:
> {code:java}
> create table h4 using hudi options (type = "cow" , primaryKey="id" , 
> preCombineField="ts" ) as select 5 as id, cast(rand() as string) as name, 
> current_timestamp();
> merge into h3 as t0 using (select '5' as s_id, 'vinoth' as s_name, 
> current_timestamp() as s_ts) t1 on t1.s_id = t0.id when matched then update 
> set * when not matched then insert *;
> ERROR LOG
> 544702 [main] ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver  - 
> Failed in [merge into analytics.h3 as t0 using (select '5' as s_id, 
> 'vinoth' as s_name, current_timestamp() as s_ts) t1 on t1.s_id = t0.id when 
> matched then update set * when not matched then insert 
> *] java.lang.IllegalArgumentException: Merge Key[id] is not Equal to the 
> defined primary key[] in table h3 at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425)
>  at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:147)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229) at 
> org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618) at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
>  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616) at 
> org.apache.spark.sql.Dataset.(Dataset.scala:229) at 
> org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100) at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at 
> org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97) at 
> org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607) at 
> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at 
> org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602) at 
> 

[jira] [Closed] (HUDI-2357) MERGE INTO doesn't work for tables created using CTAS

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-2357.


> MERGE INTO doesn't work for tables created using CTAS
> -
>
> Key: HUDI-2357
> URL: https://issues.apache.org/jira/browse/HUDI-2357
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Vinoth Govindarajan
>Assignee: pengzhiwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>

[jira] [Commented] (HUDI-2387) Too many HEAD requests from Hudi to S3

2021-09-11 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413595#comment-17413595
 ] 

Raymond Xu commented on HUDI-2387:
--

[~uditme] would you please raise this with the AWS team?

> Too many HEAD requests from Hudi to S3 
> ---
>
> Key: HUDI-2387
> URL: https://issues.apache.org/jira/browse/HUDI-2387
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core, Spark Integration
>Affects Versions: 0.8.0
> Environment: AWS Glue with PySpark
>Reporter: Sourav T
>Priority: Major
>
> We are using Apache Hudi from AWS Glue (with the PySpark runtime) to store data 
> in an S3 bucket. We are observing a very high number of S3 HEAD requests 
> originating, we believe, from Hudi. 
> Many a time, due to this high number of requests, S3 throws "Status Code: 503; 
> Error Code: SlowDown", causing data losses. 
> Is there any out-of-box feature to debug this further and confirm which 
> Hudi feature is causing this? 





[jira] [Closed] (HUDI-1702) TestHoodieMergeOnReadTable.init fails randomly on Travis CI

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-1702.


> TestHoodieMergeOnReadTable.init fails randomly on Travis CI
> ---
>
> Key: HUDI-1702
> URL: https://issues.apache.org/jira/browse/HUDI-1702
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Danny Chen
>Priority: Major
>  Labels: sev:triage
> Fix For: 0.10.0
>
>
> The test case fails randomly from time to time, which is annoying; take this 
> for an example:
> https://travis-ci.com/github/apache/hudi/jobs/491671521





[jira] [Resolved] (HUDI-1702) TestHoodieMergeOnReadTable.init fails randomly on Travis CI

2021-09-11 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu resolved HUDI-1702.
--
Fix Version/s: 0.10.0
   Resolution: Fixed

Fixed in https://issues.apache.org/jira/browse/HUDI-1989

> TestHoodieMergeOnReadTable.init fails randomly on Travis CI
> ---
>
> Key: HUDI-1702
> URL: https://issues.apache.org/jira/browse/HUDI-1702
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Reporter: Danny Chen
>Priority: Major
>  Labels: sev:triage
> Fix For: 0.10.0
>
>
> The test case fails randomly from time to time, which is annoying; take this 
> for an example:
> https://travis-ci.com/github/apache/hudi/jobs/491671521





[GitHub] [hudi] fengjian428 closed pull request #3645: [HUDI-2413] fix Sql source's checkpoint

2021-09-11 Thread GitBox


fengjian428 closed pull request #3645:
URL: https://github.com/apache/hudi/pull/3645


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3645: [HUDI-2413] fix Sql source's checkpoint

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3645:
URL: https://github.com/apache/hudi/pull/3645#issuecomment-917441426


   
   ## CI report:
   
   * be4aeaec24d12c0af19d8497dafcb6c60de0dfba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2164)
 
   
   
   






[jira] [Created] (HUDI-2418) add HiveSchemaProvider

2021-09-11 Thread Jian Feng (Jira)
Jian Feng created HUDI-2418:
---

 Summary: add HiveSchemaProvider 
 Key: HUDI-2418
 URL: https://issues.apache.org/jira/browse/HUDI-2418
 Project: Apache Hudi
  Issue Type: Improvement
  Components: DeltaStreamer
Reporter: Jian Feng


When using DeltaStreamer to migrate an existing Hive table, it would be better to have a 
HiveSchemaProvider instead of an Avro schema file.

 





[GitHub] [hudi] hudi-bot commented on pull request #3645: [HUDI-2413] fix Sql source's checkpoint

2021-09-11 Thread GitBox


hudi-bot commented on pull request #3645:
URL: https://github.com/apache/hudi/pull/3645#issuecomment-917441426


   
   ## CI report:
   
   * be4aeaec24d12c0af19d8497dafcb6c60de0dfba UNKNOWN
   
   
   






[jira] [Updated] (HUDI-2413) Sql source in delta streamer does not work

2021-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2413:
-
Labels: pull-request-available  (was: )

> Sql source in delta streamer does not work
> --
>
> Key: HUDI-2413
> URL: https://issues.apache.org/jira/browse/HUDI-2413
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>  Labels: pull-request-available
>
> SqlSource returns a null checkpoint; in DeltaSync a null checkpoint is 
> judged as no new data, so it should return an empty string instead.





[GitHub] [hudi] fengjian428 opened a new pull request #3645: [HUDI-2413] fix Sql source's checkpoint

2021-09-11 Thread GitBox


fengjian428 opened a new pull request #3645:
URL: https://github.com/apache/hudi/pull/3645


   https://issues.apache.org/jira/browse/HUDI-2413
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] nsivabalan commented on issue #3571: Caused by: java.lang.NoSuchFieldError: NULL_VALUE[SUPPORT]

2021-09-11 Thread GitBox


nsivabalan commented on issue #3571:
URL: https://github.com/apache/hudi/issues/3571#issuecomment-917433769


   May I know what Spark bundle you are using? Hudi has three different bundles 
for Spark and Scala version variants.






[GitHub] [hudi] nsivabalan commented on issue #3582: [SUPPORT] Upsert to hudi table fails that got bootstrapped (w/ metadata only)

2021-09-11 Thread GitBox


nsivabalan commented on issue #3582:
URL: https://github.com/apache/hudi/issues/3582#issuecomment-917433235


   @yanghua: Can you please help here?






[GitHub] [hudi] nsivabalan commented on issue #3554: [SUPPORT] Support Apache Spark 3.1

2021-09-11 Thread GitBox


nsivabalan commented on issue #3554:
URL: https://github.com/apache/hudi/issues/3554#issuecomment-917432982


   @pengzhiwei2018: Feel free to close out this issue if we have a tracking 
JIRA. 






[GitHub] [hudi] nsivabalan commented on issue #3533: [SUPPORT]How to use MOR Table to Merge small file?

2021-09-11 Thread GitBox


nsivabalan commented on issue #3533:
URL: https://github.com/apache/hudi/issues/3533#issuecomment-917432581


   Yes, can you list the files along with their sizes? Based on the logs you have provided, 
we are likely interested in the files with commit times 
   20210826162904 and 20210826162935. 






[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3644:
URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782


   
   ## CI report:
   
   * 7ba55821ed9caedaaafa68afe9471d074d1a4cba Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2163)
 
   
   
   






[GitHub] [hudi] nsivabalan commented on issue #3555: [SUPPORT] support show/drop partitions tablename sql

2021-09-11 Thread GitBox


nsivabalan commented on issue #3555:
URL: https://github.com/apache/hudi/issues/3555#issuecomment-917431583


   @pengzhiwei2018: Can you take this up? If you have a tracking JIRA, please 
link it here and close this issue out. 






[GitHub] [hudi] nsivabalan commented on issue #3431: [SUPPORT] Failed to upsert for commit time

2021-09-11 Thread GitBox


nsivabalan commented on issue #3431:
URL: https://github.com/apache/hudi/issues/3431#issuecomment-917431404


   Awesome, thanks for the update. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #3418: [SUPPORT] Hudi Upsert Very Slow/ Failed With No Space Left on Device

2021-09-11 Thread GitBox


nsivabalan commented on issue #3418:
URL: https://github.com/apache/hudi/issues/3418#issuecomment-917431256


   If you wish to dedup with bulk_insert, you also need to set 
"hoodie.combine.before.insert" to true. 
   Just to clarify, bulk_insert will not look into any records in storage at 
all, so setting this config ensures the incoming batch is deduped before being 
written to hudi. 
   In other words, if you do 2 bulk_inserts, one after another, each 
batch will write unique records to hudi, but if there are records overlapping 
between batch 1 and batch 2, bulk_insert may not update them. 
   
   Hope that clarifies. 
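   For reference, a minimal sketch of such a write. The dataframe, record key, 
precombine field and table name below are assumptions for illustration, not 
values from this issue:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class BulkInsertDedupSketch {
  // writes a batch with bulk_insert, deduping the incoming batch first
  public static void write(Dataset<Row> df, String basePath) {
    df.write().format("hudi")
        .option("hoodie.datasource.write.operation", "bulk_insert")
        // dedupe records within the incoming batch before writing;
        // bulk_insert still does not look at records already in storage
        .option("hoodie.combine.before.insert", "true")
        .option("hoodie.datasource.write.recordkey.field", "uuid") // assumed field
        .option("hoodie.datasource.write.precombine.field", "ts")  // assumed field
        .option("hoodie.table.name", "test_table")                 // assumed name
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
```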
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-11 Thread GitBox


nsivabalan commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-917429823


   Hey hi @Ambarish-Giri : 
   For the initial bulk loading of data into hudi, you can try the "bulk_insert" 
operation; it is expected to be faster than the regular write operations. Ensure 
you set the right value for the [avg record size 
config](https://hudi.apache.org/docs/configurations/#hoodiecopyonwriterecordsizeestimate).
 For subsequent operations, hudi will infer the record size from older 
commits, but for the first commit (bulk import/bulk_insert), hudi relies on this 
config to pack records into right-sized files (see the sketch after the 
questions below). 
   
   A couple of questions before we dive into perf in detail: 
   1. May I know what your upsert characteristics are? Are updates spread across 
all partitions, or just a few recent partitions? 
   2. Does your record key have any timestamp affinity or characteristics? If 
record keys are completely random, we can try the SIMPLE index, since bloom may 
not be very effective for completely random keys. 
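   A minimal sketch of those initial-load configs; the size estimate and table 
name are assumptions to be tuned for your data, not values from this issue:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class InitialBulkLoadSketch {
  public static void bulkLoad(Dataset<Row> df, String basePath) {
    df.write().format("hudi")
        .option("hoodie.datasource.write.operation", "bulk_insert")
        // the first commit has no older commits to infer record size from,
        // so supply a realistic average record size in bytes (assumed value)
        .option("hoodie.copyonwrite.record.size.estimate", "512")
        // for completely random record keys, SIMPLE index may beat bloom
        .option("hoodie.index.type", "SIMPLE")
        .option("hoodie.table.name", "test_table") // assumed name
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
```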
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3644:
URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782


   
   ## CI report:
   
   * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162)
 
   * 7ba55821ed9caedaaafa68afe9471d074d1a4cba Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2163)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3644:
URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782


   
   ## CI report:
   
   * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162)
 
   * 7ba55821ed9caedaaafa68afe9471d074d1a4cba UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3644:
URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782


   
   ## CI report:
   
   * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162)
 
   * 7ba55821ed9caedaaafa68afe9471d074d1a4cba UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3644:
URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782


   
   ## CI report:
   
   * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2162)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] SteNicholas commented on a change in pull request #3633: [HUDI-2410] Fix getDefaultBootstrapIndexClass logical error

2021-09-11 Thread GitBox


SteNicholas commented on a change in pull request #3633:
URL: https://github.com/apache/hudi/pull/3633#discussion_r706619802



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -136,10 +136,10 @@
   .defaultValue("archived")
   .withDocumentation("path under the meta folder, to store archived 
timeline instants at.");
 
-  public static final ConfigProperty<String> BOOTSTRAP_INDEX_ENABLE = 
ConfigProperty
+  public static final ConfigProperty<Boolean> BOOTSTRAP_INDEX_ENABLE = 
ConfigProperty
   .key("hoodie.bootstrap.index.enable")
-  .noDefaultValue()
-  .withDocumentation("Whether or not, this is a bootstrapped table, with 
bootstrap base data and an mapping index defined.");
+  .defaultValue(false)

Review comment:
   IMO, the default value of `hoodie.bootstrap.index.enable` should be true.

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##
@@ -298,8 +298,9 @@ public String getBootstrapIndexClass() {
   }
 
   public static String getDefaultBootstrapIndexClass(Properties props) {
+HoodieConfig hoodieConfig = new HoodieConfig(props);
 String defaultClass = BOOTSTRAP_INDEX_CLASS_NAME.defaultValue();
-if 
("false".equalsIgnoreCase(props.getProperty(BOOTSTRAP_INDEX_ENABLE.key()))) {
+if (!hoodieConfig.getBooleanOrDefault(BOOTSTRAP_INDEX_ENABLE)) {

Review comment:
   The option `hoodie.bootstrap.index.class` should not have a default 
value. If the default value of `hoodie.bootstrap.index.class` is 
`HFileBootstrapIndex.class.getName()`, the method 
`getDefaultBootstrapIndexClass` should be renamed.
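   To illustrate the logical error this diff fixes, a standalone sketch using 
plain `java.util.Properties` rather than the Hudi config classes: with the old 
string comparison, an unset flag is not treated as disabled.

```java
import java.util.Properties;

public class BootstrapFlagSketch {
  public static void main(String[] args) {
    Properties props = new Properties(); // flag not set at all

    // old check: getProperty returns null, and "false".equalsIgnoreCase(null)
    // is false, so an unset flag is NOT treated as disabled
    boolean oldCheckDisabled =
        "false".equalsIgnoreCase(props.getProperty("hoodie.bootstrap.index.enable"));
    System.out.println(oldCheckDisabled); // false -> bootstrap index stays enabled

    // new check (a sketch of getBooleanOrDefault with a default of false):
    String raw = props.getProperty("hoodie.bootstrap.index.enable");
    boolean enabled = (raw == null) ? false : Boolean.parseBoolean(raw);
    System.out.println(enabled); // false -> falls back to the no-op index
  }
}
```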




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


hudi-bot commented on pull request #3644:
URL: https://github.com/apache/hudi/pull/3644#issuecomment-917416782


   
   ## CI report:
   
   * 0e90c981ab046c7f8dfd88df25d5f3d16bb7552c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2417) Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2417:
-
Labels: pull-request-available  (was: )

> Add support allowDuplicateInserts in HoodieJavaClient 
> --
>
> Key: HUDI-2417
> URL: https://issues.apache.org/jira/browse/HUDI-2417
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Add support allowDuplicateInserts in HoodieJavaClient



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] dongkelun opened a new pull request #3644: [HUDI-2417] Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread GitBox


dongkelun opened a new pull request #3644:
URL: https://github.com/apache/hudi/pull/3644


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *Add support allowDuplicateInserts in HoodieJavaClient*
   
   ## Brief change log
   
 - *Add support allowDuplicateInserts in HoodieJavaClient*
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
 - *Added testHoodieConcatHandle in TestJavaCopyOnWriteActionExecutor*
   
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
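   For context, a hedged sketch of what using this from the Java client could 
look like. The config key, engine-context wiring and record plumbing here are 
assumptions based on the Spark client's equivalent, not necessarily this PR's 
final API:

```java
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hudi.client.HoodieJavaWriteClient;
import org.apache.hudi.client.WriteStatus;
import org.apache.hudi.client.common.HoodieJavaEngineContext;
import org.apache.hudi.common.model.HoodieAvroPayload;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.config.HoodieWriteConfig;

public class AllowDuplicateInsertsSketch {
  public static List<WriteStatus> insert(List<HoodieRecord<HoodieAvroPayload>> records,
                                         String basePath, String schema) {
    HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .withSchema(schema)
        .forTable("test_table") // assumed table name
        // keep duplicates on insert (concat instead of merge) -- assumed key
        .withProps(Collections.singletonMap(
            "hoodie.merge.allow.duplicate.on.inserts", "true"))
        .build();
    HoodieJavaWriteClient<HoodieAvroPayload> client =
        new HoodieJavaWriteClient<>(new HoodieJavaEngineContext(new Configuration()), cfg);
    try {
      String instantTime = client.startCommit();
      return client.insert(records, instantTime);
    } finally {
      client.close();
    }
  }
}
```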
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-2417) Add support allowDuplicateInserts in HoodieJavaClient

2021-09-11 Thread Jira
董可伦 created HUDI-2417:
-

 Summary: Add support allowDuplicateInserts in HoodieJavaClient 
 Key: HUDI-2417
 URL: https://issues.apache.org/jira/browse/HUDI-2417
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: 董可伦
Assignee: 董可伦
 Fix For: 0.10.0


Add support allowDuplicateInserts in HoodieJavaClient



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2416) Move FAQs to website

2021-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2416:
-
Labels: pull-request-available  (was: )

> Move FAQs to website
> 
>
> Key: HUDI-2416
> URL: https://issues.apache.org/jira/browse/HUDI-2416
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs, Usability
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available
>
> We intend to move all the docs from cWiki to website. FAQs is a good starting 
> point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pratyakshsharma commented on a change in pull request #3496: [HUDI-2416] Move content from cwiki to website (FAQ movement)

2021-09-11 Thread GitBox


pratyakshsharma commented on a change in pull request #3496:
URL: https://github.com/apache/hudi/pull/3496#discussion_r706600974



##
File path: website/learn/faq.md
##
@@ -0,0 +1,440 @@
+---
+title: FAQs
+keywords: [hudi, writing, reading]
+last_modified_at: 2021-08-18T15:59:57-04:00
+---
+# FAQs
+
+## General
+
+### When is Hudi useful for me or my organization?
+   
+If you are looking to quickly ingest data onto HDFS or cloud storage, Hudi can 
provide you tools to [help](https://hudi.apache.org/docs/writing_data/). Also, 
if you have ETL/hive/spark jobs which are slow/taking up a lot of resources, 
Hudi can potentially help by providing an incremental approach to reading and 
writing data.
+
+As an organization, Hudi can help you build an [efficient data 
lake](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit#slide=id.p),
 solving some of the most complex, low-level storage management problems, while 
putting data into hands of your data analysts, engineers and scientists much 
quicker.
+
+### What are some non-goals for Hudi?
+
+Hudi is not designed for any OLTP use-cases, where typically you are using 
existing NoSQL/RDBMS data stores. Hudi cannot replace your in-memory analytical 
database (at least not yet!). Hudi supports near-real-time ingestion on the 
order of a few minutes, trading off latency for efficient batching. If you truly 
desire sub-minute processing delays, then stick with your favorite stream 
processing solution. 
+
+### What is incremental processing? Why does Hudi docs/talks keep talking 
about it?
+
+Incremental processing was first introduced by Vinoth Chandar, in the O'reilly 
[blog](https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/),
 that set off most of this effort. In purely technical terms, incremental 
processing merely refers to writing mini-batch programs in streaming processing 
style. Typical batch jobs consume **all input** and recompute **all output**, 
every few hours. Typical stream processing jobs consume some **new input** and 
recompute **new/changes to output**, continuously/every few seconds. While 
recomputing all output in batch fashion can be simpler, it's wasteful and 
resource-expensive. Hudi brings the ability to author the same batch pipelines in 
streaming fashion, run every few minutes.
+
+While we can merely refer to this as stream processing, we call it 
*incremental processing*, to distinguish from purely stream processing 
pipelines built using Apache Flink, Apache Apex or Apache Kafka Streams.
+
+### What is the difference between copy-on-write (COW) vs merge-on-read (MOR) 
storage types?
+
+**Copy On Write** - This storage type enables clients to ingest data on 
columnar file formats, currently parquet. Any new data that is written to the 
Hudi dataset using the COW storage type will write new parquet files. Updating an 
existing set of rows will result in a rewrite of the entire parquet files that 
collectively contain the affected rows. Hence, all writes to such 
datasets are limited by parquet writing performance; the larger the parquet 
file, the higher the time taken to ingest the data.
+
+**Merge On Read** - This storage type enables clients to ingest data quickly 
onto a row-based data format such as avro. Any new data that is written to the 
Hudi dataset using the MOR table type will write new log/delta files that 
internally store the data as avro encoded bytes. A compaction process 
(configured as inline or asynchronous) will convert log file format to columnar 
file format (parquet). Two different InputFormats expose 2 different views of 
this data, Read Optimized view exposes columnar parquet reading performance 
while Realtime View exposes columnar and/or log reading performance 
respectively. Updating an existing set of rows will result in either a) a 
companion log/delta file for an existing base parquet file generated from a 
previous compaction or b) an update written to a log/delta file in case no 
compaction ever happened for it. Hence, all writes to such datasets are limited 
by avro/log file writing performance, much faster than parquet. Although, there 
is a higher cost to pay to read log/delta files vs columnar (parquet) files.
+
+More details can be found [here](https://hudi.apache.org/docs/concepts/) and 
also [Design And 
Architecture](https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture).
+
+### How do I choose a storage type for my workload?
+
+A key goal of Hudi is to provide **upsert functionality** that is orders of 
magnitude faster than rewriting entire tables or partitions.
+
+Choose Copy-on-write storage if :
+
+ - You are looking for a simple alternative, that replaces your existing 
parquet tables without any need for real-time data.
+ - Your current job is rewriting entire table/partition to deal with updates, 
while only a few files actually change in each partition.
+ - You are happy 
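   (The quoted FAQ diff above is truncated in this archive.) As a companion to 
the COW/MOR discussion it adds, a minimal sketch of selecting the storage type 
at write time; the dataframe and table name below are assumptions:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class TableTypeSketch {
  // picks between the two storage types discussed in the FAQ above
  public static void write(Dataset<Row> df, String basePath, boolean mergeOnRead) {
    df.write().format("hudi")
        .option("hoodie.datasource.write.table.type",
                mergeOnRead ? "MERGE_ON_READ" : "COPY_ON_WRITE")
        .option("hoodie.table.name", "test_table") // assumed name
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
```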

[jira] [Updated] (HUDI-2416) Move FAQs to website

2021-09-11 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-2416:
---
Status: Patch Available  (was: In Progress)

> Move FAQs to website
> 
>
> Key: HUDI-2416
> URL: https://issues.apache.org/jira/browse/HUDI-2416
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs, Usability
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> We intend to move all the docs from cWiki to website. FAQs is a good starting 
> point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2416) Move FAQs to website

2021-09-11 Thread Pratyaksh Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyaksh Sharma updated HUDI-2416:
---
Status: In Progress  (was: Open)

> Move FAQs to website
> 
>
> Key: HUDI-2416
> URL: https://issues.apache.org/jira/browse/HUDI-2416
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Docs, Usability
>Reporter: Pratyaksh Sharma
>Assignee: Pratyaksh Sharma
>Priority: Major
>
> We intend to move all the docs from cWiki to website. FAQs is a good starting 
> point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HUDI-2416) Move FAQs to website

2021-09-11 Thread Pratyaksh Sharma (Jira)
Pratyaksh Sharma created HUDI-2416:
--

 Summary: Move FAQs to website
 Key: HUDI-2416
 URL: https://issues.apache.org/jira/browse/HUDI-2416
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Docs, Usability
Reporter: Pratyaksh Sharma
Assignee: Pratyaksh Sharma


We intend to move all the docs from cWiki to website. FAQs is a good starting 
point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] pratyakshsharma commented on pull request #3496: Move content from cwiki to website (FAQ movement)

2021-09-11 Thread GitBox


pratyakshsharma commented on pull request #3496:
URL: https://github.com/apache/hudi/pull/3496#issuecomment-917389381


   @vinothchandar Please take a look. Fixed all the broken links now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pratyakshsharma commented on pull request #3416: [HUDI-2362] Add external config file support

2021-09-11 Thread GitBox


pratyakshsharma commented on pull request #3416:
URL: https://github.com/apache/hudi/pull/3416#issuecomment-917378727


   Got it, please resolve the conflicts and we can start the review then. :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] pratyakshsharma commented on a change in pull request #3608: [HUDI-2397] Add `--enable-sync` parameter

2021-09-11 Thread GitBox


pratyakshsharma commented on a change in pull request #3608:
URL: https://github.com/apache/hudi/pull/3608#discussion_r706587965



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -187,7 +188,7 @@ public void testMultiTableExecutionWithParquetSource() 
throws IOException {
 // add only common props. later we can add per table props
 String parquetPropsFile = populateCommonPropsAndWriteToFile();
 
-HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(parquetPropsFile, dfsBasePath + "/config", 
ParquetDFSSource.class.getName(), false,
+HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(parquetPropsFile, dfsBasePath + "/config", 
ParquetDFSSource.class.getName(), false, true,

Review comment:
   ditto

##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -218,7 +219,7 @@ public void testMultiTableExecutionWithParquetSource() 
throws IOException {
 
   @Test
   public void testTableLevelProperties() throws IOException {
-HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", 
TestDataSource.class.getName(), false);
+HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", 
TestDataSource.class.getName(), false, true);

Review comment:
   ditto

##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieMultiTableDeltaStreamer.java
##
@@ -138,7 +139,7 @@ public void testMultiTableExecutionWithKafkaSource() throws 
IOException {
 testUtils.sendMessages(topicName1, 
Helpers.jsonifyRecords(dataGenerator.generateInsertsAsPerSchema("000", 5, 
HoodieTestDataGenerator.TRIP_SCHEMA)));
 testUtils.sendMessages(topicName2, 
Helpers.jsonifyRecords(dataGenerator.generateInsertsAsPerSchema("000", 10, 
HoodieTestDataGenerator.SHORT_TRIP_SCHEMA)));
 
-HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", 
JsonKafkaSource.class.getName(), false);
+HoodieMultiTableDeltaStreamer.Config cfg = 
TestHelpers.getConfig(PROPS_FILENAME_TEST_SOURCE1, dfsBasePath + "/config", 
JsonKafkaSource.class.getName(), false, true);

Review comment:
   Let us keep enableMetaSync as false where enableHiveSync is also false; 
otherwise it might lead to confusion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Ambarish-Giri edited a comment on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-11 Thread GitBox


Ambarish-Giri edited a comment on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-917361121


   Hi @danny0405 can you explain a bit more on "if the BloomFilter got false 
positive"?  
   In my case the record key is concat(uuid4, segmentId). SegmentId is an 
integer value, i.e. it can be the same for multiple records, and uuid4 is a 
standard unique random value (note: the "-" characters are removed from the 
uuid4 values though), but a combination of both identifies a record uniquely, 
and the partition key is again segmentId, as it has low cardinality.
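   A minimal sketch of expressing such a composite key via key generator 
configs instead of hand-concatenation; the field and table names are 
assumptions based on the description above:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class CompositeKeySketch {
  public static void write(Dataset<Row> df, String basePath) {
    df.write().format("hudi")
        // composite record key from two columns, instead of concatenating by hand
        .option("hoodie.datasource.write.keygenerator.class",
                "org.apache.hudi.keygen.ComplexKeyGenerator")
        .option("hoodie.datasource.write.recordkey.field", "uuid4,segmentId")
        .option("hoodie.datasource.write.partitionpath.field", "segmentId")
        .option("hoodie.table.name", "test_table") // assumed name
        .mode(SaveMode.Append)
        .save(basePath);
  }
}
```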


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-2413) Sql source in delta streamer does not work

2021-09-11 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf reassigned HUDI-2413:
---

Assignee: Jian Feng

> Sql source in delta streamer does not work
> --
>
> Key: HUDI-2413
> URL: https://issues.apache.org/jira/browse/HUDI-2413
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>
> The SQL source returns a null checkpoint; in DeltaSync a null checkpoint is 
> judged as no new data, so it should return an empty string instead.
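A hedged sketch of the fix described above, shaped after the DeltaStreamer 
RowSource API as I understand it; the property key, class name and package 
paths are assumptions:

```java
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.collection.Pair;
import org.apache.hudi.utilities.schema.SchemaProvider;
import org.apache.hudi.utilities.sources.RowSource;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlSourceSketch extends RowSource {
  private final String query;

  public SqlSourceSketch(TypedProperties props, JavaSparkContext jsc,
                         SparkSession spark, SchemaProvider schemaProvider) {
    super(props, jsc, spark, schemaProvider);
    this.query = props.getString("hoodie.deltastreamer.source.sql.query"); // assumed key
  }

  @Override
  protected Pair<Option<Dataset<Row>>, String> fetchNextBatch(Option<String> lastCkptStr,
                                                              long sourceLimit) {
    Dataset<Row> df = sparkSession.sql(query);
    // return an empty-string checkpoint rather than null, so DeltaSync
    // does not mistake the batch for "no new data"
    return Pair.of(Option.of(df), "");
  }
}
```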



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] leesf merged pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'

2021-09-11 Thread GitBox


leesf merged pull request #3643:
URL: https://github.com/apache/hudi/pull/3643


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated: [MINOR] Fix typo, 'requried' corrected to 'required' (#3643)

2021-09-11 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 6228b17  [MINOR] Fix typo, 'requried' corrected to 'required' (#3643)
6228b17 is described below

commit 6228b17a3ddb4c336b30e5b8c650e003e38b5e3e
Author: 董可伦 
AuthorDate: Sat Sep 11 15:46:24 2021 +0800

[MINOR] Fix typo, 'requried' corrected to 'required' (#3643)
---
 .../src/main/java/org/apache/hudi/io/HoodieMergeHandle.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
index 3e20141..b01d62f 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
@@ -77,7 +77,7 @@ import java.util.Set;
  * Existing data:
  * rec1_1, rec2_1, rec3_1, rec4_1
  *
- * For every existing record, merge w/ incoming if requried and write to 
storage.
+ * For every existing record, merge w/ incoming if required and write to 
storage.
  *=> rec1_1 and rec1_2 is merged to write rec1_2 to storage
  *=> rec2_1 is written as is
  *=> rec3_1 is written as is


[hudi] branch master updated: [MINOR] fix typo (#3640)

2021-09-11 Thread leesf
This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new dbcf60f  [MINOR] fix typo (#3640)
dbcf60f is described below

commit dbcf60f370e93ab490cf82e677387a07ea743cda
Author: 董可伦 
AuthorDate: Sat Sep 11 15:45:49 2021 +0800

[MINOR] fix typo (#3640)
---
 .../org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java  | 6 +++---
 .../java/org/apache/hudi/table/action/commit/UpsertPartitioner.java | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git 
a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java
 
b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java
index 6b5cb29..33f59f4 100644
--- 
a/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java
+++ 
b/hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaUpsertPartitioner.java
@@ -189,13 +189,13 @@ public class JavaUpsertPartitioner<T extends HoodieRecordPayload<T>> implements
 
 // Go over all such buckets, and assign weights as per amount of 
incoming inserts.
 List<InsertBucketCumulativeWeightPair> insertBuckets = new 
ArrayList<>();
-double curentCumulativeWeight = 0;
+double currentCumulativeWeight = 0;
 for (int i = 0; i < bucketNumbers.size(); i++) {
   InsertBucket bkt = new InsertBucket();
   bkt.bucketNumber = bucketNumbers.get(i);
   bkt.weight = (1.0 * recordsPerBucket.get(i)) / pStat.getNumInserts();
-  curentCumulativeWeight += bkt.weight;
-  insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, 
curentCumulativeWeight));
+  currentCumulativeWeight += bkt.weight;
+  insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, 
currentCumulativeWeight));
 }
 LOG.info("Total insert buckets for partition path " + partitionPath + 
" => " + insertBuckets);
 partitionPathToInsertBucketInfos.put(partitionPath, insertBuckets);
diff --git 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
index 3c0a511..35a8bdd 100644
--- 
a/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
+++ 
b/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
@@ -232,13 +232,13 @@ public class UpsertPartitioner<T extends HoodieRecordPayload<T>> extends Partiti
 
 // Go over all such buckets, and assign weights as per amount of 
incoming inserts.
 List<InsertBucketCumulativeWeightPair> insertBuckets = new 
ArrayList<>();
-double curentCumulativeWeight = 0;
+double currentCumulativeWeight = 0;
 for (int i = 0; i < bucketNumbers.size(); i++) {
   InsertBucket bkt = new InsertBucket();
   bkt.bucketNumber = bucketNumbers.get(i);
   bkt.weight = (1.0 * recordsPerBucket.get(i)) / pStat.getNumInserts();
-  curentCumulativeWeight += bkt.weight;
-  insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, 
curentCumulativeWeight));
+  currentCumulativeWeight += bkt.weight;
+  insertBuckets.add(new InsertBucketCumulativeWeightPair(bkt, 
currentCumulativeWeight));
 }
 LOG.info("Total insert buckets for partition path " + partitionPath + 
" => " + insertBuckets);
 partitionPathToInsertBucketInfos.put(partitionPath, insertBuckets);
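A standalone sketch (plain Java, not Hudi's actual partitioner classes) of the 
cumulative-weight logic touched by this typo fix: each insert bucket gets a 
weight proportional to its share of incoming inserts, and a record can then be 
routed to the first bucket whose cumulative weight covers a uniform random draw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CumulativeWeightSketch {
  public static void main(String[] args) {
    // assumed example: three buckets receiving 50, 30 and 20 of 100 inserts
    int[] recordsPerBucket = {50, 30, 20};
    int totalInserts = 100;

    List<Double> cumulative = new ArrayList<>();
    double running = 0;
    for (int records : recordsPerBucket) {
      running += (1.0 * records) / totalInserts;
      cumulative.add(running); // 0.5, 0.8, 1.0
    }

    // route a record: first bucket whose cumulative weight >= random draw
    double r = new Random(42).nextDouble();
    int bucket = 0;
    while (bucket < cumulative.size() - 1 && r > cumulative.get(bucket)) {
      bucket++;
    }
    System.out.println("record routed to bucket " + bucket);
  }
}
```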


[GitHub] [hudi] leesf merged pull request #3640: [MINOR] Fix typo

2021-09-11 Thread GitBox


leesf merged pull request #3640:
URL: https://github.com/apache/hudi/pull/3640


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Ambarish-Giri commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-11 Thread GitBox


Ambarish-Giri commented on issue #3605:
URL: https://github.com/apache/hudi/issues/3605#issuecomment-917361121


   Hi @danny0405 can you explain a bit more on "if the BloomFilter got false 
positive"?  
   In my case the record key is concat(uuid4, segmentId). SegmentId is an 
integer value, i.e. it can be the same for multiple records, and uuid4 is a 
standard unique random value, but a combination of both identifies a record 
uniquely, and the partition key is again segmentId, as it has low cardinality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3643:
URL: https://github.com/apache/hudi/pull/3643#issuecomment-917351059


   
   ## CI report:
   
   * 2151667bdd2cc7fafd47462a5f7e13726b4edbf9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2160)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3642: [HUDI-2415] Add more info log for flink streaming reader

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3642:
URL: https://github.com/apache/hudi/pull/3642#issuecomment-917342840


   
   ## CI report:
   
   * e6d28ea164871213cc28b922160bad95d13a94de Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2159)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer closed pull request #1844: [WIP] Added test to reproduce a problem with schema evolution

2021-09-11 Thread GitBox


sbernauer closed pull request #1844:
URL: https://github.com/apache/hudi/pull/1844


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] sbernauer commented on pull request #1844: [WIP] Added test to reproduce a problem with schema evolution

2021-09-11 Thread GitBox


sbernauer commented on pull request #1844:
URL: https://github.com/apache/hudi/pull/1844#issuecomment-917354441


   Thanks @codope for your work!
   The most relevant part of the tests got included with #2927
   Thanks for adding the other part!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Assigned] (HUDI-2414) enable Hot and cold data separate when ingest data

2021-09-11 Thread Jian Feng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Feng reassigned HUDI-2414:
---

Assignee: Jian Feng

> enable Hot and cold data separate when ingest data
> --
>
> Key: HUDI-2414
> URL: https://issues.apache.org/jira/browse/HUDI-2414
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Jian Feng
>Assignee: Jian Feng
>Priority: Major
>
> When using Hudi to ingest an e-commerce company's item data, there is a 
> massive amount of update data going into old partitions. If one record needs 
> an update, the whole file it belongs to must be rewritten, which results in 
> nearly the whole table being rewritten on every commit.
> I'm wondering if Hudi could provide a hot/cold data separation tool that works 
> with a specific column (such as create time or update time) to distinguish hot 
> data from cold data, and then rebuilds the table to separate them into 
> different file groups; after the table is rebuilt, performance will be much 
> better. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3643:
URL: https://github.com/apache/hudi/pull/3643#issuecomment-917351059


   
   ## CI report:
   
   * 2151667bdd2cc7fafd47462a5f7e13726b4edbf9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2160)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'

2021-09-11 Thread GitBox


hudi-bot commented on pull request #3643:
URL: https://github.com/apache/hudi/pull/3643#issuecomment-917351059


   
   ## CI report:
   
   * 2151667bdd2cc7fafd47462a5f7e13726b4edbf9 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] dongkelun opened a new pull request #3643: [MINOR] Fix typo, 'requried' corrected to 'required'

2021-09-11 Thread GitBox


dongkelun opened a new pull request #3643:
URL: https://github.com/apache/hudi/pull/3643


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *Fix typo, 'requried' corrected to 'required'*
   
   ## Brief change log
   
 - *Fix typo, 'requried' corrected to 'required'*
   
   ## Verify this pull request
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3642: [HUDI-2415] Add more info log for flink streaming reader

2021-09-11 Thread GitBox


hudi-bot edited a comment on pull request #3642:
URL: https://github.com/apache/hudi/pull/3642#issuecomment-917342840


   
   ## CI report:
   
   * f16a50a6de42940ea2794b06ef079472dd480875 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2157)
 
   * e6d28ea164871213cc28b922160bad95d13a94de Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2159)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org