Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-27 Thread via GitHub
danny0405 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1829129236 Yes, the MOR reader merges the payloads before returning the result set.
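A toy sketch of what "merging the payloads before returning the result set" means for a `count(*)`-style scan. This is not Hudi's reader code; the record layout and the names `merge_on_read`, `base_file`, and `log_file` are hypothetical and only illustrate the idea that log entries are folded into base records by key before rows are returned.

```python
# Illustrative only: a toy merge-on-read scan, not Hudi's reader implementation.
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]
LogEntry = Tuple[str, Optional[Record]]  # (record key, payload); None payload means delete


def merge_on_read(base: Dict[str, Record], log: Iterable[LogEntry]) -> List[Record]:
    """Fold log entries into the base records; the latest entry per key wins."""
    merged = dict(base)  # copy so the base "file" is left untouched
    for key, payload in log:
        if payload is None:
            merged.pop(key, None)  # delete the key if it is present
        else:
            merged[key] = payload  # new key or overwrite of an existing key
    return list(merged.values())


base_file = {"k1": {"id": "k1", "v": 1}}
log_file = [
    ("k2", {"id": "k2", "v": 10}),  # key not seen before
    ("k1", {"id": "k1", "v": 2}),   # key already in the base file
    ("k2", {"id": "k2", "v": 11}),  # same key written again
]

rows = merge_on_read(base_file, log_file)
row_count = len(rows)  # one row per distinct key after merging
print(row_count)
```

In this sketch the count reflects distinct keys after merging, not the number of log entries written.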

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-27 Thread via GitHub
zdl1 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1828955749 > the compaction writer actually knows the record operation when it does the payload merging Sorry for the late reply, what if I just `select count(*) from table`? Does it

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-23 Thread via GitHub
danny0405 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1823978838 Yeah, there is no very good way to collect the correct numbers. A good chance is compaction: the compaction writer actually knows the record operation when it does the payload merging.
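A minimal sketch of why the merge step is a natural place to classify records, assuming a simplified model where compaction folds log entries into a keyed base file. The function `compact_with_stats` and the counter labels are hypothetical and are not part of Hudi's API.

```python
# Illustrative only: classify each log entry while merging it into the base records.
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Record = Dict[str, object]
LogEntry = Tuple[str, Optional[Record]]  # (record key, payload); None payload means delete


def compact_with_stats(
    base: Dict[str, Record], log: Iterable[LogEntry]
) -> Tuple[Dict[str, Record], Counter]:
    merged = dict(base)
    stats: Counter = Counter()
    for key, payload in log:
        if payload is None:
            if key in merged:          # delete of a key that currently exists
                del merged[key]
                stats["delete"] += 1
        elif key in merged:            # key already present: this is an update
            merged[key] = payload
            stats["update"] += 1
        else:                          # key absent at merge time: this is an insert
            merged[key] = payload
            stats["insert"] += 1
    return merged, stats


base_file = {"k1": {"id": "k1", "v": 1}}
log_file = [
    ("k2", {"id": "k2", "v": 10}),
    ("k1", {"id": "k1", "v": 2}),
    ("k3", {"id": "k3", "v": 30}),
    ("k2", None),
]

compacted, op_stats = compact_with_stats(base_file, log_file)
print(dict(op_stats))
```

The point of the sketch is only that the merger sees both the existing record and the incoming one at the same time, which the write path in this thread does not.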

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
zdl1 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1823873879 > there is no way to figure out whether a key has been written to an existing bucket before, except the first file slice, so all the records are updates. Thanks for the

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
danny0405 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1822589574 Yeah, correct. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
zdl1 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1822541106 There are 2 delta_commit files generated by 2 insertions ![image](https://github.com/apache/hudi/assets/149354640/72916947-7efa-44f1-a858-78e4db289d90)

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
danny0405 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1822537964 Yes, you are right. Actually, in the bucket index, there is no way to figure out whether a key has been written to an existing bucket before, except the first file slice, so all the
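A toy sketch of the situation described here, assuming a hash-based bucket assignment where the writer tracks only which buckets already exist, not which keys they contain. The class `ToyBucketAssigner`, the use of `zlib.crc32` as the hash, and the `"INSERT"`/`"UPSERT"` labels are assumptions for illustration and do not reproduce Hudi's actual hashing or assigner code.

```python
# Illustrative only: bucket assignment that knows buckets, not keys.
import zlib
from typing import Set, Tuple


def bucket_id(record_key: str, num_buckets: int) -> int:
    """Map a record key to a bucket with a deterministic hash (crc32 is an assumption)."""
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets


class ToyBucketAssigner:
    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.existing_buckets: Set[int] = set()  # no per-key state is kept

    def assign(self, record_key: str) -> Tuple[int, str]:
        bucket = bucket_id(record_key, self.num_buckets)
        if bucket in self.existing_buckets:
            # The bucket already has data, so the write is treated as an upsert,
            # even if this particular key has never been written before.
            return bucket, "UPSERT"
        self.existing_buckets.add(bucket)
        return bucket, "INSERT"


# With a single bucket every key lands in bucket 0, which makes the effect easy to see.
assigner = ToyBucketAssigner(num_buckets=1)
first = assigner.assign("k1")   # first write into the bucket
second = assigner.assign("k2")  # brand-new key, but the bucket already exists
print(first, second)
```

Telling the two cases apart would need a per-key lookup, which this sketch deliberately leaves out.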

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
zdl1 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1822501418 > The bucket type `UPSERT` and record operation (update/delete) are different things; an `upsert` bucket just means we write more incremental records to an existing data bucket. Thanks for

Re: [I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
danny0405 commented on issue #10156: URL: https://github.com/apache/hudi/issues/10156#issuecomment-1822468337 The bucket type `UPSERT` and record operation (update/delete) are different things; an `upsert` bucket just means we write more incremental records to an existing data bucket.

[I] [SUPPORT] The INSERT records are marked as UPDATE [hudi]

2023-11-22 Thread via GitHub
zdl1 opened a new issue, #10156: URL: https://github.com/apache/hudi/issues/10156 When inserting a record to a MOR table, trying to assign a bucket for the record ![image](https://github.com/apache/hudi/assets/149354640/a50c960f-79ee-4071-9057-8cbe526a45e2)