codejoyan commented on issue #2620:
URL: https://github.com/apache/hudi/issues/2620#issuecomment-886077052
Some additional details for the above runs.
1. The configs I am using - REGULAR BLOOM.
2. Max and Min file size in older partitions - 116 MB and 6 MB respectively
3. Avg
codejoyan commented on issue #2620:
URL: https://github.com/apache/hudi/issues/2620#issuecomment-884898614
**Problem Statement:** I am using a COW table and receiving roughly 1 GB of
incremental data. The batch includes a data quality check and an upsert. Attached
is the Spark UI stages screenshot:
codejoyan commented on issue #2620:
URL: https://github.com/apache/hudi/issues/2620#issuecomment-816012609
@nsivabalan, any inputs would be very helpful.
codejoyan commented on issue #2620:
URL: https://github.com/apache/hudi/issues/2620#issuecomment-800542037
Apologies for the delay, @nsivabalan.
Below are the answers to the questions you asked:
- What constitutes your record key? - The record key is random within a
partition (store
codejoyan commented on issue #2620:
URL: https://github.com/apache/hudi/issues/2620#issuecomment-791961553
Thanks @bvaradar and @nsivabalan. Please let me know how to improve the
performance.
I used the below configurations (SIMPLE INDEX and turned off compaction) to
speed up the
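
The configuration list itself is cut off in this digest. As a rough sketch only, the two settings named above (SIMPLE index, compaction turned off) could be expressed as Hudi writer options like this. The helper `hudi_upsert_options` and the table name `example_table` are hypothetical, and the values are assumptions rather than the settings actually used in this issue:

```python
def hudi_upsert_options(table_name: str, index_type: str = "SIMPLE") -> dict:
    """Build a dict of Hudi writer options.

    Illustrative only: the table name and index type are placeholders,
    not the values used in the issue.
    """
    return {
        "hoodie.table.name": table_name,
        # Upsert into a copy-on-write (COW) table.
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        # Index used to look up existing records, e.g. SIMPLE or BLOOM.
        "hoodie.index.type": index_type,
        # Do not run compaction inline as part of the write.
        "hoodie.compact.inline": "false",
    }


options = hudi_upsert_options("example_table")
for key, value in sorted(options.items()):
    print(f"{key}={value}")
```

With a Spark DataFrame, such a dict would typically be passed through `df.write.format("hudi").options(**options)`, together with the record key, partition path, and precombine field options that a real write also requires.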