[GitHub] [hudi] codejoyan commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-07-24 Thread GitBox
codejoyan commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-886077052 Some additional details for the above runs. 1. The config I am using: REGULAR BLOOM. 2. Max and min file sizes in older partitions: 116 MB and 6 MB respectively. 3. Avg
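The file sizes quoted above bear on the slow "Getting Small files from partitions" stage. The sketch below is an illustration only: it assumes Hudi's commonly documented defaults of `hoodie.parquet.small.file.limit` = 100 MB and `hoodie.parquet.max.file.size` = 120 MB (verify against your Hudi version), and `small_file_candidates` is a hypothetical helper, not a Hudi API. It flags which files would fall under the small-file limit and roughly how much room each has before reaching the max file size, which approximates how Hudi decides where new inserts can be packed.

```python
# Assumed Hudi defaults (check the docs for your version):
#   hoodie.parquet.small.file.limit = 104857600 bytes (100 MB)
#   hoodie.parquet.max.file.size    = 125829120 bytes (120 MB)
MB = 1024 * 1024
SMALL_FILE_LIMIT = 100 * MB
MAX_FILE_SIZE = 120 * MB


def small_file_candidates(file_sizes, small_file_limit=SMALL_FILE_LIMIT,
                          max_file_size=MAX_FILE_SIZE):
    """Hypothetical helper: return (size, spare_bytes) for each file below the
    small-file limit. spare_bytes approximates how much more data could be
    packed into that file before it reaches max_file_size."""
    return [(size, max_file_size - size)
            for size in file_sizes
            if size < small_file_limit]


if __name__ == "__main__":
    # Sizes taken from the comment: largest 116 MB, smallest 6 MB.
    sizes = [116 * MB, 6 * MB]
    for size, spare in small_file_candidates(sizes):
        print(f"{size // MB} MB file: ~{spare // MB} MB spare")
```

With the two sizes from the comment, only the 6 MB file is treated as small under these assumed defaults; the 116 MB file is already above the limit and would not receive new inserts.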

[GitHub] [hudi] codejoyan commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-07-22 Thread GitBox
codejoyan commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-884898614 **Problem Statement:** I am using a COW table and receiving roughly 1 GB of incremental data. The batch includes a data quality check and an upsert. Attached is the Spark UI stages screenshot:

[GitHub] [hudi] codejoyan commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-04-08 Thread GitBox
codejoyan commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-816012609 @nsivabalan, any inputs would be very helpful.

[GitHub] [hudi] codejoyan commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-03-16 Thread GitBox
codejoyan commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-800542037 Apologies for the delay, @nsivabalan. Below are the answers to the questions you asked: - What constitutes your record key? - The record key is random within a partition (store

[GitHub] [hudi] codejoyan commented on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-03-06 Thread GitBox
codejoyan commented on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-791961553 Thanks @bvaradar and @nsivabalan. Please let me know how to improve the performance. I used the configurations below (SIMPLE index, compaction turned off) to speed up the
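For readers comparing the BLOOM and SIMPLE index runs mentioned in this thread, here is a minimal sketch of how the index type is typically switched on a Spark datasource write. It assumes the standard Hudi option keys shown (`hoodie.index.type`, `hoodie.datasource.write.operation`, `hoodie.datasource.write.table.type`); `hudi_upsert_options` is a hypothetical helper, the record key and precombine fields are deliberately omitted and must still be set for a real upsert, and the compaction setting from the comment is not shown because the comment does not say which option was changed.

```python
# Hypothetical helper that builds Spark datasource options for a Hudi upsert.
# Only the index type varies between runs; record key / precombine fields are
# intentionally left out and must be added for a real write.
VALID_INDEX_TYPES = {"BLOOM", "GLOBAL_BLOOM", "SIMPLE", "GLOBAL_SIMPLE"}


def hudi_upsert_options(table_name, index_type="SIMPLE"):
    if index_type not in VALID_INDEX_TYPES:
        raise ValueError(f"unsupported index type: {index_type}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",  # COW, per the thread
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.index.type": index_type,
    }


if __name__ == "__main__":
    opts = hudi_upsert_options("my_table")  # "my_table" is a placeholder name
    # With PySpark available, the options would be applied roughly like:
    # df.write.format("hudi").options(**opts).mode("append").save(base_path)
    print(opts["hoodie.index.type"])
```

Keeping the options in one dict makes it easy to rerun the same job with `index_type="BLOOM"` and compare stage timings in the Spark UI.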