[GitHub] [hudi] codejoyan edited a comment on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-07-25 Thread GitBox
codejoyan edited a comment on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-886077052 Some additional details for the above runs. 1. The configs I am using - REGULAR BLOOM. 2. Max and Min file size in older partitions - 116 MB and 6 MB respectively 3. Av

[GitHub] [hudi] codejoyan edited a comment on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-07-22 Thread GitBox
codejoyan edited a comment on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-884898614 **Problem Statement:** I am using COW table and receiving roughly 1GB of incremental data. The batch has data quality check and upsert. Attached is the spark UI stages screensh

[GitHub] [hudi] codejoyan edited a comment on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-03-16 Thread GitBox
codejoyan edited a comment on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-800542037 Apologies for the delay @nsivabalan Below are the answers to the questions you asked: - What constitutes your record key? - _The record key is random within a partition (

[GitHub] [hudi] codejoyan edited a comment on issue #2620: [SUPPORT] Performance Tuning: Slow stages (Building Workload Profile & Getting Small files from partitions) during Hudi Writes

2021-03-06 Thread GitBox
codejoyan edited a comment on issue #2620: URL: https://github.com/apache/hudi/issues/2620#issuecomment-791961553 Thanks @bvaradar and @nsivabalan. Please let me know how to improve the performance or if you need any further details to investigate. I used the below configurations (SIMPLE
