[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2020-10-19 Thread GitBox
KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-711645166 @nsivabalan @vinothchandar Thank you so much for all the explanations. If I think about it, having 10MB worth of index data may not be an issue as long as the file contains

[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2020-10-16 Thread GitBox
KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-710747888 @nsivabalan I tried out the dynamic filter. It seems to work fine: it grows dynamically with the number of entries. That's a good feature. Thanks. However, what's the

[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2020-10-16 Thread GitBox
KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-709875380 @nsivabalan I ran some tests around this issue. I ran the job after setting the config hoodie.index.bloom.num_entries to 150 and inspected the file produced. There are
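
For context on why the hoodie.index.bloom.num_entries setting affects file size, the sketch below applies the textbook Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits. It is only a rough illustration, not Hudi's implementation: the function names are hypothetical, the false-positive rate of 1e-9 is an assumed example value, and the filter Hudi actually serializes into a parquet file may be encoded differently and need not match these numbers.

```python
import math


def bloom_filter_size_bytes(num_entries: int, fpp: float) -> int:
    """Textbook optimal size, in bytes, of a standard Bloom filter.

    num_entries: expected number of keys (n)
    fpp: target false-positive probability (p), strictly between 0 and 1
    """
    if num_entries <= 0 or not (0.0 < fpp < 1.0):
        raise ValueError("num_entries must be > 0 and fpp must be in (0, 1)")
    bits = -num_entries * math.log(fpp) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def optimal_hash_count(num_entries: int, size_bytes: int) -> int:
    """Textbook optimal number of hash functions: k = (m / n) * ln 2."""
    return max(1, round(size_bytes * 8 / num_entries * math.log(2)))


if __name__ == "__main__":
    fpp = 1e-9  # assumed example value, not read from any Hudi config
    for n in (150, 60_000, 1_000_000):
        size = bloom_filter_size_bytes(n, fpp)
        print(f"n={n:>9,}  size={size:>10,} bytes  k={optimal_hash_count(n, size)}")
```

Under this formula the size grows linearly with the configured number of entries and with the logarithm of 1/fpp, so a filter sized for far more keys than a file holds carries proportionally more bits than it needs.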

[GitHub] [hudi] KarthickAN commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2020-10-16 Thread GitBox
KarthickAN commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-709817321 @nsivabalan Please find my answers below. 1. That's the average record size. I inspected the parquet files produced and calculated it from the metrics I found there.