[GitHub] [hudi] asheeshgarg commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-07-29 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-665040109 @bvaradar so even if I change the partitioning such that I have a different partition per day for different datasets, so that only one write happens in a partition, does it still

2020-07-22 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-662709245 @bvaradar mostly I see: org.apache.hudi.exception.HoodieRollbackException: Found in-flight commits after time: 20200722052838, please rollback greater commits first. Does
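The exception reflects a timeline invariant: a commit can only be rolled back once no later commit is still in flight. A toy model of that check follows (not Hudi's actual code; the timestamps and states are illustrative):

```python
def can_rollback(timeline, target):
    """Return True if no in-flight commit exists after `target`.

    timeline: list of (commit_time, state) pairs, state in
    {"completed", "inflight"}. Commit times compare lexicographically
    (Hudi uses yyyyMMddHHmmss instant strings). Toy model only.
    """
    return not any(
        t > target and state == "inflight" for t, state in timeline
    )

timeline = [
    ("20200722052838", "completed"),
    ("20200722060000", "inflight"),   # a later, still in-flight commit
]
# Rolling back 20200722052838 is blocked by the later in-flight commit:
print(can_rollback(timeline, "20200722052838"))  # False
```

In the real system the later in-flight commit has to be rolled back (or completed) first, which is what the "please rollback greater commits first" message asks for.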

2020-07-22 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-662612691 @bvaradar are you suggesting I look at the Spark logs during ingestion, or any other logs?

2020-07-22 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-662530755 @bvaradar the content of .hoodie is listed at https://gist.github.com/asheeshgarg/8897de60ab6ba78b5847f5432a4a69dd

2020-07-21 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-661874341 @bvaradar so the inserts are looking fine now, but the COW compaction is generating 2 parquet files for each date. I also set the following properties: "hoodie.keep.min.commits":
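The properties being tuned in this thread interact: Hudi requires the archival floor (`hoodie.keep.min.commits`) to be greater than the cleaner retention (`hoodie.cleaner.commits.retained`), and `hoodie.keep.max.commits` to be greater than `hoodie.keep.min.commits`. A minimal sketch of that ordering constraint — the values and table name are illustrative, not taken from the thread:

```python
# Illustrative cleaner/archival settings; values are hypothetical.
hudi_options = {
    "hoodie.table.name": "example_table",       # hypothetical
    "hoodie.cleaner.commits.retained": 1,
    "hoodie.keep.min.commits": 2,
    "hoodie.keep.max.commits": 3,
}

def validate_retention(opts):
    """Check the ordering constraint between cleaner and archival settings:
    commits.retained < keep.min.commits < keep.max.commits."""
    retained = opts["hoodie.cleaner.commits.retained"]
    keep_min = opts["hoodie.keep.min.commits"]
    keep_max = opts["hoodie.keep.max.commits"]
    return retained < keep_min < keep_max

print(validate_retention(hudi_options))  # True
```

If the constraint is violated, Hudi can try to archive commits the cleaner still needs, so it is worth checking before changing either knob.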

2020-07-20 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-661195131 @bvaradar thanks Balaji for your continuous support; I will test this.

2020-07-17 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-660174273 @bvaradar I think somehow there was a cleanup issue; after cleaning up all the files and setting "hoodie.cleaner.commits.retained": 1, I see two parquet files consistently, so this

2020-07-16 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-659558896 @bvaradar Balaji, I set hoodie.cleaner.commits.retained: 1; after that I see only two parquet files in the filesystem. But when I load the partition using Spark I don't see all the

2020-07-15 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-659064590 @bvaradar I was assuming that every time we write, the content will be merged into the existing file based on the size limits we have specified. Otherwise we will see a lot of small files. As
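The merging the commenter expects is Hudi's small-file handling: on each write, existing file groups smaller than `hoodie.parquet.small.file.limit` are candidates to absorb new inserts, up to roughly `hoodie.parquet.max.file.size`; only the overflow goes to a new file group. A toy sketch of that routing decision (not Hudi's actual bin-packing code; the sizes are illustrative):

```python
SMALL_FILE_LIMIT = 100 * 1024 * 1024   # hoodie.parquet.small.file.limit, bytes
MAX_FILE_SIZE = 120 * 1024 * 1024      # hoodie.parquet.max.file.size, bytes

def assign_inserts(existing_files, insert_bytes):
    """Route incoming insert bytes into small file groups first.

    existing_files: {file_id: current_size_in_bytes}.
    Returns {file_id_or_"new-file-group": bytes_assigned}. Toy model only.
    """
    plan = {}
    remaining = insert_bytes
    # Fill the smallest candidates first.
    for file_id, size in sorted(existing_files.items(), key=lambda kv: kv[1]):
        if remaining <= 0:
            break
        if size < SMALL_FILE_LIMIT:
            room = MAX_FILE_SIZE - size
            take = min(room, remaining)
            plan[file_id] = take
            remaining -= take
    if remaining > 0:
        plan["new-file-group"] = remaining  # overflow starts a new file
    return plan
```

Under this model, setting the small-file limit alone is not enough if the existing files already exceed it, which may explain why raising the limit appeared to have no effect in the thread.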

2020-07-15 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-659001436 @bvaradar Balaji, I tried the mentioned property but don't see any impact; I still see parquet files generated: 2020-07-15 20:41:40 478.6 KiB

2020-07-15 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658881320 @bvaradar I ran with the above understanding, setting the small file size limit to 500 MB to match the 500 datasets, but after the write I see no change in the behavior; it still

2020-07-15 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658837587 @bvaradar Thanks for the quick response, Balaji. To understand it correctly, let me quickly run through an example. The data generated for a dataset will be in some range of 1 MB

2020-07-14 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658422907 @bvaradar you are right, we are looking for clustering. Do you have any timeline in mind for when this will be available, or any branch to look at?

2020-07-14 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-658188686 @bvaradar Balaji, please let me know if I need to set additional properties to achieve the behavior.