asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-665040109
@bvaradar So even if I change the partitioning such that I have a different
partition per day for each dataset, so that only one write happens in a
partition, does it still…
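[For reference, a minimal sketch of how such per-day, per-dataset partitioning
could be configured through the Spark datasource; the table path and the column
names (dataset, date, id, ts) are assumptions for illustration, not taken from
this issue:]

    # Hypothetical PySpark write: partition by dataset and day so each
    # ingestion run touches only one partition. Column names are assumed.
    hudi_options = {
        "hoodie.table.name": "my_table",
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.partitionpath.field": "dataset,date",
        "hoodie.datasource.write.precombine.field": "ts",
    }
    (df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("/path/to/table"))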
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-662709245
@bvaradar Mostly I see:
org.apache.hudi.exception.HoodieRollbackException: Found in-flight commits
after time :20200722052838, please rollback greater commits first
Does…
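[The exception names a commit time and complains about newer in-flight
instants. One quick way to see them is to list the .hoodie timeline directly;
this is a plain filesystem sketch, not a Hudi API, and the table path is an
assumption:]

    # List in-flight instants newer than a given commit time by scanning
    # the .hoodie timeline folder (works for local paths; use the matching
    # filesystem client for S3/HDFS).
    import os

    def inflight_after(base_path, commit_time):
        timeline = os.path.join(base_path, ".hoodie")
        for name in sorted(os.listdir(timeline)):
            # in-flight instants end in ".inflight", e.g. 20200722052838.inflight
            if name.endswith(".inflight") and name.split(".")[0] > commit_time:
                print(name)

    inflight_after("/path/to/table", "20200722052838")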
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-662612691
@bvaradar Are you suggesting I look at the Spark logs during ingestion, or at
any other logs?
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-662530755
@bvaradar The contents of .hoodie are listed at
https://gist.github.com/asheeshgarg/8897de60ab6ba78b5847f5432a4a69dd
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-661874341
@bvaradar The inserts are looking fine now; the COW compaction is generating
2 parquet files for each date.
I also set the following properties:
"hoodie.keep.min.commits":…
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-661195131
@bvaradar Thanks, Balaji, for your continuous support; will test this.
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-660174273
@bvaradar I think somehow there was a cleanup issue. After cleaning up all the
files and setting
"hoodie.cleaner.commits.retained": 1, I see two parquet files consistently, so
this…
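[One plausible explanation, as an inference not confirmed in the thread: under
the default KEEP_LATEST_COMMITS cleaning policy, retaining 1 commit keeps that
commit's file slice plus the latest write, so two parquet versions per file
group would be the expected steady state:]

    # Rough expectation under KEEP_LATEST_COMMITS (assumption):
    commits_retained = 1
    versions_per_file_group = commits_retained + 1  # retained slice + latest write
    print(versions_per_file_group)  # -> 2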
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-659558896
@bvaradar Balaji, I set hoodie.cleaner.commits.retained: 1; after that I see
only two parquet files in the filesystem. But when I load the partition using
Spark I don't see all the…
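[If the partition is loaded as raw parquet, every retained file version shows
up; a snapshot read through the Hudi datasource instead resolves only the
latest file slice per file group. A minimal sketch, assuming an existing
SparkSession and an assumed table path:]

    # Snapshot query: Hudi returns only the latest file slice per file
    # group, so superseded parquet versions are filtered out.
    df = spark.read.format("hudi").load("/path/to/table/*/*")  # glob over partition dirs
    df.count()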
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-659064590
@bvaradar I was assuming that every time we write, the content will be merged
into the existing file based on the size limits we have specified. Otherwise we
will see a lot of small files. As…
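[That mental model matches Hudi's small-file handling for inserts: files under
hoodie.parquet.small.file.limit are treated as small and receive new inserts
until they approach hoodie.parquet.max.file.size. A sketch of the two knobs,
with illustrative values:]

    # Small-file handling knobs (illustrative values, in bytes):
    sizing_options = {
        "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),  # below this, pack inserts in
        "hoodie.parquet.max.file.size": str(500 * 1024 * 1024),     # target upper bound per file
    }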
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-659001436
@bvaradar Balaji, I tried the mentioned property but don't see the impact; I
still see parquet files generated:
2020-07-15 20:41:40  478.6 KiB …
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-658881320
@bvaradar I ran with the above understanding and set the small file size limit
to 500 MB to match the 500 datasets, but after the write I see no change in the
behavior; it still…
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-658837587
@bvaradar Thanks for the quick response, Balaji. To understand it correctly,
let me quickly run through an example.
The data that is generated for a dataset will be in some range of 1 MB…
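[Making the arithmetic of that example explicit, under the stated assumptions
of 500 datasets at roughly 1 MB each:]

    # Back-of-the-envelope sizing (assumed figures from the example):
    datasets = 500
    mb_per_dataset = 1
    total_mb = datasets * mb_per_dataset  # ~500 MB per partition per day
    # A small file limit at or above this total would keep inserts
    # packing into a single file until it crosses the limit.
    print(total_mb)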
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-658422907
@bvaradar You are right, we are looking for clustering. Do you have any
timeline in mind for when this will be available, or any branch to look at?
asheeshgarg commented on issue #1825:
URL: https://github.com/apache/hudi/issues/1825#issuecomment-658188686
@bvaradar Balaji, please let me know if I need to set additional properties to
achieve this behavior.