Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-658571381
Thanks @vinothchandar for the clarifications, I will try GLOBAL_SIMPLE.
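For reference, trying a different index is a change to the writer options. The sketch below assumes a Spark datasource writer; `hoodie.index.type` is the Hudi index option, while the helper name `with_global_simple_index`, the table name, and the field names are hypothetical placeholders that do not come from this thread.

```python
def with_global_simple_index(base_options):
    """Return a copy of the writer options with the index type set to GLOBAL_SIMPLE."""
    options = dict(base_options)  # copy so the caller's dict is left untouched
    options["hoodie.index.type"] = "GLOBAL_SIMPLE"
    return options


# Hypothetical base options; the table and field names are placeholders.
base = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.recordkey.field": "a",
    "hoodie.datasource.write.precombine.field": "t",
    "hoodie.datasource.write.operation": "upsert",
}
writer_options = with_global_simple_index(base)
```

The resulting dict would typically be passed to a Spark DataFrame writer, for example via `.options(**writer_options)`, before saving to the table path.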
This is an automated message from the Apache
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-655305119
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-653368873
@vinothchandar Excellent, how can I try this async compaction?
I am attaching the most expensive stages. I am not sure whether I need to scale
the cluster or whether I can lower this by some
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-649925956
@vinothchandar I ran the job with a 5 min batch interval using MOR. Now I can
see that commit duration is 5 min and compaction is also 5 min, and updated records
are only 10% of
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-647325340
sure.
I am trying to achieve near-real-time data (like the Read Optimized View) by
updating records over S3.
e.g.
let's say I have records
a1, b1, t1
a1, b2, t2
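Since the example is cut off here, the intended result is not stated. A minimal sketch of one plausible reading, in which an update to an existing key replaces the older record and the latest timestamp wins, is given below. It is plain Python rather than Hudi code; the function name `latest_per_key` is hypothetical, and the integers 1 and 2 stand in for `t1` and `t2` under the assumption that `t2` is later than `t1`.

```python
def latest_per_key(records):
    """Keep, for each key, the record with the greatest timestamp."""
    latest = {}
    for key, value, ts in records:
        # Replace the stored record only when this one is strictly newer.
        if key not in latest or ts > latest[key][2]:
            latest[key] = (key, value, ts)
    return list(latest.values())


# The two records from the example; timestamps are placeholders (t1 < t2 assumed).
records = [("a1", "b1", 1), ("a1", "b2", 2)]
result = latest_per_key(records)
print(result)  # [('a1', 'b2', 2)]
```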
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-646596361
Hey vinoth,
1 - Could you please shed some light on the statement "old behavior for real
production use-cases"?
2 - Yes, indexing is dominating, not sure why exactly it
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-640738415
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-638773847
In the job countByKey at HoodieBloomIndex, the stage mapToPair at
HoodieWriteClient.java:977 is taking more than a minute, and the stage
countByKey at HoodieBloomIndex is
Raghvendradubey commented on issue #1694:
URL: https://github.com/apache/hudi/issues/1694#issuecomment-637414299
Hello Vinoth,
I was just playing with different combinations of shuffle parallelism. I am
able to reduce countByKey at WorkloadProfile.java through shuffle partition by
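The exact settings tried are not visible in this excerpt. As a rough sketch of how the shuffle parallelism is usually varied, the helper below sets the three Hudi write-parallelism options to one value. The helper name `shuffle_parallelism_options` and the value 200 are hypothetical and are not taken from this thread.

```python
def shuffle_parallelism_options(parallelism):
    """Build Hudi writer options that apply one shuffle parallelism to all write paths."""
    if parallelism <= 0:
        raise ValueError("parallelism must be a positive integer")
    value = str(parallelism)  # writer options are passed as strings
    return {
        "hoodie.insert.shuffle.parallelism": value,
        "hoodie.upsert.shuffle.parallelism": value,
        "hoodie.bulkinsert.shuffle.parallelism": value,
    }


# Placeholder value; a suitable number depends on input size and cluster cores.
parallelism_options = shuffle_parallelism_options(200)
```

These options can be merged into the other writer options, which makes it easy to rerun the same batch with different values and compare stage durations in the Spark UI.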