[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-07-15 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-658571381 Thanks @vinothchandar for clarifications, will try GLOBAL_SIMPLE.

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-07-08 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-655305119

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-07-03 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-653368873 @vinothchandar Excellent, how can I try this async compaction? I am attaching the most expensive stages. I am not sure whether I need to scale the cluster or can lower this by some

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-25 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-649925956 @vinothchandar I ran the job with a 5 min batch interval using MOR. Now I can see that the commit duration is 5 min and compaction is also 5 min, and updated records are only 10% of

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-22 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-647325340 Sure. I am trying to achieve near-real-time data (like the Read Optimized View) by updating records over S3. E.g., let's say I have records a1 b1 t1 a1, b2, t2

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-19 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-646596361 Hey Vinoth, 1 - Could you please shed some light on the statement "old behavior for real production use-cases"? 2 - Yes, indexing is dominating, not sure why exactly it

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-09 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-640738415

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-04 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-638773847 On job countByKey at HoodieBloomIndex, stage mapToPair at HoodieWriteClient.java:977 is taking more than a minute, and stage countByKey at HoodieBloomIndex is

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-02 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-637414299 Hello Vinoth, I was just playing with different combinations of shuffle parallelism. I am able to reduce countByKey at WorkloadProfile.java through shuffle partition by