rubenssoto commented on issue #1878:
URL: https://github.com/apache/hudi/issues/1878#issuecomment-669431943
Thank you so much @bvaradar for your help
rubenssoto commented on issue #1878:
URL: https://github.com/apache/hudi/issues/1878#issuecomment-667675696
Does bulk-insert do any deduplication?
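For context, a minimal sketch (not from this thread) of the Hudi write options involved in deduplication on the insert paths. The table name, key column, and precombine column are placeholders, and whether each flag applies to bulk_insert depends on the Hudi version:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hedged sketch: dedup-related write options for a hypothetical table
// "my_table" keyed on "id", ordered by "updated_at". Defaults may differ
// across Hudi versions.
def bulkInsertOnce(df: DataFrame, basePath: String): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "my_table")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "updated_at")
    .option("hoodie.datasource.write.operation", "bulk_insert")
    // bulk_insert does not combine duplicate keys by default; this opts
    // in to deduplicating the incoming batch before it is written:
    .option("hoodie.combine.before.insert", "true")
    .mode(SaveMode.Append)
    .save(basePath)
}
```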
rubenssoto commented on issue #1878:
URL: https://github.com/apache/hudi/issues/1878#issuecomment-665432999
Hi bvaradar, how are you? I hope you're doing fine!
I have a new case, which is a little more important to me; the problem is
almost the same. I adopted the strategy to first batch
rubenssoto commented on issue #1878:
URL: https://github.com/apache/hudi/issues/1878#issuecomment-663927931
Hi again.
When I changed the insert option to upsert, the performance got worse.
1 master node: m5.xlarge (4 vCPU, 16 GB RAM)
1 core node: r5.xlarge (4 vCPU, 32 GB RAM)
4
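A minimal sketch of the insert-to-upsert switch being described, with placeholder names throughout. Upsert has to look up (index) and merge against existing file groups on top of the write itself, which is why it is expected to cost more than plain insert:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hedged sketch, placeholder table/column names. Changing only the
// operation turns the insert path into an upsert, adding record-lookup
// and merge work for each batch.
def upsertBatch(df: DataFrame, basePath: String): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "my_table")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "updated_at")
    .option("hoodie.datasource.write.operation", "upsert") // was "insert"
    .mode(SaveMode.Append)
    .save(basePath)
}
```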
rubenssoto commented on issue #1878:
URL: https://github.com/apache/hudi/issues/1878#issuecomment-663906344
Hi bvaradar, thank you for your answer.
I tried increasing spark.yarn.executor.memoryOverhead to 2 GB with the
foreachBatch option inside writeStream, and it worked. 4 nodes with 4
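A minimal sketch of the combination that worked, assuming a Parquet file source on S3; paths, schema, and table name are placeholders. The overhead can equally be set at launch with --conf spark.yarn.executor.memoryOverhead=2048:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.types._

// Hedged sketch: 2 GB of YARN executor memory overhead plus a
// foreachBatch sink that writes each micro-batch to Hudi.
val spark = SparkSession.builder()
  .appName("hudi-streaming-ingest")
  .config("spark.yarn.executor.memoryOverhead", "2048") // 2 GB, as above
  .getOrCreate()

val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("updated_at", TimestampType)
))

spark.readStream
  .schema(schema) // streaming file sources require an explicit schema
  .parquet("s3://my-bucket/incoming/") // hypothetical source path
  .writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    batch.write
      .format("hudi")
      .option("hoodie.table.name", "my_table")
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.precombine.field", "updated_at")
      .option("hoodie.datasource.write.operation", "upsert")
      .mode(SaveMode.Append)
      .save("s3://my-bucket/hudi/my_table") // hypothetical base path
  }
  .start()
  .awaitTermination()
```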
rubenssoto commented on issue #1878:
URL: https://github.com/apache/hudi/issues/1878#issuecomment-663806475
I tried resizing the cluster with 3 more nodes, so after resizing I had 4
nodes in total, with 4 cores and 16 GB of RAM each, but it didn't make any
difference; the job stays very
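One hedged guess, not stated in the thread: adding nodes alone changes nothing if the job's executor settings stay fixed. A sketch of sizing them explicitly for a 4-node, 4-core/16 GB layout (values are illustrative, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

// Hedged sketch: illustrative executor sizing for four 4-core/16 GB
// nodes. Without raising these, the extra nodes can sit idle.
val spark = SparkSession.builder()
  .appName("hudi-batch-ingest")
  .config("spark.executor.instances", "4")  // one executor per node
  .config("spark.executor.cores", "3")      // leave a core for YARN/OS
  .config("spark.executor.memory", "10g")   // leave room for overhead
  .config("spark.yarn.executor.memoryOverhead", "2048")
  .getOrCreate()
```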