[GitHub] [incubator-hudi] lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-604171160

> @lamber-ken this has come up in various tickets already.
> @n3nash should we bump up the default merge size? What do you guys use at Uber?

Right, we can add it to the Troubleshooting Guide.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-603738602

BTW, when upserting large input data, Hudi spills part of the input data to disk once it reaches the maximum memory allowed for merging. If there is enough memory available, you can increase `hoodie.memory.merge.max.size` (a value in bytes), for example:

```
.option("hoodie.memory.merge.max.size", "20048576")
```
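Because the option takes a raw byte count, it is easy to get the number of digits wrong; for reference, the `20048576` in the example above works out to roughly 19 MiB. The following is a minimal sketch, assuming PySpark-style writer options: `merge_memory_option` is a hypothetical helper, not part of Hudi, and it only builds the key/value pair from a human-readable size.

```python
# Hypothetical helper (not a Hudi API): build the merge-memory option from a
# human-readable size so the byte count does not have to be typed by hand.
_UNIT_BYTES = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def merge_memory_option(size, unit="MiB"):
    """Return ("hoodie.memory.merge.max.size", "<bytes>") for DataFrameWriter.option."""
    if unit not in _UNIT_BYTES:
        raise ValueError(f"unknown unit {unit!r}; expected one of {sorted(_UNIT_BYTES)}")
    size_bytes = int(size * _UNIT_BYTES[unit])
    if size_bytes <= 0:
        raise ValueError("merge memory must be a positive number of bytes")
    # Spark writer options are passed as strings, so convert the byte count here.
    return "hoodie.memory.merge.max.size", str(size_bytes)


key, value = merge_memory_option(2, "GiB")  # value == "2147483648"
# With PySpark and the Hudi bundle on the classpath (assumed), this would be used as:
# df.write.format("org.apache.hudi").option(key, value) ...
```

Only raise the limit if the executors actually have that much memory to spare; otherwise the job trades disk spills for out-of-memory failures.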
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-603735853

Hello @tverdokhlebd, from your code above, I guess you want to do a bulk insert operation. By default, Hudi runs in `upsert` mode; to do a bulk insert, you need to add this option:

```
.option("hoodie.datasource.write.operation", "bulk_insert")
```
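For context, the operation option above normally sits alongside the other required writer options. Below is a minimal sketch, assuming PySpark: `hudi_write_options` is an illustrative helper rather than a Hudi API, and the table name, record key and precombine field (`my_table`, `id`, `ts`) are hypothetical placeholders. It only assembles the option map that would be handed to `DataFrameWriter.options`.

```python
# Operations accepted by this sketch; newer Hudi releases support additional values.
SUPPORTED_OPERATIONS = ("upsert", "insert", "bulk_insert")


def hudi_write_options(table_name, record_key, precombine_field, operation="upsert"):
    """Build a Hudi writer option map (hypothetical helper, not part of Hudi).

    Hudi's default operation is "upsert"; pass operation="bulk_insert"
    for a large initial load.
    """
    if operation not in SUPPORTED_OPERATIONS:
        raise ValueError(f"unsupported operation {operation!r}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": operation,
    }


opts = hudi_write_options("my_table", "id", "ts", operation="bulk_insert")
# With PySpark and the Hudi bundle available (assumed), the map would be used as:
# df.write.format("org.apache.hudi").options(**opts).mode("overwrite").save(base_path)
```

Keeping the operation as an explicit argument makes it visible at the call site which write path a job uses, instead of silently falling back to the `upsert` default.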