[GitHub] [incubator-hudi] lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-604171160
 
 
   > @lamber-ken this has come up in various tickets already.
   > @n3nash should we bump up the default merge size? What do you guys use at Uber?
   
   Right, we can add it to the Troubleshooting Guide.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-hudi] lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-603738602
 
 
   By the way, when upserting a large input, Hudi spills part of the input data
to disk once the merge reaches its maximum memory.
   If there is enough memory available, you can increase
`hoodie.memory.merge.max.size`, for example:
   ```
   option("hoodie.memory.merge.max.size", "20048576")
   ```
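
For readers who prefer to think in megabytes, here is a minimal sketch of how the option value could be computed. It assumes the setting is interpreted as a byte count, as described in the Hudi configuration reference; `merge_memory_option` is a hypothetical helper written for this illustration, not part of Hudi, and 2048 MB is only an example budget.

```python
# Hypothetical helper (not a Hudi API): builds the writer option that
# sets the merge memory budget, assuming the value is a byte count.
MERGE_MAX_SIZE_KEY = "hoodie.memory.merge.max.size"


def merge_memory_option(size_mb: int) -> dict:
    """Return a one-entry options dict with the size converted to bytes."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return {MERGE_MAX_SIZE_KEY: str(size_mb * 1024 * 1024)}


if __name__ == "__main__":
    opts = merge_memory_option(2048)  # example budget only
    print(opts)
    # With PySpark, the dict could then be handed to the writer, e.g.
    # df.write.format("org.apache.hudi").options(**opts) ...
```

A larger budget only helps if the executors actually have that much memory to spare; otherwise the spill to the temp folder is the safer behavior.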




[GitHub] [incubator-hudi] lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much space in a temp folder while upsert

2020-03-25 Thread GitBox
lamber-ken commented on issue #1443: [SUPPORT] Spark-Hudi consumes too much 
space in a temp folder while upsert
URL: https://github.com/apache/incubator-hudi/issues/1443#issuecomment-603735853
 
 
   Hello @tverdokhlebd, from the code above, I guess you want to do a bulk
insert operation.
   By default, Hudi runs in `upsert` mode. If you want to do a bulk insert, you
need to add this option:
   
   ```
   .option("hoodie.datasource.write.operation", "bulk_insert")
   ```
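
As a rough sketch of how the operation could be selected programmatically, the snippet below builds the option as a plain dict. `write_operation_option` and `KNOWN_OPERATIONS` are hypothetical names introduced for this illustration; the list of operations is an assumption based on the values commonly documented for this setting and may differ between Hudi versions.

```python
# Hypothetical helper (not a Hudi API): picks the write operation and
# rejects values outside an assumed list of known operations.
WRITE_OPERATION_KEY = "hoodie.datasource.write.operation"
KNOWN_OPERATIONS = ("upsert", "insert", "bulk_insert")  # assumed list


def write_operation_option(operation: str = "upsert") -> dict:
    """Return a one-entry options dict for the chosen write operation."""
    if operation not in KNOWN_OPERATIONS:
        raise ValueError(f"unknown write operation: {operation!r}")
    return {WRITE_OPERATION_KEY: operation}


if __name__ == "__main__":
    opts = write_operation_option("bulk_insert")
    print(opts)
    # With PySpark, the dict could then be handed to the writer, e.g.
    # df.write.format("org.apache.hudi").options(**opts) ...
```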

