HeartSaVioR commented on PR #41122: URL: https://github.com/apache/spark/pull/41122#issuecomment-1545192031
That's because we no longer use writebatch, which was problematic for memory usage. We probably should have rerun the benchmark and updated the result... Overall performance should not drop significantly: without writebatch we pay the write cost on each operation, whereas with writebatch we paid that same cost all at once when flushing in the commit phase. (We benchmarked this ourselves.)

cc. @anishshri-db Would you mind adding more context here? We probably need to update the benchmark with a reduced number of operations.
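To illustrate the cost-shifting argument, here is a toy Python model. It is only a sketch, not Spark's RocksDB state store code: `BatchedStore`, `DirectStore`, `run`, and the unit "cost" counters are hypothetical names invented for this example, and each write is assumed to cost exactly one unit whenever it reaches the underlying store.

```python
class BatchedStore:
    """Toy model of the writebatch approach: buffer writes, apply them at commit."""

    def __init__(self):
        self.data = {}          # stands in for the underlying store
        self.pending = []       # stands in for the in-memory write batch
        self.peak_pending = 0   # largest number of buffered operations seen
        self.put_cost = 0       # write work done inside put()
        self.commit_cost = 0    # write work done inside commit()

    def put(self, key, value):
        # Cheap at call time, but the buffer grows with every operation.
        self.pending.append((key, value))
        self.peak_pending = max(self.peak_pending, len(self.pending))

    def commit(self):
        # All of the deferred write work is paid here, in one go.
        for key, value in self.pending:
            self.data[key] = value
            self.commit_cost += 1
        self.pending.clear()


class DirectStore:
    """Toy model of the approach without writebatch: write on every operation."""

    def __init__(self):
        self.data = {}
        self.peak_pending = 0   # nothing is ever buffered
        self.put_cost = 0
        self.commit_cost = 0

    def put(self, key, value):
        # The write cost is paid immediately, per operation.
        self.data[key] = value
        self.put_cost += 1

    def commit(self):
        # Nothing left to flush in this model.
        pass


def run(store, num_ops):
    """Issue num_ops puts followed by a single commit, then return the store."""
    for i in range(num_ops):
        store.put(f"key-{i}", i)
    store.commit()
    return store


batched = run(BatchedStore(), 1000)
direct = run(DirectStore(), 1000)

print("batched:", batched.put_cost, batched.commit_cost, batched.peak_pending)
print("direct: ", direct.put_cost, direct.commit_cost, direct.peak_pending)
```

In this model both variants do the same total amount of write work and end with identical contents. The difference is where the work lands (in `put` versus `commit`) and how many operations are held in memory before the commit. A benchmark that times individual operations would therefore report the direct variant as slower per operation even when the end-to-end cost is about the same.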
