steveloughran commented on PR #37474: URL: https://github.com/apache/spark/pull/37474#issuecomment-1211699163
Hey @HeartSaVioR. Yes, this is exactly what the API we worked on was designed for. There is no need to initiate a multipart upload (MPU) when writing small files; the OutputStream simply doesn't upload the data any more. You can check this by calling toString() on the stream: all its IO stats are there.

This means that the cost is as normal: one PUT for data <= the block size; after that, one POST to initiate, one POST per block and one POST in close() to finalize. Block uploads are parallelised, though you do need enough HTTPS connections for this.

It's no more expensive than a normal write, and upload performance will be the same, except when you call abort(), when it is faster.

That said, let me review the code to confirm this.
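
To make the request accounting above concrete, here is a minimal sketch that just restates the arithmetic from the comment. The function name `estimate_requests` and the dictionary keys are hypothetical, chosen for illustration; this is not code from the S3A connector, and it ignores retries, aborted writes and any other requests a real client may issue.

```python
def estimate_requests(data_bytes: int, block_bytes: int) -> dict:
    """Count requests for one write, using the accounting in the comment.

    Hypothetical helper for illustration only: data that fits in one block
    is a single PUT; anything larger is one initiate request, one upload
    per block, and one completion request issued in close().
    """
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    if data_bytes < 0:
        raise ValueError("data_bytes must not be negative")

    if data_bytes <= block_bytes:
        # Small file: no multipart upload is started at all.
        return {"put": 1, "initiate": 0, "block_uploads": 0,
                "complete": 0, "total": 1}

    # Integer ceiling division: number of blocks needed for the data.
    blocks = -(-data_bytes // block_bytes)
    return {"put": 0, "initiate": 1, "block_uploads": blocks,
            "complete": 1, "total": blocks + 2}


if __name__ == "__main__":
    mib = 1024 * 1024
    print(estimate_requests(10 * mib, 64 * mib))
    print(estimate_requests(200 * mib, 64 * mib))
```

Under these assumptions a 10 MiB write with a 64 MiB block size costs a single request, while a 200 MiB write needs four blocks and so six requests in total.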
