jerryshao commented on a change in pull request #25552: [SPARK-28849][CORE] Add
a number to control transferTo calls to avoid infinite loop in some occasional
cases
URL: https://github.com/apache/spark/pull/25552#discussion_r316951046
##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -417,16 +418,19 @@ private[spark] object Utils extends Logging {
input: FileChannel,
output: WritableByteChannel,
startPosition: Long,
- bytesToCopy: Long): Unit = {
+ bytesToCopy: Long,
+ numTransferToCalls: Int): Unit = {
val outputInitialState = output match {
case outputFileChannel: FileChannel =>
Some((outputFileChannel.position(), outputFileChannel))
case _ => None
}
var count = 0L
+ var num = 0
// In case transferTo method transferred less data than we have required.
- while (count < bytesToCopy) {
+ while (count < bytesToCopy && num < numTransferToCalls) {
Review comment:
Yes, in such a situation Spark can no longer copy all the data. I agree it is
quite hacky, but we simply don't want Spark to hang indefinitely; we want it to
fail fast.
This does not seem to be a Spark problem, but in a very large and elastic
cluster the chance of hitting such a problem may not be rare, and it is hard to
detect at the system level. It would be better to have a way to handle this on
the Spark side.
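To make the fail-fast idea concrete, here is a minimal standalone sketch of the pattern the diff applies: `FileChannel.transferTo` may transfer fewer bytes than requested (and, with a misbehaving destination, may keep returning 0), so the copy loop is bounded by a maximum number of calls and throws instead of spinning forever. The names `boundedTransfer` and `maxCalls` are illustrative, not the actual identifiers used in the PR.

```scala
import java.io.IOException
import java.nio.channels.{FileChannel, WritableByteChannel}

object BoundedCopy {

  /**
   * Copies `bytesToCopy` bytes from `input` (starting at `startPosition`)
   * to `output`, giving up after `maxCalls` transferTo invocations so a
   * stalled channel cannot cause an infinite loop.
   */
  def boundedTransfer(
      input: FileChannel,
      output: WritableByteChannel,
      startPosition: Long,
      bytesToCopy: Long,
      maxCalls: Int): Long = {
    var count = 0L
    var calls = 0
    // transferTo may move fewer bytes than requested, so keep looping,
    // but bail out once the call budget is exhausted.
    while (count < bytesToCopy) {
      if (calls >= maxCalls) {
        throw new IOException(
          s"Copied only $count of $bytesToCopy bytes after $calls transferTo calls")
      }
      count += input.transferTo(startPosition + count, bytesToCopy - count, output)
      calls += 1
    }
    count
  }
}
```

In the healthy case the loop still finishes in a handful of iterations; the bound only matters when the underlying channel stops making progress.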
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]