jerryshao commented on a change in pull request #25552: [SPARK-28849][CORE] Add
a number to control transferTo calls to avoid infinite loop in some occasional
cases
URL: https://github.com/apache/spark/pull/25552#discussion_r316954358
##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -417,16 +418,19 @@ private[spark] object Utils extends Logging {
input: FileChannel,
output: WritableByteChannel,
startPosition: Long,
- bytesToCopy: Long): Unit = {
+ bytesToCopy: Long,
+ numTransferToCalls: Int): Unit = {
val outputInitialState = output match {
case outputFileChannel: FileChannel =>
Some((outputFileChannel.position(), outputFileChannel))
case _ => None
}
var count = 0L
+ var num = 0
// In case transferTo method transferred less data than we have required.
- while (count < bytesToCopy) {
+ while (count < bytesToCopy && num < numTransferToCalls) {
count += input.transferTo(count + startPosition, bytesToCopy - count,
output)
Review comment:
When the loop is stuck like this, the call count climbs very quickly and
easily reaches a large number, say 10000 `transferTo` calls. We don't need
to set a small limit; a large value such as 10,000+ is enough to cover most
scenarios and still fail fast.
Adding a wait here is also not a good idea: it would delay the shuffle write
critical path, and in this scenario a delay will not recover anything; it
only lowers system usage.
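To make the suggestion concrete, here is a minimal sketch of a bounded copy loop that fails fast and does not sleep between calls. It is not the PR's actual code: the object name `BoundedTransfer`, the method `copyWithCallLimit`, the parameter `maxTransferToCalls`, and the choice to throw an `IOException` are all assumptions for illustration, and 10000 is only the illustrative figure mentioned above.

```scala
import java.io.IOException
import java.nio.channels.{FileChannel, WritableByteChannel}

// Hypothetical helper, for illustration only (not part of Spark's Utils).
object BoundedTransfer {

  /**
   * Copies `bytesToCopy` bytes from `input`, starting at `startPosition`,
   * into `output`. Gives up with an IOException if the copy is still
   * incomplete after `maxTransferToCalls` calls to `transferTo`.
   */
  def copyWithCallLimit(
      input: FileChannel,
      output: WritableByteChannel,
      startPosition: Long,
      bytesToCopy: Long,
      maxTransferToCalls: Int = 10000): Long = {
    var count = 0L // bytes copied so far
    var calls = 0  // number of transferTo calls made so far
    // transferTo may copy fewer bytes than requested, so keep calling it,
    // but never more than maxTransferToCalls times and with no wait between calls.
    while (count < bytesToCopy && calls < maxTransferToCalls) {
      count += input.transferTo(startPosition + count, bytesToCopy - count, output)
      calls += 1
    }
    if (count < bytesToCopy) {
      // Assumed failure mode: surface the stuck copy instead of looping forever.
      throw new IOException(
        s"Copied only $count of $bytesToCopy bytes after $calls transferTo calls")
    }
    count
  }
}
```

Because a stuck loop burns through the limit almost immediately, a large limit costs a healthy copy nothing and still ends a stuck one quickly.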