srowen commented on a change in pull request #25552: [SPARK-28849][CORE] Add a
number to control transferTo calls to avoid infinite loop in some occasional
cases
URL: https://github.com/apache/spark/pull/25552#discussion_r317130417
##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -417,16 +418,19 @@ private[spark] object Utils extends Logging {
input: FileChannel,
output: WritableByteChannel,
startPosition: Long,
- bytesToCopy: Long): Unit = {
+ bytesToCopy: Long,
+ numTransferToCalls: Int): Unit = {
val outputInitialState = output match {
case outputFileChannel: FileChannel =>
Some((outputFileChannel.position(), outputFileChannel))
case _ => None
}
var count = 0L
+ var num = 0
// In case transferTo method transferred less data than we have required.
- while (count < bytesToCopy) {
+ while (count < bytesToCopy && num < numTransferToCalls) {
count += input.transferTo(count + startPosition, bytesToCopy - count,
output)
Review comment:
Can you log the return of `transferTo` "locally" in some way to further
test? Not sure how easy it is to reproduce. In the absence of evidence, I think
checking <= 0 is the most that is safe. I think it bears understanding the
issue more first before any change is made.
In any event I don't think you can ever just stop copying data and continue
successfully. You would have to fail the job.
If the job is merely slow, at best, you'd want to introduce some kind of
timeout at a higher level as there are all kinds of ways it can be slow. But
that would also take care of your current situation.
But you can always kill jobs that are slow directly, without any change to
Spark.
This doesn't fix the cause anyway (assuming it's outside Spark).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]