srowen commented on a change in pull request #25552: [SPARK-28849][CORE] Add a
number to control transferTo calls to avoid infinite loop in some occasional
cases
URL: https://github.com/apache/spark/pull/25552#discussion_r317008171
##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -417,16 +418,19 @@ private[spark] object Utils extends Logging {
input: FileChannel,
output: WritableByteChannel,
startPosition: Long,
- bytesToCopy: Long): Unit = {
+ bytesToCopy: Long,
+ numTransferToCalls: Int): Unit = {
val outputInitialState = output match {
case outputFileChannel: FileChannel =>
Some((outputFileChannel.position(), outputFileChannel))
case _ => None
}
var count = 0L
+ var num = 0
// In case transferTo method transferred less data than we have required.
- while (count < bytesToCopy) {
+ while (count < bytesToCopy && num < numTransferToCalls) {
count += input.transferTo(count + startPosition, bytesToCopy - count,
output)
Review comment:
I agree. The issue is making no progress at all, right? It is much better to
check whether nothing has been transferred for a long while. You can set a
large limit. Even if you reach it 'quickly', thousands of 0-byte transfers
indicate a problem, and you want to fail, right?
The big difference here is: say you cap the number of transfers at
10000, and you are trying to move 1GB, and it's only transferring 10K at a
time. You would stop after about 100MB and fail to write most of the data!
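
To make the suggestion concrete, here is a minimal sketch of a loop that
limits consecutive 0-byte transfers instead of the total number of
transferTo calls. It is not the code from the pull request: the names
TransferSketch, copyWithStallLimit and maxConsecutiveEmptyTransfers are
hypothetical, and the restoration of the output channel position done in
Utils.scala is left out. Slow but steady progress (10K at a time) never trips
the limit, while a long run of empty transfers fails with an exception.

```scala
import java.io.IOException
import java.nio.channels.{FileChannel, WritableByteChannel}

// Hypothetical helper for illustration only; not part of Spark's Utils.
object TransferSketch {
  def copyWithStallLimit(
      input: FileChannel,
      output: WritableByteChannel,
      startPosition: Long,
      bytesToCopy: Long,
      maxConsecutiveEmptyTransfers: Int): Unit = {
    require(maxConsecutiveEmptyTransfers > 0, "limit must be positive")
    var count = 0L
    var emptyTransfers = 0
    // transferTo may move fewer bytes than requested, so keep looping.
    while (count < bytesToCopy) {
      val transferred =
        input.transferTo(count + startPosition, bytesToCopy - count, output)
      if (transferred > 0) {
        count += transferred
        // Any progress resets the stall counter.
        emptyTransfers = 0
      } else {
        emptyTransfers += 1
        if (emptyTransfers >= maxConsecutiveEmptyTransfers) {
          // Fail loudly instead of returning with the copy incomplete.
          throw new IOException(
            s"No bytes transferred in $emptyTransfers consecutive calls; " +
              s"copied $count of $bytesToCopy bytes")
        }
      }
    }
  }
}
```

With this shape, the 1GB example above completes even at 10K per call, and
the only way to hit the limit is to make no progress at all.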