srowen commented on a change in pull request #26650: [CORE] Fix a bug in getBlockHosts
URL: https://github.com/apache/spark/pull/26650#discussion_r349927646
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/PartitionedFileUtil.scala
##########
@@ -69,8 +69,8 @@ object PartitionedFileUtil {
b.getHosts -> (b.getOffset + b.getLength - offset).min(length)
// The fragment ends at a position within this block
- case b if offset <= b.getOffset && offset + length < b.getLength =>
- b.getHosts -> (offset + length - b.getOffset).min(length)
+ case b if offset <= b.getOffset && offset + length < b.getOffset + b.getLength =>
Review comment:
This change looks correct.
I think there's another issue. This case checks where the end of the argument
range falls, so it should test where `offset + length` is relative to b.
Shouldn't the first condition be `offset + length >= b.getOffset`? Otherwise
this also matches when the argument doesn't overlap b at all: imagine offset
is much smaller than b.getOffset. The result here could then be negative. I
think that's masked by the fact that the candidates are filtered for size > 0
below, but it may mean this misses a better answer.
In this case, the argument isn't fully contained in b (that is actually
handled in the case above, by `.min(length)`; the comment there might be
updated). So it's true that the `.min()` below is not needed.
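
To make the negative-size case concrete, here is a minimal sketch. It is not
the Spark code itself: `Block` is a hypothetical stand-in for Hadoop's
`BlockLocation` (only offset and length are modeled), and the object and
function names are invented for illustration. It compares the guard as written
in this PR with the guard suggested above, for the "fragment ends within this
block" case only.

```scala
// Hypothetical stand-in for Hadoop's BlockLocation; hosts are omitted.
final case class Block(offset: Long, length: Long)

object EndsWithinBlock {
  // Guard and size as written in the PR for the "ends within this block" case.
  // Returns None when the guard does not match.
  def sizeAsInPr(b: Block, offset: Long, length: Long): Option[Long] =
    if (offset <= b.offset && offset + length < b.offset + b.length)
      Some((offset + length - b.offset).min(length))
    else None

  // Same case with the first condition replaced as suggested in the review.
  // Assumption: fragments that start inside b were already matched by the
  // earlier case, so offset <= b.offset holds here and .min(length) is dropped.
  def sizeAsSuggested(b: Block, offset: Long, length: Long): Option[Long] =
    if (offset + length >= b.offset && offset + length < b.offset + b.length)
      Some(offset + length - b.offset)
    else None
}
```

For a block covering [100, 150) and a fragment covering [0, 20), the PR guard
matches and yields -80, which only the later `size > 0` filter discards; the
suggested guard does not match at all. For a fragment covering [80, 120), both
versions yield 20.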