adrian-ionescu commented on a change in pull request #24175:
[SPARK-27232][SQL]Ignore file locality in InMemoryFileIndex if
spark.locality.wait is set to zero
URL: https://github.com/apache/spark/pull/24175#discussion_r269086518
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
##########
@@ -315,14 +319,19 @@ object InMemoryFileIndex extends Logging {
// which is very slow on some file system (RawLocalFileSystem, which is launch a
// subprocess and parse the stdout).
try {
- val locations = fs.getFileBlockLocations(f, 0, f.getLen).map { loc =>
- // Store BlockLocation objects to consume less memory
- if (loc.getClass == classOf[BlockLocation]) {
- loc
+ val locations =
+ if (ignoreFileLocality) {
+ Array.empty[BlockLocation]
} else {
- new BlockLocation(loc.getNames, loc.getHosts, loc.getOffset, loc.getLength)
+ fs.getFileBlockLocations(f, 0, f.getLen).map { loc =>
+ // Store BlockLocation objects to consume less memory
+ if (loc.getClass == classOf[BlockLocation]) {
Review comment:
This looks like a related, yet separate optimization. It's probably ok, but
I'm not sure it's safe to share these `BlockLocation` objects, given that
they're mutable. Do you know how much benefit this brings? Unless it's
significant, I wouldn't do it.
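
To make the sharing concern concrete, here is a minimal sketch. It does not use Hadoop: `Location` and `DetailedLocation` are hypothetical stand-ins for `BlockLocation` and a filesystem-specific subclass, and `normalize` only mirrors the shape of the branch in the diff above. It shows that an instance of the base class is kept as-is, so a later mutation by whoever else holds it is visible through the stored reference, while a subclass instance is copied.

```scala
// Hypothetical stand-in for a mutable location object with getters and a setter.
class Location(private var hosts: Array[String],
               private var offset: Long,
               private var length: Long) {
  def getHosts: Array[String] = hosts
  def getOffset: Long = offset
  def getLength: Long = length
  def setOffset(o: Long): Unit = { offset = o }
}

// Hypothetical subclass that carries extra state the cache does not need.
class DetailedLocation(h: Array[String], o: Long, l: Long, val extra: Array[Byte])
  extends Location(h, o, l)

object SharingSketch {
  // Same shape as the branch in the diff: reuse the object when it is exactly
  // the base class, otherwise copy only the base fields into a new instance.
  def normalize(loc: Location): Location =
    if (loc.getClass == classOf[Location]) loc
    else new Location(loc.getHosts, loc.getOffset, loc.getLength)

  def main(args: Array[String]): Unit = {
    // Base-class instance: the stored reference is the caller's object.
    val original = new Location(Array("host1"), 0L, 128L)
    val stored = normalize(original)
    original.setOffset(64L)
    println(stored.getOffset) // prints 64: the mutation leaks into the stored value

    // Subclass instance: a copy is stored, so the offset is isolated.
    // The hosts array is still shared between the copy and the source.
    val detailed = new DetailedLocation(Array("host2"), 0L, 128L, Array[Byte](1, 2, 3))
    val copied = normalize(detailed)
    detailed.setOffset(32L)
    println(copied.getOffset) // prints 0
  }
}
```

Whether this matters in practice depends on whether anything mutates the objects returned by the filesystem after they are stored, which this sketch does not establish.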