peter-toth commented on a change in pull request #24175:
[SPARK-27232][SQL]Ignore file locality in InMemoryFileIndex if spark.locality.wait is set to zero
URL: https://github.com/apache/spark/pull/24175#discussion_r269092692
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
##########
@@ -315,14 +319,19 @@ object InMemoryFileIndex extends Logging {
// which is very slow on some file system (RawLocalFileSystem, which is launch a
// subprocess and parse the stdout).
try {
- val locations = fs.getFileBlockLocations(f, 0, f.getLen).map { loc =>
- // Store BlockLocation objects to consume less memory
- if (loc.getClass == classOf[BlockLocation]) {
- loc
+ val locations =
+ if (ignoreFileLocality) {
+ Array.empty[BlockLocation]
} else {
- new BlockLocation(loc.getNames, loc.getHosts, loc.getOffset, loc.getLength)
+ fs.getFileBlockLocations(f, 0, f.getLen).map { loc =>
+ // Store BlockLocation objects to consume less memory
+ if (loc.getClass == classOf[BlockLocation]) {
Review comment:
This part doesn't change in this PR; the diff only re-indents it. What is new
is that, when `ignoreFileLocality` is true, we don't look up the block
locations at all and return an empty array instead.
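
For illustration only, here is a minimal sketch of that branch in isolation. It is not the PR's code: `Loc` is a hypothetical stand-in for Hadoop's `BlockLocation` so the snippet runs without Hadoop on the classpath, and the by-name `lookup` parameter stands in for the `fs.getFileBlockLocations(f, 0, f.getLen)` call. The point it tries to show is that the lookup is never evaluated when locality is ignored.

```scala
// Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation (assumption,
// used only so this sketch is self-contained).
final case class Loc(hosts: Seq[String], offset: Long, length: Long)

object LocalitySketch {
  // `lookup` is passed by name, so it is evaluated only in the else branch.
  // It stands in for the potentially slow block location lookup.
  def locations(ignoreFileLocality: Boolean)(lookup: => Array[Loc]): Array[Loc] =
    if (ignoreFileLocality) Array.empty[Loc] else lookup

  def main(args: Array[String]): Unit = {
    var lookups = 0
    // Counts how many times the (pretend) expensive lookup actually runs.
    def expensive(): Array[Loc] = {
      lookups += 1
      Array(Loc(Seq("host1"), 0L, 128L))
    }

    val skipped = locations(ignoreFileLocality = true)(expensive())
    val fetched = locations(ignoreFileLocality = false)(expensive())

    // prints "skipped=0, fetched=1, lookups=1"
    println(s"skipped=${skipped.length}, fetched=${fetched.length}, lookups=$lookups")
  }
}
```

With the flag set, the result is an empty array and the lookup counter stays untouched, which is the saving this change is after.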