Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148225333
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ReadTask.java ---
@@ -37,13 +37,19 @@
    * The preferred locations where this read task can run faster, but Spark does not guarantee that
    * this task will always run on these locations. The implementations should make sure that it can
    * be run on any location. The location is a string representing the host name of an executor.
+   *
+   * If an exception was thrown, the action would fail and we guarantee that no Spark job was
+   * submitted.
    */
   default String[] preferredLocations() {
     return new String[0];
--- End diff --
Somewhere in the Spark code there's that translation of localhost ->
anywhere, so that things like object stores that return "localhost" end up with
location-independent placement. Is this required to have happened by the
time `preferredLocations` is called, or can an implementation return "localhost" and
expect Spark to deal with it? The purist position would be to say "stores which use
'localhost' as a hint to mean 'unplaced' are required to have filtered it out
by this point".
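
To illustrate the "purist" option: a sketch of what filtering "localhost" out before returning from `preferredLocations` might look like. The class and method names below are illustrative only, not Spark or store API; this assumes a hypothetical store client that reports "localhost" to mean "no placement preference".

```java
import java.util.Arrays;

// Illustrative sketch, not Spark API: a helper an implementation of
// preferredLocations() could use so that "localhost" hints from an
// underlying store are dropped before Spark ever sees them. An empty
// result then unambiguously means "run anywhere".
public class LocationFilter {

    /** Drop "localhost" entries; everything else passes through unchanged. */
    public static String[] filterLocalhost(String[] hosts) {
        return Arrays.stream(hosts)
                .filter(h -> !h.equalsIgnoreCase("localhost"))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // A store reporting "localhost" plus one real executor host
        // (hypothetical host name for the example).
        String[] hints = {"localhost", "host1.example.com"};
        System.out.println(Arrays.toString(filterLocalhost(hints)));
        // prints [host1.example.com]
    }
}
```

With this filtering done inside the data source, Spark's own localhost-to-anywhere translation is never relied upon, which is the contract the comment above suggests spelling out.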
---