Github user steveloughran commented on a diff in the pull request:
https://github.com/apache/spark/pull/19623#discussion_r148227459
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataReader.java ---
@@ -34,11 +34,17 @@
   /**
    * Proceed to the next record; returns false if there are no more records.
+   *
+   * If an exception is thrown, the corresponding Spark task will fail and be retried until
+   * the maximum number of retries is reached.
    */
   boolean next();
   /**
    * Return the current record. This method should return the same value until `next` is called.
+   *
--- End diff --
...assuming that the source data is not changed in any way. I'd avoid making
any comments here about what happens if it does. Maybe make that a broader
requirement upfront: Spark assumes that the data does not change in
size/structure/value during the query. If it does, any operation may raise an
exception or return invalid/inconsistent data.
That makes for a nice disclaimer: if it's a database with the right ACID
level, updates to a source may not be visible. If it's a CSV file, nobody knows
what will happen.
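
The next()/get() contract under discussion can be sketched with a minimal stand-in for the interface. `SimpleDataReader` and `ListDataReader` below are hypothetical names for illustration, not part of Spark's API; the snapshot-copy in the constructor is one way a reader could honor the "data does not change during the query" assumption:

```java
import java.util.Iterator;
import java.util.List;

// Local stand-in mirroring the shape of
// org.apache.spark.sql.sources.v2.reader.DataReader, so the example is
// self-contained.
interface SimpleDataReader<T> {
    // Proceed to the next record; returns false when no records remain.
    boolean next();

    // Return the current record; must return the same value until next()
    // is called again.
    T get();
}

// A reader over an in-memory snapshot. It takes a defensive copy up front,
// so later mutation of the caller's list cannot produce the
// invalid/inconsistent results the comment warns about.
class ListDataReader implements SimpleDataReader<String> {
    private final Iterator<String> it;
    private String current;

    ListDataReader(List<String> rows) {
        // Immutable snapshot of the rows at construction time.
        this.it = List.copyOf(rows).iterator();
    }

    @Override
    public boolean next() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    @Override
    public String get() {
        // Stable between calls to next(), as the javadoc requires.
        return current;
    }
}
```

A mutable source such as a CSV file being rewritten mid-query offers no such snapshot, which is exactly why the disclaimer belongs at the interface level rather than on each method.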
---