Hello,

We're currently working on a custom datasource built on the Spark DataSource
V2 API. On read, our datasource uses temporary files in a distributed store,
and we'd like to run a cleanup step once the entire read is done. However,
the API does not appear to provide any callback for the completion of an
entire read; the only hook is the close() method on individual
PartitionReaders.
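
For concreteness, here's roughly where that hook sits today. This is a
minimal sketch; TempFileStore and MyPartitionReader are our own names, not
part of Spark:

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.connector.read.PartitionReader

    // Our own abstraction over the distributed store; not part of Spark.
    trait TempFileStore {
      def openRows(path: String): Iterator[InternalRow]
      def release(path: String): Unit
    }

    // close() fires once per partition, while other partitions may still
    // be reading, so the shared temp files can't safely be deleted here.
    class MyPartitionReader(store: TempFileStore, path: String)
        extends PartitionReader[InternalRow] {
      private val rows = store.openRows(path)

      override def next(): Boolean = rows.hasNext
      override def get(): InternalRow = rows.next()
      override def close(): Unit = store.release(path)
    }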

What I'm looking for is the equivalent of the commit() and abort() methods
on BatchWrite, but for the Scan or Batch interface. Is there a good way to
run something at the end of a read operation with the current API? If not,
would this be a useful addition, or are there design reasons for leaving
such a step out?
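
For what it's worth, the closest approximation we've found is to hook query
completion from outside the datasource with a QueryExecutionListener. It
fires per query rather than per scan and doesn't live inside the datasource
itself, which is why I'm asking; cleanupTempFiles below stands in for our
own idempotent cleanup routine:

    import org.apache.spark.sql.execution.QueryExecution
    import org.apache.spark.sql.util.QueryExecutionListener

    // Fires when any query on the session finishes, not just our scans,
    // so the cleanup routine has to be idempotent and cheap to re-run.
    class TempFileCleanupListener(cleanupTempFiles: () => Unit)
        extends QueryExecutionListener {
      override def onSuccess(funcName: String, qe: QueryExecution,
                             durationNs: Long): Unit = cleanupTempFiles()
      override def onFailure(funcName: String, qe: QueryExecution,
                             exception: Exception): Unit = cleanupTempFiles()
    }

    // Registered once per session:
    // spark.listenerManager.register(
    //   new TempFileCleanupListener(() => ...))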

Thanks,
Alex
