GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20752
I'm not very familiar with the streaming side, but here is my 2 cents: I
agree with @rdblue that it's unnecessary to introduce the epoch id to data
sources that don't care about streaming. However, I think it's natural to say
that a batch data source is a special case of a streaming data source and only
needs to deal with one epoch.
So it's a tradeoff: do we wanna make it easier to implement a batch data
source, or do we wanna make it easier to implement a data source that supports
both batch and streaming?
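To make the tradeoff concrete, here is a rough sketch of the two shapes the
API could take. It is written in Python only for brevity, and every name in it
(StreamingWriter, BatchWriter, BatchAsSingleEpoch, InMemoryBatchWriter) is
made up for illustration; none of this is the actual DataSourceV2 API.

```python
from abc import ABC, abstractmethod
from typing import List


# NOTE: all classes here are hypothetical and only illustrate the tradeoff.

class StreamingWriter(ABC):
    """Unified shape: every commit carries an epoch id, even for batch."""

    @abstractmethod
    def commit(self, epoch_id: int, messages: List[str]) -> None:
        ...


class BatchWriter(ABC):
    """Batch-only shape: no epoch id, so batch sources stay simple."""

    @abstractmethod
    def commit(self, messages: List[str]) -> None:
        ...


class BatchAsSingleEpoch(StreamingWriter):
    """Adapter that treats a batch writer as a stream with exactly one epoch."""

    def __init__(self, batch_writer: BatchWriter) -> None:
        self._batch_writer = batch_writer
        self._committed = False

    def commit(self, epoch_id: int, messages: List[str]) -> None:
        # A batch source only ever deals with one epoch, so reject a second one.
        if self._committed:
            raise RuntimeError("a batch source handles only one epoch")
        # The epoch id is dropped: the batch writer never sees it.
        self._batch_writer.commit(messages)
        self._committed = True


class InMemoryBatchWriter(BatchWriter):
    """Toy batch sink that collects committed messages in a list."""

    def __init__(self) -> None:
        self.rows: List[str] = []

    def commit(self, messages: List[str]) -> None:
        self.rows.extend(messages)


if __name__ == "__main__":
    sink = InMemoryBatchWriter()
    BatchAsSingleEpoch(sink).commit(0, ["a", "b"])
    print(sink.rows)
```

With the batch-only interface plus an adapter like this, the cost of the epoch
id lands on the framework rather than on each batch source. With only the
unified interface, every batch source has to accept an epoch id it ignores,
but a source that supports both modes implements one interface instead of two.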
---