Github user marmbrus commented on the issue:
https://github.com/apache/spark/pull/15102
Comparable requirement removed in #15207.
> I think in the absence of prior information about the position in a
> topicpartition, you start a new batch on topic B starting from wherever the
> consumer's position was at the time it acquired the subscription, which might
> not be 0. I.e. you call position() before seekToEnd().
Why do you care when it acquired it? If it appeared in between the
last batch and now, don't you want to consume all of the available data from
it? Otherwise the answer is going to depend on the specifics of when you see
the topic, which seems counter to the model of Structured Streaming.
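To make the point concrete, here is a rough sketch of the rule I have in mind, written as plain Python rather than actual Spark or Kafka consumer code. The function name `plan_batch` and the three offset maps are hypothetical and exist only for illustration. A partition with no recorded end offset from the previous batch starts at its earliest available offset, so the result does not depend on when the consumer happened to notice the topic.

```python
from typing import Dict, Tuple

# Hypothetical types for illustration: a partition is (topic, partition id).
TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def plan_batch(prev_end: Offsets, earliest: Offsets, latest: Offsets) -> Dict[TopicPartition, Tuple[int, int]]:
    """Return a (start, until) offset range for every partition seen now.

    Known partitions resume where the previous batch ended. Partitions that
    appeared since the previous batch start at their earliest available
    offset, not at whatever position the consumer held when it subscribed.
    """
    ranges = {}
    for tp, until in latest.items():
        start = prev_end.get(tp, earliest[tp])
        ranges[tp] = (start, until)
    return ranges


# Topic A was read up to offset 10 by the last batch; topic B is new.
prev_end = {("A", 0): 10}
earliest = {("A", 0): 0, ("B", 0): 0}
latest = {("A", 0): 15, ("B", 0): 7}
batch = plan_batch(prev_end, earliest, latest)
```

With these inputs the new topic B is read in full, from offset 0 up to 7, while topic A resumes at 10.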
> I think the main thing that would be confusing is to specify topics in
one way (custom-delimited string) for one configuration, and in another way
(structured json) for another configuration.
Are you proposing users have to type `"[\"topic1\", \"topic2\"]"` (or pull
in a JSON library) instead of `"topic1,topic2"`? It seems we could pretty
seamlessly add support for JSON in the future, while still making the common
case easy to type.
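As a sketch of how the two forms could coexist, assuming a hypothetical helper named `parse_topics` (not an existing Spark option parser): Kafka topic names cannot contain `[` or `,`, so a leading `[` can be treated as the signal that the value is a JSON array, and anything else as a comma-separated list.

```python
import json
from typing import List


def parse_topics(spec: str) -> List[str]:
    """Parse a topic list given as "a,b" or as a JSON array '["a", "b"]'.

    Hypothetical helper for illustration only.
    """
    text = spec.strip()
    if text.startswith("["):
        # JSON form: must decode to a list of strings.
        topics = json.loads(text)
        if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
            raise ValueError("expected a JSON array of topic names: %r" % spec)
    else:
        # Common case: comma-separated names.
        topics = text.split(",")
    topics = [t.strip() for t in topics]
    if not topics or any(not t for t in topics):
        raise ValueError("empty topic name in %r" % spec)
    return topics


simple = parse_topics("topic1,topic2")
structured = parse_topics('["topic1", "topic2"]')
```

Both calls yield the same list, so the short form stays the default and the JSON form can be added later without breaking it.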