gaogaotiantian commented on code in PR #54085:
URL: https://github.com/apache/spark/pull/54085#discussion_r2771503764
##########
python/pyspark/sql/streaming/python_streaming_source_runner.py:
##########
@@ -116,6 +138,77 @@ def send_batch_func(
write_int(EMPTY_PYARROW_RECORD_BATCHES, outfile)
+def check_support_func(reader: DataSourceStreamReader, outfile: IO) -> None:
+    support_flags = 0
+    if isinstance(reader, _SimpleStreamReaderWrapper):
Review Comment:
I think because this is internal, we don't have to make it "better" in this
PR. One of the complaints we've had about PySpark is that it's not Pythonic. I'm not
sure how important it is to make the implementation comparable to Scala's.
In any case, I don't think we should make `_SimpleStreamReaderWrapper` a
special case in every single place where we deal with a
`DataSourceStreamReader` - it doesn't make sense. If we can't use a mixin to make
`_SimpleStreamReaderWrapper` work, we should avoid it. Otherwise, any time we
want to add something to the data source, we need to special-case it. In the future
we might have another special "wrapper", and the world is going to blow up.
I think the most common way in Python is to give `DataSourceStreamReader` a
default `prepareForTriggerAvailableNow` that raises
`NotImplementedError` - you'll have a single API that you can implement in your
class and the code can deal with it. Or you can have a separate API
`supportTriggerAvailableNow` that returns `False` by default. Either way, we
shouldn't need to access `reader.simple_reader` explicitly here.
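A rough sketch of the second option, to make the idea concrete. The classes below are stand-ins written for illustration, not the real pyspark definitions: `SimpleStreamReaderWrapper`, `MyReader`, `compute_support_flags`, and the flag value are all hypothetical, and the method names only follow the ones suggested above.

```python
# Stand-in classes for illustration only; these are NOT the real pyspark
# definitions, and the flag value below is made up.
SUPPORTS_TRIGGER_AVAILABLE_NOW = 1 << 0  # hypothetical flag bit


class DataSourceStreamReader:
    def supportTriggerAvailableNow(self) -> bool:
        # Default: a reader does not support Trigger.AvailableNow.
        return False

    def prepareForTriggerAvailableNow(self) -> None:
        # Default: readers that do not opt in have nothing to prepare.
        raise NotImplementedError("prepareForTriggerAvailableNow is not implemented")


class MyReader(DataSourceStreamReader):
    """A user-defined reader that opts in by overriding both methods."""

    def supportTriggerAvailableNow(self) -> bool:
        return True

    def prepareForTriggerAvailableNow(self) -> None:
        self.prepared = True


class SimpleStreamReaderWrapper(DataSourceStreamReader):
    """Hypothetical wrapper: it forwards both calls to the wrapped reader."""

    def __init__(self, simple_reader):
        self.simple_reader = simple_reader

    def supportTriggerAvailableNow(self) -> bool:
        return self.simple_reader.supportTriggerAvailableNow()

    def prepareForTriggerAvailableNow(self) -> None:
        self.simple_reader.prepareForTriggerAvailableNow()


def compute_support_flags(reader: DataSourceStreamReader) -> int:
    # No isinstance check and no access to reader.simple_reader:
    # every reader answers the same question through the same method.
    flags = 0
    if reader.supportTriggerAvailableNow():
        flags |= SUPPORTS_TRIGGER_AVAILABLE_NOW
    return flags
```

With this shape the caller treats a wrapped reader and a plain reader identically, and adding another wrapper later only requires that wrapper to forward the two methods.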
You are right - Python is duck-typed, and we should treat it as a duck-typed
language.
However, I don't think you need to do it in this PR. We can refactor it
later. I'm just pointing out that having special cases for
`_SimpleStreamReaderWrapper` all over the code is a bit weird.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]