Github user mccheah commented on the issue:
https://github.com/apache/spark/pull/21306
@stczwd my understanding here is that a table isn't a streaming table or a
batch table; rather, a table points to data that can be scanned either as a
stream or as a batch, and the table is responsible for returning streaming or
batch scanners when the logical plan calls for them. I believe this is the
case because of
https://github.com/apache/spark/pull/23086/files#diff-d111d7e2179b55465840c9a81ea004f2R65
and its eventual analogous streaming variant. In the new abstractions we
propose here and in [our
proposal](https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit),
the catalog gets a reference to a `Table` object that can build `Scan`s over
that table.
In other words, the crucial overarching theme in all of the answers below
is that a `Table` isn't inherently a streaming or a batch table; rather, a
`Table` supports returning streaming and/or batch scans. The table returned
by the catalog is a pointer to the data, and the `Scan` defines how one reads
that data.
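To make that relationship concrete, here is a minimal Scala sketch. The trait names and method signatures (`Table`, `ScanBuilder`, `Scan`, `toBatch`, `toStream`) and the `ExampleTable` class are simplified, hypothetical stand-ins for what the proposal describes, not the actual interfaces in the PR:

```scala
// Hypothetical, simplified shapes of the abstractions described above and in the
// linked proposal; names and signatures are illustrative, not the real interfaces.
trait Batch   // placeholder for a batch read of the data
trait Stream  // placeholder for a streaming read of the data

trait Scan {
  def toBatch: Batch    // provided when the source supports batch reads
  def toStream: Stream  // provided when the source supports streaming reads
}

trait ScanBuilder {
  def build(): Scan
}

// A Table is only a pointer to the data plus a way to build scans over it;
// it is not itself "batch" or "streaming".
trait Table {
  def name: String
  def newScanBuilder(): ScanBuilder
}

// An example table whose underlying data can be read in either mode.
class ExampleTable(path: String) extends Table {
  override def name: String = s"example.`$path`"

  override def newScanBuilder(): ScanBuilder = new ScanBuilder {
    override def build(): Scan = new Scan {
      override def toBatch: Batch = new Batch {}    // batch scan of the data at `path`
      override def toStream: Stream = new Stream {} // streaming scan of the same data
    }
  }
}
```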
> Source needs to be defined for stream table
The catalog returns an instance of `Table` that can create `Scan`s that
support the `toStream` method.
> A stream table requires a special flag to indicate that it is a stream
table.
When one gets back a `Scan`, calling its `toStream` method will indicate
that the table's data is about to be scanned in a streaming manner.
> User and Program need to be aware of whether this table is a stream table.
This would probably be handled on the SQL side, but I'm not as certain about
this point; can you elaborate?
> What would we do if the user wants to change a stream table to a batch
table, or convert a batch table to a stream table?
The new abstraction handles this at the `Scan` level instead of the `Table`
level. `Table`s are themselves neither streamed nor batched; rather, they
construct scans that can read them in either stream or batch, and the `Scan`
implements `toBatch` and/or `toStream` to support the appropriate read method,
as in the usage sketch below.
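Continuing the hypothetical shapes sketched earlier, the same `Table` instance serves both modes, and the only thing that changes is which conversion the query plan asks for on the `Scan`:

```scala
// Continuing the hypothetical sketch above: the same Table serves both read modes,
// and the choice between them is made on the Scan, not on the Table.
val table: Table = new ExampleTable("/data/events")
val scan: Scan = table.newScanBuilder().build()

val batchRead: Batch = scan.toBatch     // what a batch query plan would ask for
val streamRead: Stream = scan.toStream  // what a streaming query plan would ask for
```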
> What does the stream table metadata you define look like? What is the
difference between stream table metadata and batch table metadata?
I don't think this is as clear given what has been proposed so far; I'll
let others comment here.
Others should feel free to offer more commentary or correct anything from
above.