Github user mccheah commented on the issue:

    https://github.com/apache/spark/pull/21306

@stczwd My understanding is that a table isn't inherently a streaming table or a batch table; rather, a table points to data that can be scanned either in streaming or in batch, and the table is responsible for returning streaming or batch scanners when the logical plan calls for them. I believe this is the case because of https://github.com/apache/spark/pull/23086/files#diff-d111d7e2179b55465840c9a81ea004f2R65 and its eventual streaming analogue. In the new abstractions we propose here and in [our proposal](https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit), the catalog gets a reference to a `Table` object that can build `Scan`s over that table. In other words, the crucial theme running through all of the answers below is that a `Table` is not inherently a streaming or a batch table; rather, a `Table` supports returning streaming and/or batch scans. The table returned by the catalog is a pointer to the data, and the `Scan` defines how one reads that data.

> Source needs to be defined for stream table

The catalog returns an instance of `Table` that can create `Scan`s supporting the `toStream` method.

> Stream table requires a special flag to indicate that it is a stream table.

When one gets back a `Scan`, calling its `toStream` method indicates that the table's data is about to be scanned in a streaming manner.

> User and program need to be aware of whether this table is a stream table.

This would probably be done from the SQL side, but I'm less certain about this point; can you elaborate?

> What would we do if the user wants to change the stream table to batch table or convert the batch table to stream table?

The new abstraction handles this at the `Scan` level instead of the `Table` level.
`Table`s are themselves neither streamed nor batched; rather, they construct scans that can read them in either mode, and the `Scan` implements `toBatch` and/or `toStream` to support the appropriate read method.

> What does the stream table metadata you define look like? What is the difference between stream table metadata and batch table metadata?

This I don't think is as clear given what has been proposed so far; I'll let others comment here. Others should feel free to offer more commentary or correct anything above.
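To make the layering concrete, here is a minimal sketch of the shape being described: a single `Table` that is neither streaming nor batch, whose `Scan`s expose both modes. This is illustrative only, assuming simplified stand-in interfaces; the names `TableScanSketch`, `demoTable`, `Batch`, and `MicroBatchStream` are hypothetical and deliberately not Spark's actual DataSourceV2 API.

```java
// Stand-ins for a materialized batch read and a streaming read.
// These are illustrative interfaces, not Spark's API.
interface Batch { String describe(); }
interface MicroBatchStream { String describe(); }

// A Scan is neither batch nor streaming by itself; the caller picks
// the mode by invoking toBatch() or toStream().
interface Scan {
    Batch toBatch();
    MicroBatchStream toStream();
}

// A Table is just a pointer to data that can build Scans over it.
interface Table {
    Scan newScan();
}

public class TableScanSketch {
    // One table definition that serves both execution modes.
    static Table demoTable() {
        return () -> new Scan() {
            public Batch toBatch() { return () -> "batch read"; }
            public MicroBatchStream toStream() { return () -> "streaming read"; }
        };
    }

    public static void main(String[] args) {
        Table table = demoTable();
        // Batch query path: the logical plan asks the Scan for a batch read.
        System.out.println(table.newScan().toBatch().describe());
        // Streaming query path: the same table, read as a stream.
        System.out.println(table.newScan().toStream().describe());
    }
}
```

Nothing about `demoTable` marks it as a "stream table" or a "batch table"; the mode is chosen per query at the `Scan` level, which is the point of the answer above.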