Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21306
@stczwd, I agree with @mccheah. Tables are basically named data sets.
Whether they support batch, micro-batch streaming, or continuous streaming is
determined by checking whether they implement SupportsBatchScan or similar
interfaces. The docs Matt referenced are the right place to look for more context.
The purpose here is to make catalogs and reads orthogonal. A catalog can return
both batch-compatible and stream-compatible source "tables".
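To make that concrete, here is a rough Scala sketch of how capability mixins keep the catalog API independent of how a table is read. The names (`Table`, `SupportsBatchScan`, `TableCatalog`, etc.) are illustrative, not the exact interfaces in this PR:

```scala
// Illustrative sketch only; the real interfaces are defined in the PR / SPIP.
trait Table {
  def name: String
}

// Capability mixins: implementing one of these advertises support for an
// execution mode. The catalog never has to know or care which ones a table has.
trait SupportsBatchScan extends Table
trait SupportsMicroBatchScan extends Table
trait SupportsContinuousScan extends Table

// A catalog just resolves names to tables.
trait TableCatalog {
  def loadTable(name: String): Table
}

// The engine decides which execution modes are available by checking which
// capability interfaces the returned table implements.
def supportedModes(table: Table): Seq[String] = {
  val modes = Seq.newBuilder[String]
  if (table.isInstanceOf[SupportsBatchScan]) modes += "batch"
  if (table.isInstanceOf[SupportsMicroBatchScan]) modes += "micro-batch"
  if (table.isInstanceOf[SupportsContinuousScan]) modes += "continuous"
  modes.result()
}
```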
A "table" may be a Kafka topic or may be a file-based data source. And note
that both of those can support batch and streaming execution. A Kafka topic
could be a CDC stream that represents a table, and a file-based source could be
streamed by periodically checking for new committed files.
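Continuing the sketch above, a hypothetical file-based table could mix in both the batch and micro-batch capabilities, so the same named data set is usable from either kind of query (class name and behavior here are made up for illustration):

```scala
// Hypothetical class; builds on the Table / Supports* traits sketched above.
class CommittedFilesTable(val name: String, path: String)
  extends Table with SupportsBatchScan with SupportsMicroBatchScan {
  // A batch scan would read the files committed so far; a micro-batch scan
  // would periodically poll for newly committed files and read only those.
}

// supportedModes(new CommittedFilesTable("events", "/warehouse/events"))
// => Seq("batch", "micro-batch")
```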
This PR is based on an
[SPIP](https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.7vhjx9226jbt).
The SPIP has some background on why I chose this set of table attributes
(schema, partitioning, properties); the short summary is that they are the
core attributes used in comparable SQL variants and already used in Spark.
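As one more illustrative extension of the sketch above (again, not the exact API; partitioning in particular is simplified to plain column names), those three attributes would hang off the table abstraction roughly like this:

```scala
import org.apache.spark.sql.types.StructType

// Illustrative only: the three core attributes carried by a table.
trait TableWithAttributes extends Table {
  def schema: StructType                // column names and types
  def partitioning: Seq[String]         // simplified here to partition column names
  def properties: Map[String, String]   // free-form table properties
}
```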