Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21306
@stczwd, I agree with @mccheah. Tables are basically named data sets.
Whether they support batch, micro-batch streaming, or continuous streaming is
determined by checking whether they implement SupportsBatchScan or similar
interfaces. The docs Matt referenced are the right place to look for more context.
The purpose here is to make catalogs and reads orthogonal. A catalog can return
both batch-compatible and stream-compatible source "tables".
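To make that concrete, here is a rough Scala sketch of how capability mixins keep the catalog API independent of how a table is read. The names (`Table`, `SupportsBatchScan`, `TableCatalog`, etc.) are illustrative, not the exact interfaces in this PR:

```scala
// Illustrative sketch only; the real interfaces are defined in the PR / SPIP.
trait Table {
  def name: String
}

// Capability mixins: implementing one of these advertises support for an
// execution mode. The catalog never has to know or care which ones a table has.
trait SupportsBatchScan extends Table
trait SupportsMicroBatchScan extends Table
trait SupportsContinuousScan extends Table

// A catalog just resolves names to tables.
trait TableCatalog {
  def loadTable(name: String): Table
}

// The engine decides which execution modes are available by checking which
// capability interfaces the returned table implements.
def supportedModes(table: Table): Seq[String] = {
  val modes = Seq.newBuilder[String]
  if (table.isInstanceOf[SupportsBatchScan]) modes += "batch"
  if (table.isInstanceOf[SupportsMicroBatchScan]) modes += "micro-batch"
  if (table.isInstanceOf[SupportsContinuousScan]) modes += "continuous"
  modes.result()
}
```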
A "table" may be a Kafka topic or may be a file-based data source. And note
that both of those can support batch and streaming execution. A Kafka topic
could be a CDC stream that represents a table, and a file-based source could be
streamed by periodically checking for new committed files.
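Continuing the sketch above, a hypothetical file-based table could mix in both the batch and micro-batch capabilities, so the same named data set is usable from either kind of query (class name and behavior here are made up for illustration):

```scala
// Hypothetical class; builds on the Table / Supports* traits sketched above.
class CommittedFilesTable(val name: String, path: String)
  extends Table with SupportsBatchScan with SupportsMicroBatchScan {
  // A batch scan would read the files committed so far; a micro-batch scan
  // would periodically poll for newly committed files and read only those.
}

// supportedModes(new CommittedFilesTable("events", "/warehouse/events"))
// => Seq("batch", "micro-batch")
```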
This PR is based on an
[SPIP](https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#heading=h.7vhjx9226jbt).
The SPIP has some background on why I chose this set of table attributes
(schema, partitioning, properties); the short summary is that they are the
core attributes used in comparable SQL variants and already used in Spark.
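As one more illustrative extension of the sketch above (again, not the exact API; partitioning in particular is simplified to plain column names), those three attributes would hang off the table abstraction roughly like this:

```scala
import org.apache.spark.sql.types.StructType

// Illustrative only: the three core attributes carried by a table.
trait TableWithAttributes extends Table {
  def schema: StructType                // column names and types
  def partitioning: Seq[String]         // simplified here to partition column names
  def properties: Map[String, String]   // free-form table properties
}
```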