Github user mccheah commented on the issue:

    https://github.com/apache/spark/pull/21306
  
    @stczwd my understanding here is that a table isn't a streaming table or a batch table; rather, a table points to data that can be scanned either in stream or in batch, and the table is responsible for returning either streaming scanners or batch scanners when the logical plan calls for it. I believe this because of https://github.com/apache/spark/pull/23086/files#diff-d111d7e2179b55465840c9a81ea004f2R65 and its eventual streaming analogue. In the new abstractions we propose here and in [our proposal](https://docs.google.com/document/d/1uUmKCpWLdh9vHxP7AWJ9EgbwB_U6T3EJYNjhISGmiQg/edit), the catalog gets a reference to a `Table` object that can build `Scan`s over that table.
    
    In other words, the crucial overarching theme in all of the answers below is that a `Table` isn't inherently a streaming or a batch table; rather, a `Table` supports returning streaming and/or batch scans. The table returned by the catalog is a pointer to the data, and the `Scan` defines how one reads that data.
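
    To make that shape concrete, here is a minimal Java sketch. Only the names `Table`, `Scan`, `toBatch`, and `toStream` come from the proposal; the `Catalog` and `ScanBuilder` shapes and the return types are my own illustrative assumptions, not the final API:

    ```java
    // Illustrative sketch only -- not the proposed API verbatim.
    interface Batch {}             // placeholder for the batch read API
    interface MicroBatchStream {}  // placeholder for the streaming read API

    interface Scan {
      Batch toBatch();              // read the table's data once, in batch
      MicroBatchStream toStream();  // read the same data incrementally
    }

    interface ScanBuilder {
      Scan build();
    }

    interface Table {
      // A Table is just a pointer to data; it never says "I am streaming"
      // or "I am batch" -- it only knows how to build scans over its data.
      ScanBuilder newScanBuilder();
    }

    interface Catalog {
      Table loadTable(String identifier);  // assumed catalog lookup
    }
    ```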
    
    > Source needs to be defined for stream table
    
    The catalog returns an instance of `Table` that can create `Scan`s that 
support the `toStream` method.
    
    > Stream table requires a special flag to indicate that it is a stream 
table.
    
    No special flag is needed on the table itself: when one gets back a `Scan`, calling its `toStream` method indicates that the table's data is about to be scanned in a streaming manner, as in the sketch below.
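
    For instance, under the interface sketch above (the `"db.events"` identifier and the planner wiring are hypothetical), a streaming query and a batch query over the same table differ only in which method they call on the `Scan`:

    ```java
    class Planner {
      // Hypothetical wiring; the catalog lookup and the "db.events"
      // identifier are assumptions, not part of the proposal.
      static MicroBatchStream planStreamingRead(Catalog catalog) {
        Table table = catalog.loadTable("db.events");  // pointer to the data
        Scan scan = table.newScanBuilder().build();    // how to read it
        return scan.toStream();                        // ...as a stream
      }

      static Batch planBatchRead(Catalog catalog) {
        // Same table, same scan-building path; only the final call differs.
        return catalog.loadTable("db.events").newScanBuilder().build().toBatch();
      }
    }
    ```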
    
    > User and Program need to be aware of whether this table is a stream table.
    
    This would probably be done on the SQL side, but I'm less certain about this one; can you elaborate?
    
    > What would we do if the user wants to change the stream table to batch 
table or convert the batch table to stream table?
    
    The new abstraction handles this at the `Scan` level instead of the `Table` level. `Table`s are not themselves streamed or batched; rather, they construct scans that can read their data in either stream or batch, and the `Scan` implements `toBatch` and/or `toStream` to support the appropriate read method.
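
    As a sketch of that (the class and its internals are hypothetical), a single table backed by, say, a directory of files could hand out scans supporting both read modes, so there is nothing on the table to "convert":

    ```java
    // Hypothetical table supporting both read modes; no stream/batch flag
    // exists on the Table itself -- the choice is made per-Scan, per-query.
    class ExampleTable implements Table {
      @Override
      public ScanBuilder newScanBuilder() {
        return () -> new Scan() {
          @Override
          public Batch toBatch() {
            return new Batch() {};            // one-shot read of current data
          }
          @Override
          public MicroBatchStream toStream() {
            return new MicroBatchStream() {}; // incremental read as data arrives
          }
        };
      }
    }
    ```

    A source that only supports one mode would presumably just fail the unsupported call (e.g. throw from `toStream`), but the key point stands: the distinction lives on the `Scan`, not the `Table`.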
    
    > What does the stream table metadata you define look like? What is the 
difference between batch table metadata and stream table metadata?
    
    I don't think this is as clear from what has been proposed so far; I'll let others comment here.
    
    Others should feel free to add more commentary or to correct anything above.

