I've been playing around with CREATE EXTERNAL TABLE (with a custom TableProvider as well) in BeamSQL and really love it. I've accumulated a few questions while using it that I wanted to ask.

- I'm a little confused about the need to define columns in the CREATE EXTERNAL TABLE statement. If I have a BeamSqlTable implementation that can provide the schema on its own, it seems like the columns supplied to the CREATE statement are ignored. This is ideal anyway, since it's infeasible for users to provide the entire schema up front, especially for more complicated sources. Should the column list be optional here instead?

- It seems like predicate pushdown only works if the schema is "flat" (has no nested rows). I understand the complication in pushing down more complicated nested predicates. However, assuming the table implementation doesn't actually attempt to push them down, it seems like it would be fine to allow?

- As a follow-up to the above, I'd like to expose a "virtual" field in my schema that represents the partition the data has come from. For example, BigQuery has a similar concept called _PARTITIONTIME. This would be picked up by the predicate pushdown and used to filter the partitions being read. I can't really figure out how I'd construct something similar here, even if pushdown worked in all cases. For example, for this query:

    SELECT * FROM table WHERE _PARTITIONTIME BETWEEN X AND Y

  I'd want that filter to be pushed down to my IO, but the _PARTITIONTIME column shouldn't be returned in the select list. I was hoping to use BigQueryIO as an example of how to do this, but it doesn't seem like it exposes the virtual _PARTITIONTIME column either.
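
To make the last question concrete, here is a rough sketch of the behaviour I'm after, written in plain Java with no Beam dependencies. Everything in it is hypothetical (the PartitionPushdownSketch class, the Partition type, and the method names are made up for illustration and are not Beam APIs): the planner sees a schema that includes _PARTITIONTIME, the BETWEEN bounds are used only to prune partitions, and SELECT * never returns the virtual column.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names are Beam APIs.
final class PartitionPushdownSketch {
    static final String PARTITION_FIELD = "_PARTITIONTIME";

    /** One partition of the source: its start time plus its rows (data columns only). */
    static final class Partition {
        final Instant start;
        final List<Map<String, Object>> rows;

        Partition(Instant start, List<Map<String, Object>> rows) {
            this.start = start;
            this.rows = rows;
        }
    }

    /** Schema shown to the planner: the data fields plus the virtual partition field. */
    static List<String> plannerSchema(List<String> dataFields) {
        List<String> fields = new ArrayList<>(dataFields);
        fields.add(PARTITION_FIELD);
        return fields;
    }

    /** Fields that SELECT * should expand to: the virtual field is hidden. */
    static List<String> selectStarFields(List<String> plannerFields) {
        List<String> fields = new ArrayList<>(plannerFields);
        fields.remove(PARTITION_FIELD);
        return fields;
    }

    /**
     * Reads only the partitions whose start time lies in [from, to] (inclusive,
     * like SQL BETWEEN) and projects each row onto the requested fields.
     */
    static List<Map<String, Object>> read(
            List<Partition> partitions, Instant from, Instant to, List<String> fields) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.start.isBefore(from) || p.start.isAfter(to)) {
                continue; // pruned: this partition is never scanned
            }
            for (Map<String, Object> row : p.rows) {
                Map<String, Object> projected = new LinkedHashMap<>();
                for (String field : fields) {
                    projected.put(field, row.get(field));
                }
                out.add(projected);
            }
        }
        return out;
    }
}
```

In this sketch the partition filter never has to be evaluated per row, because it is fully consumed by the partition pruning. What I can't work out is how to get the same split (filter on a field that is in the planner's schema but not in the output) through the table and pushdown interfaces.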
