I filed a few Jiras to track the follow-up work we discussed here:
BEAM-9208 [1] - Add support for mapping columns to pubsub message
attributes in flat schemas DDL
BEAM-9209 [2] - Add support for mapping columns to pubsub message
event_timestamp when using flat schemas DDL
BEAM-9210 [3] - Deprecat
A PR is up here [1].
Gleb: If I understand what you're saying, I think it's already implemented
the way you're describing - PubsubIOJsonTable [2] is just a thin wrapper
that connects PubsubIO with Beam SQL tables.
Alex/Kenn: I agree with everything you've said :) The hard-coded
event_timestamp is
I like Alex's syntax suggestion. Very readable. In addition to tables
defined via DDL, we also have a metastore abstraction that currently
supports Hive Metastore and Google's Data Catalog. We should think about
how something like what Alex describes could be served by these systems.
Kenn
On Sun,
+1 to reduced boilerplate for basic things folks want to do with SQL.
I like Alex's use of OPTION for more advanced use cases.
On Sun, 17 Nov 2019 at 20:17, Gleb Kanterov wrote:
> Expanding on what Kenn said regarding having fewer dependencies on SQL.
> Can the whole thing be seen as extending P
Expanding on what Kenn said regarding having fewer dependencies on SQL. Can
the whole thing be seen as extending PubSubIO, that would implement most of
the logic from the proposal, given column annotations, and then having a
thin layer that connects it with Beam SQL tables?
On Sun, Nov 17, 2019 at
I like it, but I'm worried about the magic event_timestamp injection.
Wouldn't explicit injection via an option be a better approach?
CREATE TABLE people (
my_timestamp TIMESTAMP *OPTION(ref="pubsub:event_timestamp")*,
my_id VARCHAR *OPTION(ref="pubsub:attributes['id_name']")*,
name VA
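
To make the explicit-ref idea concrete, here is a minimal Python sketch of how a ref string like the ones in the DDL above could be resolved against a message. It is only an illustration, not Beam code: the function names, the dict layout of the message, and the rule that a column without a ref is read from the JSON payload are all assumptions.

```python
import re

# Hypothetical ref syntax, modelled on the OPTION(ref="...") strings in the DDL above.
_ATTRIBUTE_REF = re.compile(r"^pubsub:attributes\['([^']+)'\]$")


def resolve_ref(ref, message):
    """Return the value that a column's ref points at in a message dict."""
    if ref == "pubsub:event_timestamp":
        return message["event_timestamp"]
    match = _ATTRIBUTE_REF.match(ref)
    if match:
        # A missing attribute yields None instead of raising.
        return message["attributes"].get(match.group(1))
    raise ValueError(f"unsupported ref: {ref!r}")


def to_row(column_refs, message):
    """Build a flat row from a message.

    Assumption: columns without a ref (None) are read from the parsed payload.
    """
    row = {}
    for column, ref in column_refs.items():
        if ref is None:
            row[column] = message["payload"].get(column)
        else:
            row[column] = resolve_ref(ref, message)
    return row


# Column-to-ref mapping mirroring the example table; `name` has no ref (assumed).
column_refs = {
    "my_timestamp": "pubsub:event_timestamp",
    "my_id": "pubsub:attributes['id_name']",
    "name": None,
}

# A stand-in for a Pub/Sub message, with the payload already parsed from JSON.
message = {
    "event_timestamp": "2019-11-17T20:17:00Z",
    "attributes": {"id_name": "abc-123"},
    "payload": {"name": "Ada"},
}

row = to_row(column_refs, message)
```

With explicit refs, nothing depends on a column happening to be called event_timestamp; an unrecognized ref fails loudly instead of being silently ignored.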
Big +1 from me.
Nice explanation. This makes a lot of sense. Much simpler to understand
with fewer magic strings. It also makes the Beam SQL connector less
dependent on newer SQL features that are simply less widespread. I'm not
too surprised that Calcite's nested row support lags behind the rest
I've been looking into adding support for writing (i.e. INSERT INTO
statements) for the pubsub DDL, which currently only supports reading. This
DDL requires the defined schema to have exactly three fields:
event_timestamp, attributes, and payload, corresponding to the fields in
PubsubMessage (event