Accessing columns from input stream table during Window operations

Sumeet Malhotra Sun, 18 Apr 2021 23:24:26 -0700

Hi,

I have a use case where I'm creating a Tumbling window as follows:


"input" table has columns [Timestamp, a, b, c]

input \
    .window(Tumble.over(lit(10).seconds).on(input.Timestamp).alias("w")) \
    .group_by(col('w'), input.a) \
    .select(
        col('w').start.alias('window_start'),
        col('w').end.alias('window_end'),
        input.b,
        input.c.avg.alias('avg_value')) \
    .execute_insert('MySink') \
    .wait()

This throws an exception that it cannot resolve the fields "b" and "c"
inside the select statement. If I mention these column names inside the
group_by() statement as follows:

.group_by(col('w'), input.a, input.b, input.c)

then the column names in the subsequent select statement can be resolved.

Basically, unless the column name is explicitly made part of the group_by()
clause, the subsequent select() clause doesn't resolve it. This is very
similar to the example from Flink's documentation here [1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#overview--examples,
where a similar procedure works.

Any idea how I can access columns from the input stream, without having to
mention them in the group_by() clause? I really don't want to group the
results by those fields, but they need to be written to the sink eventually.

Thanks,
Sumeet

Accessing columns from input stream table during Window operations

Reply via email to