Hi,

I have a use case where I'm creating a tumbling window as follows.

The "input" table has columns [Timestamp, a, b, c]:

input \
    .window(Tumble.over(lit(10).seconds).on(input.Timestamp).alias("w")) \
    .group_by(col('w'), input.a) \
    .select(
        col('w').start.alias('window_start'),
        col('w').end.alias('window_end'),
        input.b,
        input.c.avg.alias('avg_value')) \
    .execute_insert('MySink') \
    .wait()

This throws an exception saying that the fields "b" and "c" cannot be
resolved inside the select statement. If I add these columns to the
group_by() statement as follows:

.group_by(col('w'), input.a, input.b, input.c)

then the column names in the subsequent select statement can be resolved.

Basically, unless a column is explicitly made part of the group_by()
clause, the subsequent select() clause cannot resolve it. My code is very
similar to the example in Flink's documentation [1], where a similar
procedure works:
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#overview--examples
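
To make the behaviour concrete, here is a minimal plain-Python sketch (not Flink code) of what I assume grouping by (window, a) does. The sample rows, the integer timestamps, and the helper names tumble_group and summarize are all made up for illustration. It shows that each group produces one output row, so avg(c) is well defined, while "b" can hold several different values within the same group:

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # same size as the tumbling window in the query above

# Hypothetical sample rows: (timestamp_seconds, a, b, c)
rows = [
    (1, "x", "b1", 2.0),
    (4, "x", "b2", 4.0),
    (12, "x", "b1", 6.0),
    (3, "y", "b3", 10.0),
]


def tumble_group(rows, size=WINDOW_SECONDS):
    """Bucket rows by (window_start, a), mimicking group_by(w, a)."""
    groups = defaultdict(list)
    for ts, a, b, c in rows:
        window_start = (ts // size) * size  # start of the tumbling window
        groups[(window_start, a)].append((b, c))
    return groups


def summarize(rows, size=WINDOW_SECONDS):
    """Return {(window_start, a): (distinct b values, average of c)}."""
    result = {}
    for key, members in sorted(tumble_group(rows, size).items()):
        b_values = sorted({b for b, _ in members})
        avg_c = sum(c for _, c in members) / len(members)
        result[key] = (b_values, avg_c)
    return result


summary = summarize(rows)
for key, (b_values, avg_c) in summary.items():
    print(key, "distinct b:", b_values, "avg c:", avg_c)
```

In this sample, the group (0, "x") contains two different values of "b", so a single output row for that group has no obvious value to use for "b" unless it is either grouped on or reduced by some aggregate.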

Any idea how I can access columns from the input stream, without having to
mention them in the group_by() clause? I really don't want to group the
results by those fields, but they need to be written to the sink eventually.

Thanks,
Sumeet
