Hi
I'm experimenting with the Dataflow SQL streaming extension, and I've observed that
the event_timestamp field in the payload seems to be ignored.
I want to compute the average of the values reported by the
sensor over 5-second windows:
SELECT CURRENT_TIMESTAMP() AS created_at, *
FROM (
  SELECT
    s1.window_start AS window_start,
    s1.window_end AS window_end,
    MIN(event_timestamp) AS event_timestamp_min,
    MAX(event_timestamp) AS event_timestamp_max,
    AVG(s1.sensor_value) AS sensor_value_avg
  FROM TUMBLE(
    (SELECT * FROM pubsub.topic.`sc-9366-nga-dev`.`marcin-atm22-signal-1`),
    DESCRIPTOR(event_timestamp),
    "INTERVAL 5 SECOND"
  ) AS s1
  GROUP BY window_start, window_end
)
For testing purposes the sensor data is artificially generated, and
event_timestamp is always 30 seconds behind the current time. For example:
current timestamp: 2022-08-05T15:00:24+00:00
{'event_timestamp': '2022-08-05T14:59:54+00:00', 'sensor_value':
0.4083962116009032}
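The generator boils down to something like the sketch below (illustrative only;
make_payload is a hypothetical helper, not my actual generator code, and the
Pub/Sub publishing part is omitted):

```python
import json
import random
from datetime import datetime, timedelta, timezone

def make_payload(now=None):
    """Build one sensor message whose event_timestamp lags wall-clock by 30 s."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "event_timestamp": (now - timedelta(seconds=30)).isoformat(timespec="seconds"),
        "sensor_value": random.random(),  # artificial reading in [0, 1)
    })
```

So a message published at 15:00:24 carries event_timestamp 14:59:54.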
But at 15:00:28 the latest row stored in BigQuery is the following:
[{
"created_at": "2022-08-05T15:00:20.170Z",
"window_start": "2022-08-05T15:00:05Z",
"window_end": "2022-08-05T15:00:10Z",
"event_timestamp_min": "2022-08-05T15:00:05.019Z",
"event_timestamp_max": "2022-08-05T15:00:09.035Z",
"sensor_value_avg": "0.1612730883"
}]
Why is there a record created at 15:00:20 for the window 15:00:05-15:00:10
if the input event_timestamp is always delayed by 30 seconds? At 15:00:20 the
latest emitted sensor event_timestamp is ~14:59:50.
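My understanding of fixed (tumbling) window assignment is that an event lands in
the window whose start is its timestamp rounded down to the window width, so an
event stamped 14:59:54 should fall in 14:59:50-14:59:55 (a sketch of what I
expect TUMBLE to do, not Dataflow's internal code):

```python
from datetime import datetime, timezone

def tumble_window(event_time, width_seconds=5):
    """Return (window_start, window_end) of the fixed window containing event_time."""
    epoch = int(event_time.timestamp())
    start = epoch - epoch % width_seconds  # round down to a window boundary
    return (datetime.fromtimestamp(start, timezone.utc),
            datetime.fromtimestamp(start + width_seconds, timezone.utc))
```

By this logic the row above (window 15:00:05-15:00:10) could only contain events
stamped in that range, which my generator never produces at that wall-clock time.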
Moreover, the watermark lag reported by Dataflow is always 10-20 seconds,
even though the event_timestamp reported by the sensor lags the wall clock
by much more than that.
Any ideas?
Regards,
Marcin