When the query changed to

SELECT user_id, ts, rownum
FROM (
  SELECT user_id, ts, ROW_NUMBER() OVER (PARTITION BY window_start,
window_end, user_id ORDER BY ts ASC) as rownum
  FROM TABLE(
    TUMBLE(TABLE visits, DESCRIPTOR(ts), INTERVAL '24' HOURS))
)
WHERE rownum = 1

runs but doesn't produce any results. I've tried different window sizes,
the source file is sorted by timestamp.

On Thu, Nov 4, 2021 at 1:42 PM Francesco Guardiani <france...@ververica.com>
wrote:

> I think the issue here is that the nested select is selecting all the
> fields produced by the TVF, including window_time (which is implicitly
> added by the TVF as described here
> <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#window-functions>).
> Because of that, the planner cannot resolve the timestamp to use as
> event-time in the result stream. Try to select only the fields you need in
> the nested SELECT, e.g.:
>
> SELECT *
> FROM (
>   SELECT user_id, ts, ROW_NUMBER() OVER (PARTITION BY window_start,
> window_end, user_id ORDER BY ts ASC) as rownum
>   FROM TABLE(
>     TUMBLE(TABLE visits, DESCRIPTOR(ts), INTERVAL '24' HOURS))
> )
> WHERE rownum = 1
>
> On Thu, Nov 4, 2021 at 11:18 AM Pavel Penkov <ebonfortr...@gmail.com>
> wrote:
>
>> I'm trying to express a supposedly simple query with Flink SQL - log the
>> first visit a day for each user. Source table is defined like
>>
>> CREATE TABLE visits (user_id int, ts timestamp(3), WATERMARK FOR ts AS ts)
>> WITH ('connector' = 'filesystem',
>> 'path' = 'file:///visits.csv',
>> 'format' = 'csv')
>>
>> The query I came with up is
>>
>> SELECT *
>> FROM (
>>   SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end,
>> user_id ORDER BY ts ASC) as rownum
>>   FROM TABLE(
>>     TUMBLE(TABLE visits, DESCRIPTOR(ts), INTERVAL '24' HOURS))
>> )
>> WHERE rownum = 1
>>
>> But it fails with error
>> [ERROR] Could not execute SQL statement. Reason:
>> org.apache.flink.table.api.TableException: Found more than one rowtime
>> field: [ts, window_time] in the query when insert into
>> 'default_catalog.default_database.Unregistered_Collect_Sink_6'.
>> Please select the rowtime field that should be used as event-time
>> timestamp for the DataStream by casting all other fields to TIMESTAMP.
>>
>> Any ideas on how to fix this?
>>
>

Reply via email to