Hello everybody,
When I run this simple set of queries, a single line from the source file shows up many times in the result. I have checked repeatedly, and a line that occurs only once in the source can appear as many as 100 times in the output of the SELECT statement. Is this the correct behavior for Flink 1.15.1? FYI, a DISTINCT query does return the correct results.

Here is the SQL:

CREATE TABLE historical_raw_source_template(
    `file.path` STRING NOT NULL METADATA,
    `file.name` STRING NOT NULL METADATA,
    `file.size` BIGINT NOT NULL METADATA,
    `file.modification-time` TIMESTAMP_LTZ(3) NOT NULL METADATA,
    line STRING
) WITH (
    'connector' = 'filesystem', -- required: specify the connector
    'format' = 'raw'            -- required: the filesystem connector requires a format
);

CREATE TABLE historical_raw_source WITH (
    'path' = 's3://raw/' -- required: path to a directory
) LIKE historical_raw_source_template;

SELECT
    `file.modification-time` AS modification_time,
    `file.path` AS file_path,
    line
FROM historical_raw_source
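
To put a number on the duplication, a count per file and line might help. The query below is only a sketch against the historical_raw_source table defined above. The alias occurrences is made up for illustration, and it assumes that each line's text is unique within its file; otherwise lines that legitimately repeat in the source would be reported as well.

```sql
-- Sketch: list every (file, line) pair that is returned more than once.
-- Assumes each line's text occurs only once per source file.
SELECT
    `file.path` AS file_path,
    line,
    COUNT(*) AS occurrences  -- hypothetical alias, not part of the original queries
FROM historical_raw_source
GROUP BY `file.path`, line
HAVING COUNT(*) > 1;
```

If this returns rows for lines that are known to occur only once in their file, the extra copies come from the read itself rather than from the data.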