Hello everybody,

When I run the simple set of queries below, a line that appears only once in the
source file shows up many times in the result.

I have verified repeatedly that a line that is unique in the source appears as
many as 100 times in the output of the SELECT statement.

Is this the expected behavior in Flink 1.15.1?

FYI, the results are correct when I run a SELECT DISTINCT query instead.

Here is the SQL:


CREATE TABLE historical_raw_source_template(
        `file.path`              STRING NOT NULL METADATA,
        `file.name`              STRING NOT NULL METADATA,
        `file.size`              BIGINT NOT NULL METADATA,
        `file.modification-time` TIMESTAMP_LTZ(3) NOT NULL METADATA,
        line                    STRING
      ) WITH (
        'connector' = 'filesystem',   -- required: specify the connector
        'format' = 'raw'              -- required: the filesystem connector requires a format
      );


CREATE TABLE historical_raw_source
      WITH (
        'path' = 's3://raw/'      -- required: path to a directory
      ) LIKE historical_raw_source_template;


SELECT
        `file.modification-time` AS modification_time,
        `file.path` AS file_path,
        line
      FROM
          historical_raw_source;
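To put a number on the duplication, a grouping query along these lines can be run against the same table. It is only a sketch: it assumes the table definitions above, and it would also flag lines that genuinely repeat within a single file, so those need to be ruled out separately.

```sql
-- Sketch: count how many times each (file, line) pair is emitted.
-- Assumes the historical_raw_source table defined above; any row returned
-- here was emitted more than once (or occurs more than once in the file).
SELECT
        `file.path` AS file_path,
        line,
        COUNT(*) AS occurrences
      FROM
          historical_raw_source
      GROUP BY `file.path`, line
      HAVING COUNT(*) > 1;
```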
