Hi Matwey,

I think you can implement a custom inputFormat to meet your needs and pass
it to the FileSource::forBulkFileFormat interface to create a FileSource.

In the custom inputFormat, you can choose to only read the metadata of the
file without reading its content.


https://github.com/apache/flink/blob/1dac395967e5870833d67c6bf1103ba874fce601/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/FileSource.java#L171

    public static <T> FileSourceBuilder<T> forBulkFileFormat(
            final BulkFormat<T, FileSourceSplit> bulkFormat, final Path... paths) {
        checkNotNull(bulkFormat, "reader");
        checkNotNull(paths, "paths");
        checkArgument(paths.length > 0, "paths must not be empty");

        return new FileSourceBuilder<>(paths, bulkFormat);
    }
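
To illustrate the "metadata only" idea, here is a rough sketch. It uses only the JDK so that it runs on its own; FileMeta and MetadataOnlyReader are hypothetical stand-ins, not Flink classes. In a real BulkFormat the path would come from the FileSourceSplit, and for S3 you would go through Flink's FileSystem abstraction instead of java.nio. The point is only the shape: emit one metadata record per file, then return null to signal that the split is finished, and never open the content.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class FileMetadataSketch {

    /** Hypothetical record type: one instance per file, no file content. */
    public static final class FileMeta {
        public final String path;
        public final long size;
        public final long modifiedMillis;

        public FileMeta(String path, long size, long modifiedMillis) {
            this.path = path;
            this.size = size;
            this.modifiedMillis = modifiedMillis;
        }

        @Override
        public String toString() {
            return "FileMeta{path=" + path + ", size=" + size
                    + ", modifiedMillis=" + modifiedMillis + "}";
        }
    }

    /**
     * Hypothetical stand-in for a per-file reader. The first readBatch()
     * call returns the single metadata record; every later call returns
     * null, meaning there is nothing more to read for this file.
     */
    public static final class MetadataOnlyReader {
        private final Path file;
        private boolean emitted;

        public MetadataOnlyReader(Path file) {
            this.file = file;
        }

        public FileMeta readBatch() throws IOException {
            if (emitted) {
                return null;
            }
            emitted = true;
            // Only the file attributes are queried; the content is never opened.
            BasicFileAttributes attrs =
                    Files.readAttributes(file, BasicFileAttributes.class);
            return new FileMeta(
                    file.toString(), attrs.size(), attrs.lastModifiedTime().toMillis());
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file to have something to inspect.
        Path tmp = Files.createTempFile("meta-sketch", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3, 4, 5});

        MetadataOnlyReader reader = new MetadataOnlyReader(tmp);
        System.out.println(reader.readBatch()); // the metadata record
        System.out.println(reader.readBatch()); // null, file is exhausted

        Files.delete(tmp);
    }
}
```

The same pattern should carry over to a custom format: the reader produces exactly one element per split, built from file status information rather than from the bytes of the file.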


Best,
Feng


On Tue, Dec 5, 2023 at 8:43 PM Matwey V. Kornilov <matwey.korni...@gmail.com>
wrote:

> Hello,
>
> I have an S3 bucket and I would like to process the objects metainfo
> (such as keys (filenames), metainfo, tags, etc.).
> I don't care about the objects content since it is irrelevant for my
> task. What I want is to construct a data stream where each instance is a
> metainfo attached to some object from the bucket.
>
> Is it anyhow possible to tune and reuse the FileSystem connector for my
> purposes? The connector is provided to read content of files, while I
> would like to read content of directory, or metainfo for every file.
>
>
