Patrick Lucas created FLINK-33694:
-------------------------------------

             Summary: GCS filesystem does not respect gs.storage.root.url config option
                 Key: FLINK-33694
                 URL: https://issues.apache.org/jira/browse/FLINK-33694
             Project: Flink
          Issue Type: Bug
          Components: FileSystems
    Affects Versions: 1.17.2, 1.18.0
            Reporter: Patrick Lucas


The GCS FileSystem's RecoverableWriter implementation uses the GCS SDK directly 
rather than going through Hadoop. While support was added to configure 
credentials correctly from the standard Hadoop configuration, no other options 
are passed through to the underlying client.

Because this only affects the RecoverableWriter-related codepaths, it can 
result in surprisingly different behavior depending on whether the FileSystem 
is used as a source or a sink—a {{gs://}}-URI FileSource may work fine, while 
a {{gs://}}-URI FileSink may not work at all.

We use [fake-gcs-server|https://github.com/fsouza/fake-gcs-server] in testing, 
and so we override the Hadoop GCS FileSystem config option 
{{gs.storage.root.url}}. However, because this option is not considered 
when creating the GCS client for the RecoverableWriter codepath, in a FileSink 
the GCS FileSystem attempts to write to the real GCS service rather than to 
fake-gcs-server. At the same time, a FileSource works as expected, reading from 
fake-gcs-server.
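For reference, the override we apply looks roughly like the following in 
flink-conf.yaml (the endpoint URL here is an example value for a local 
fake-gcs-server instance, not a required address):

{code:yaml}
# Hadoop GCS connector option, forwarded by the flink-gs-fs-hadoop plugin.
# Points the GCS client at a local fake-gcs-server instead of the real service.
gs.storage.root.url: http://fake-gcs-server:4443
{code}

With this in place, FileSource reads go to fake-gcs-server as expected, but 
FileSink writes still target the real GCS endpoint.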

The fix should be fairly straightforward, reading the {{gs.storage.root.url}} 
config option from the Hadoop FileSystem config in 
[{{GSFileSystemOptions}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemOptions.java#L30]
 and, if set, passing it to {{storageOptionsBuilder}} in 
[{{GSFileSystemFactory}}|https://github.com/apache/flink/blob/release-1.18.0/flink-filesystems/flink-gs-fs-hadoop/src/main/java/org/apache/flink/fs/gs/GSFileSystemFactory.java].
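A minimal sketch of the proposed pass-through logic. The class and method 
names below are stand-ins for illustration only—they model, rather than 
reproduce, the actual Flink and GCS SDK APIs:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class StorageRootUrlSketch {

    // Stand-in for reading gs.storage.root.url out of the Hadoop FileSystem
    // config, as GSFileSystemOptions would.
    static Optional<String> getStorageRootUrl(Map<String, String> hadoopConfig) {
        return Optional.ofNullable(hadoopConfig.get("gs.storage.root.url"));
    }

    // Stand-in for the GCS SDK's StorageOptions builder; only the host
    // override matters for this sketch.
    static class StorageOptionsBuilder {
        String host = "https://storage.googleapis.com"; // SDK default endpoint

        StorageOptionsBuilder setHost(String host) {
            this.host = host;
            return this;
        }
    }

    // Mirrors what GSFileSystemFactory would do when building the client:
    // apply the root URL to the builder only if the option is set.
    static StorageOptionsBuilder configure(Map<String, String> hadoopConfig) {
        StorageOptionsBuilder builder = new StorageOptionsBuilder();
        getStorageRootUrl(hadoopConfig).ifPresent(builder::setHost);
        return builder;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("gs.storage.root.url", "http://fake-gcs-server:4443");
        System.out.println(configure(config).host);
        System.out.println(configure(new HashMap<>()).host);
    }
}
```

When the option is absent, the builder keeps the SDK's default endpoint, so 
existing deployments against real GCS would be unaffected.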

The only current workaround is to build a custom flink-gs-fs-hadoop JAR with a 
patch and use it as a plugin.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
