Hi Sasha,

You're right: if you only need to access HDFS from your user code, it
should be possible to use the Hadoop-free Flink version and bundle the
Hadoop dependencies with your user code. However, if you want to use
Flink's file system state backend, as you did, then you have to start the
Flink cluster with the Hadoop dependencies on its classpath. The reason is
that the FsStateBackend is part of the Flink distribution and is loaded by
the system class loader, which cannot see the Hadoop classes bundled in
your job jar.
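For context, here is a minimal sketch of the kind of setup that runs into
this (the HDFS host and path are placeholders, and I'm assuming the backend
is set programmatically in the job):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FsBackendJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10_000);
            // FsStateBackend ships with flink-dist and is loaded by the system
            // class loader, so the 'hdfs' scheme must be resolvable from the
            // cluster's classpath, not just from the job jar.
            env.setStateBackend(
                new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));
            env.fromElements(1, 2, 3).print();
            env.execute("fs-backend-sketch");
        }
    }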

One thing you could try is the RocksDB state backend instead. Since the
RocksDBStateBackend is loaded dynamically from the user code class loader,
I think it should pick up the Hadoop dependencies bundled with your job
when trying to load the file system.
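A minimal sketch of that switch, under the same assumptions as above (you
would also add the flink-statebackend-rocksdb dependency to your job jar):

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RocksDBBackendJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10_000);
            // RocksDBStateBackend comes from the job jar, so it is loaded via
            // the user code class loader together with the bundled Hadoop classes.
            env.setStateBackend(
                new RocksDBStateBackend("hdfs://namenode:8020/flink/checkpoints"));
            env.fromElements(1, 2, 3).print();
            env.execute("rocksdb-backend-sketch");
        }
    }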

Cheers,
Till

On Tue, Jan 9, 2018 at 10:46 PM, Oleksandr Baliev <aleksanderba...@gmail.com> wrote:

> Hello guys,
>
> I want to clarify something for myself: since Flink 1.4.0 supports a
> Hadoop-free distribution and dynamic loading of Hadoop dependencies, I
> assumed that if I download the Hadoop-free distribution, start the cluster
> without any Hadoop, and then submit a job jar that bundles Hadoop
> dependencies (I used 2.6.0-cdh5.10.1), then Hadoop should be visible on
> the classpath, and a job that accesses HDFS via sources/sinks/etc. or
> makes checkpoints should run on such a Hadoop-free cluster.
>
> But when I start a job, I get "Hadoop is not in the classpath/dependencies."
> during checkpoint config initialization:
>
> org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not
> find a file system implementation for scheme 'hdfs'. The scheme is not
> directly supported by Flink and no Hadoop file system to support this
> scheme could be loaded.
> at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
> at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
> at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory.<init>(FsCheckpointStreamFactory.java:99)
> at org.apache.flink.runtime.state.filesystem.FsStateBackend.createStreamFactory(FsStateBackend.java:277)
> ...
>
>
> What I've found: in org.apache.flink.core.fs.FileSystem#getUnguardedFileSystem,
> FS_FACTORIES has no "hdfs" scheme registered, and FALLBACK_FACTORY, which
> should have been loaded with the Hadoop factory, holds an
> org.apache.flink.core.fs.UnsupportedSchemeFactory. But FALLBACK_FACTORY is
> initialized when the taskmanager starts (when no Hadoop dependencies are
> available yet), so that part seems expected.
>
> So, as I understand it, the Hadoop file system is not recognised by Flink
> if it was not loaded at startup. Is that correct, or did I just mess
> something up somewhere?
>
> Thanks,
> Sasha
>
