Hello guys,

I want to clarify something for myself: since Flink 1.4.0 supports a Hadoop-free
distribution with dynamic loading of Hadoop dependencies, I assumed the following
would work: download the Hadoop-free distribution, start a cluster without any
Hadoop on the classpath, and then submit a job jar that bundles its own Hadoop
dependencies (I used 2.6.0-cdh5.10.1). In that case Hadoop should become visible
on the classpath, and a job that accesses HDFS via a source/sink/etc. or writes
checkpoints there should be able to run on such a Hadoop-free cluster.
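
For context, a stripped-down sketch of the kind of job I mean (the HDFS address
below is just a placeholder, not the real one):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HdfsCheckpointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // checkpoint to HDFS -- this checkpoint path is the one whose 'hdfs'
        // scheme later fails to resolve (see the stack trace below)
        env.enableCheckpointing(10_000);
        env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));

        env.fromElements(1, 2, 3).print();

        env.execute("hdfs-checkpoint-test");
    }
}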

But when I start the job, the checkpoint configuration initialization fails with
"Hadoop is not in the classpath/dependencies.":

org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:293)
        at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory.<init>(FsCheckpointStreamFactory.java:99)
        at org.apache.flink.runtime.state.filesystem.FsStateBackend.createStreamFactory(FsStateBackend.java:277)
        ...


What I've found so far: in
org.apache.flink.core.fs.FileSystem#getUnguardedFileSystem
there is no "hdfs" scheme registered in FS_FACTORIES, and FALLBACK_FACTORY,
which is supposed to be loaded with the Hadoop factory, holds an
org.apache.flink.core.fs.UnsupportedSchemeFactory instead. But FALLBACK_FACTORY
is initialized when the taskmanager starts (at which point there are indeed no
Hadoop dependencies yet), so that part looks expected.
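
For reference, the relevant lookup as I read the 1.4.0 source (a simplified
paraphrase, not the exact code):

// inside org.apache.flink.core.fs.FileSystem#getUnguardedFileSystem (simplified)
FileSystemFactory factory = FS_FACTORIES.get(uri.getScheme());  // no "hdfs" entry
if (factory != null) {
    fs = factory.create(uri);
} else {
    // FALLBACK_FACTORY was initialized once at startup; without Hadoop on the
    // classpath it is an UnsupportedSchemeFactory, so this throws the
    // UnsupportedFileSystemSchemeException from the stack trace above
    fs = FALLBACK_FACTORY.create(uri);
}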

So as I understand it, the Hadoop file system is not recognised by Flink unless
Hadoop was on the classpath at startup. Is that correct, or did I just mess
something up somewhere?
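
If it helps with diagnosing, I can run a quick probe like this from the job to
separate the two questions -- whether the bundled Hadoop classes are visible at
all, and whether Flink's FileSystem can resolve 'hdfs' (the namenode address is
a placeholder):

import java.net.URI;
import org.apache.flink.core.fs.FileSystem;

public class HdfsProbe {
    public static void main(String[] args) throws Exception {
        // 1) are the Hadoop classes from the job jar loadable at all?
        Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
        System.out.println("Hadoop classes are loadable");

        // 2) can Flink's own FileSystem resolve the 'hdfs' scheme?
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"));
        System.out.println("Resolved 'hdfs' to " + fs.getClass().getName());
    }
}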

Thanks,
Sasha
