Dear Beam friends,

In preparation for my presentation of the Kettle Beam work in London next week I've been trying to get Beam on Spark to run, which worked in the end. One problem, however, has resurfaced... once again... back with a vengeance:
  java.lang.IllegalArgumentException: No filesystem found for scheme hdfs

I configured HADOOP_HOME and HADOOP_CONF_DIR, ran FileSystems.setDefaultPipelineOptions(pipelineOptions), and tried every trick in the book (very few of those are to be found), but it's a fairly brutal trial-and-error process. Given that I'm not the only person hitting these issues, I think it would be a good idea to allow for some sort of feedback from the FileSystems loading process: which filesystems it tries to load, which ones fail, and so on.

The Maven library situation is also a bit fuzzy, in the sense that there are libraries like beam-sdks-java-io-hdfs on a point release (0.6.0) as well as beam-sdks-java-io-hadoop-file-system on the latest version.

I've expanded my trial-and-error pattern to its endpoint and am ready to give up on Beam-on-Spark. I could try to get a Spark test environment configured for s3://, but I don't think that's all that representative of real-world scenarios. (The registration sketch I've been attempting is in the P.S. below.)

Thanks anyway in advance for any suggestions,

Matt
---
Matt Casters <[email protected]>
Senior Solution Architect, Kettle Project Founder
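P.S. For reference, here is roughly the registration dance I've been attempting, as a minimal sketch. It assumes beam-sdks-java-io-hadoop-file-system (and a matching Hadoop client) is on the classpath; the namenode host and port are placeholders, not my actual cluster values.

import java.util.Collections;

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class HdfsRegistrationSketch {
  public static void main(String[] args) {
    // Hadoop configuration pointing at the cluster. Normally this would be
    // picked up from HADOOP_CONF_DIR; here it is set explicitly to rule
    // that variable out.
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder host/port

    HadoopFileSystemOptions options =
        PipelineOptionsFactory.as(HadoopFileSystemOptions.class);
    options.setHdfsConfiguration(Collections.singletonList(hadoopConf));

    // Triggers the ServiceLoader-based registration of every FileSystem
    // implementation found on the classpath. If the HDFS registrar isn't
    // picked up here, later hdfs:// access fails with the exception above.
    FileSystems.setDefaultPipelineOptions(options);
  }
}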
