Dear Beam friends,

In preparation for my presentation of the Kettle Beam work in London next week I've been trying to get Beam on Spark to run, which worked in the end. One problem, however, has resurfaced... once again... back with a vengeance:
  java.lang.IllegalArgumentException: No filesystem found for scheme hdfs

I configured HADOOP_HOME and HADOOP_CONF_DIR, ran FileSystems.setDefaultPipelineOptions(pipelineOptions), and tried every trick in the book (very few of those are to be found), but it's a fairly brutal trial-and-error process. Given that I'm not the only person hitting these issues, I think it would be a good idea to allow for some sort of feedback from the FileSystems loading process: which filesystems it tries to load, which ones fail, and so on.

The Maven library situation is also a bit fuzzy, in the sense that there are libraries like beam-sdks-java-io-hdfs on a point release (0.6.0) as well as beam-sdks-java-io-hadoop-file-system on the latest version.

I've expanded my trial-and-error pattern to its endpoint and am ready to give up on Beam-on-Spark. I could try to get a Spark test environment configured for s3://, but I don't think that's all that representative of real-world scenarios. (The registration sketch I've been attempting is in the P.S. below.)

Thanks anyway in advance for any suggestions,

Matt
---
Matt Casters <[email protected]>
Senior Solution Architect, Kettle Project Founder
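P.S. For reference, here is roughly the registration dance I've been attempting, as a minimal sketch. It assumes beam-sdks-java-io-hadoop-file-system (and a matching Hadoop client) is on the classpath; the namenode host and port are placeholders, not my actual cluster values.

import java.util.Collections;

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class HdfsRegistrationSketch {
  public static void main(String[] args) {
    // Hadoop configuration pointing at the cluster. Normally this would be
    // picked up from HADOOP_CONF_DIR; here it is set explicitly to rule
    // that variable out.
    Configuration hadoopConf = new Configuration();
    hadoopConf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder host/port

    HadoopFileSystemOptions options =
        PipelineOptionsFactory.as(HadoopFileSystemOptions.class);
    options.setHdfsConfiguration(Collections.singletonList(hadoopConf));

    // Triggers the ServiceLoader-based registration of every FileSystem
    // implementation found on the classpath. If the HDFS registrar isn't
    // picked up here, later hdfs:// access fails with the exception above.
    FileSystems.setDefaultPipelineOptions(options);
  }
}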
