Yeah, for this setup I used Flintrock to spin up a set of nodes with Spark and HDFS on AWS. I'm launching the pipeline on the master; every HDFS library I can think of is available, and hdfs dfs commands work fine on the master and all the slaves. I think it's a problem of transparency: we can't see what's going on, what's required, and so on.
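For reference, this is roughly the registration sequence I've been running. A minimal sketch only; the NameNode address is a placeholder, and it assumes beam-sdks-java-io-hadoop-file-system is on the classpath of the JVM launching the pipeline:

import java.util.Collections;

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class RegisterHdfs {
  public static void main(String[] args) {
    // Hand Beam an explicit Hadoop Configuration instead of relying on
    // HADOOP_CONF_DIR being picked up. "hdfs://namenode:8020" is a
    // placeholder; substitute your own NameNode address.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    HadoopFileSystemOptions options =
        PipelineOptionsFactory.as(HadoopFileSystemOptions.class);
    options.setHdfsConfiguration(Collections.singletonList(conf));

    // Re-registers every FileSystem found on the classpath; the hdfs://
    // scheme only shows up if the Hadoop file system module is actually
    // present here, on the launching side, not just on the workers.
    FileSystems.setDefaultPipelineOptions(options);
  }
}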
Thanks,
Matt

On Mon, 28 Jan 2019 at 16:14, Juan Carlos Garcia <[email protected]> wrote:

> Matt, is the machine from which you are launching the pipeline different
> from the one where it should run?
>
> If so, make sure the launching machine has all the HDFS environment
> variables set, as the pipeline is configured on the launching machine
> before it hits the worker machines.
>
> Good luck,
> JC
>
>
> On Mon, 28 Jan 2019 at 13:34, Matt Casters <[email protected]> wrote:
>
>> Dear Beam friends,
>>
>> In preparation for my presentation of the Kettle Beam work in London next
>> week, I've been trying to get Beam to run on Spark, which worked in the
>> end. One problem, however, has resurfaced... once again... back with a
>> vengeance:
>>
>> java.lang.IllegalArgumentException: No filesystem found for scheme hdfs
>>
>> I configured HADOOP_HOME and HADOOP_CONF_DIR, ran
>> FileSystems.setDefaultPipelineOptions(pipelineOptions), and tried every
>> trick in the book (very few of those are to be found), but it's a fairly
>> brutal trial-and-error process.
>>
>> Given that I'm not the only person hitting these issues, I think it would
>> be a good idea to allow for some sort of feedback from the FileSystems
>> loading process: which filesystems it tries to load, which ones fail, and
>> so on.
>> Also, the Maven library situation is a bit fuzzy, in the sense that there
>> are libraries like beam-sdks-java-io-hdfs on a point release (0.6.0) as
>> well as beam-sdks-java-io-hadoop-file-system on the latest version.
>>
>> I've pushed my trial-and-error approach as far as it will go and am ready
>> to give up on Beam-on-Spark. I could try to get a Spark test environment
>> configured for s3://, but I don't think that's all that representative of
>> real-world scenarios.
>>
>> Thanks anyway in advance for any suggestions,
>>
>> Matt
>> ---
>> Matt Casters <[email protected]>
>> Senior Solution Architect, Kettle Project Founder
