Yeah, for this setup I used Flintrock to spin up a set of nodes with Spark and HDFS on AWS. I'm launching the pipeline on the master; every HDFS library I can think of is available, and hdfs dfs commands work fine on the master and all the slaves. I think it's a problem of transparency: we can't see what's going on, what's required, and so on.
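For reference, this is roughly the registration sequence I've been running. A minimal sketch only; the NameNode address is a placeholder, and it assumes beam-sdks-java-io-hadoop-file-system is on the classpath of the JVM launching the pipeline:

import java.util.Collections;

import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.hdfs.HadoopFileSystemOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.hadoop.conf.Configuration;

public class RegisterHdfs {
  public static void main(String[] args) {
    // Hand Beam an explicit Hadoop Configuration instead of relying on
    // HADOOP_CONF_DIR being picked up. "hdfs://namenode:8020" is a
    // placeholder; substitute your own NameNode address.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    HadoopFileSystemOptions options =
        PipelineOptionsFactory.as(HadoopFileSystemOptions.class);
    options.setHdfsConfiguration(Collections.singletonList(conf));

    // Re-registers every FileSystem found on the classpath; the hdfs://
    // scheme only shows up if the Hadoop file system module is actually
    // present here, on the launching side, not just on the workers.
    FileSystems.setDefaultPipelineOptions(options);
  }
}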
Thanks,
Matt

On Mon, 28 Jan 2019 at 16:14, Juan Carlos Garcia <[email protected]> wrote:

> Matt, is the machine from which you are launching the pipeline different
> from the one where it should run?
>
> If so, make sure the launching machine has all the HDFS environment
> variables set, as the pipeline is configured on the launching machine
> before it hits the worker machines.
>
> Good luck,
> JC
>
>
> On Mon, 28 Jan 2019 at 13:34, Matt Casters <[email protected]> wrote:
>
>> Dear Beam friends,
>>
>> In preparation for my presentation of the Kettle Beam work in London next
>> week, I've been trying to get Beam to run on Spark, which worked in the
>> end. One problem, however, has resurfaced... once again... back with a
>> vengeance:
>>
>> java.lang.IllegalArgumentException: No filesystem found for scheme hdfs
>>
>> I configured HADOOP_HOME and HADOOP_CONF_DIR, ran
>> FileSystems.setDefaultPipelineOptions(pipelineOptions), and tried every
>> trick in the book (very few of those are to be found), but it's a fairly
>> brutal trial-and-error process.
>>
>> Given that I'm not the only person hitting these issues, I think it would
>> be a good idea to allow for some sort of feedback from the FileSystems
>> loading process: which filesystems it tries to load, which ones fail, and
>> so on.
>> Also, the Maven library situation is a bit fuzzy, in the sense that there
>> are libraries like beam-sdks-java-io-hdfs on a point release (0.6.0) as
>> well as beam-sdks-java-io-hadoop-file-system on the latest version.
>>
>> I've pushed my trial-and-error approach as far as it will go and am ready
>> to give up on Beam-on-Spark. I could try to get a Spark test environment
>> configured for s3://, but I don't think that's all that representative of
>> real-world scenarios.
>>
>> Thanks anyway in advance for any suggestions,
>>
>> Matt
>> ---
>> Matt Casters <[email protected]>
>> Senior Solution Architect, Kettle Project Founder
