Claire,

There shouldn't be a need to run the pipeline like this, since the Apex runner already supports launching on Hadoop with the required dependencies.
Can you please confirm that you are able to run the basic word count example as shown here: https://beam.apache.org/documentation/runners/apex/

Thanks,
Thomas

On Tue, Jun 6, 2017 at 5:07 PM, Claire Yuan <clairey...@yahoo-inc.com> wrote:
> Hi all,
> I am the one trying to run the Apache Beam example on a cluster.
> I used the following command, with my input in a folder named "harrypotter":
>
> #!/bin/bash
>
> HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/tmp/beam/jars/*" hadoop jar \
>     /tmp/beam/jars/beam-examples-java-2.1.0-SNAPSHOT.jar \
>     org.apache.beam.examples.complete.TfIdf --runner=ApexRunner \
>     --embeddedExecution=false --output=apexrunnertfidf \
>     --input=/tmp/beam/harrypotter/
>
> java -cp /homes/org.apache.beam.examples.complete.TfIdf
> --------------------------------------------------------------------------
>
> However, some configuration seems to go wrong:
>
> Exception in thread "main" java.lang.RuntimeException: Failed to launch the application on YARN.
>         at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:204)
>         at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:82)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
>         at org.apache.beam.examples.complete.TfIdf.main(TfIdf.java:442)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.io.FileNotFoundException: hadoop/client/dfs.include (No such file or directory)
>         at java.io.FileInputStream.open0(Native Method)
>         at java.io.FileInputStream.open(FileInputStream.java:195)
>         at java.io.FileInputStream.<init>(FileInputStream.java:138)
>         at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:1112)
>         at org.apache.beam.runners.apex.ApexYarnLauncher$2.visitFile(ApexYarnLauncher.java:277)
>         at org.apache.beam.runners.apex.ApexYarnLauncher$2.visitFile(ApexYarnLauncher.java:253)
>         at java.nio.file.Files.walkFileTree(Files.java:2670)
>         at java.nio.file.Files.walkFileTree(Files.java:2742)
>         at org.apache.beam.runners.apex.ApexYarnLauncher.createJar(ApexYarnLauncher.java:253)
>         at org.apache.beam.runners.apex.ApexYarnLauncher.launchApp(ApexYarnLauncher.java:90)
>         at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:201)
>
> I checked the hadoop/client/ folder and found that dfs.include actually exists.
> Could any of you suggest a solution to this?
>
> Claire
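For reference, the basic word-count run Thomas asks about is launched through the Maven quickstart rather than `hadoop jar`. A rough sketch of that invocation is below; the `apex-runner` profile and the example input/output paths follow the Beam quickstart conventions and may differ between Beam versions, so treat this as an assumption to be checked against the linked runner page:

```shell
# Hypothetical sketch of the documented word-count launch for the Apex
# runner (Maven profile name and paths assumed from the Beam quickstart;
# verify against https://beam.apache.org/documentation/runners/apex/).
mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.WordCount \
    -Dexec.args="--inputFile=/tmp/input.txt \
                 --output=/tmp/counts \
                 --runner=ApexRunner" \
    -Papex-runner
```

If this documented path works but the hand-built `hadoop jar` invocation does not, that narrows the problem to how the classpath and dependency jars are assembled in the manual launch.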