Claire,

There shouldn't be a need to run the pipeline like this, since the Apex runner already supports launching on Hadoop with the required dependencies.
Can you please confirm that you are able to run the basic word count example as shown here: https://beam.apache.org/documentation/runners/apex/

Thanks,
Thomas

On Tue, Jun 6, 2017 at 5:07 PM, Claire Yuan <clairey...@yahoo-inc.com> wrote:
> Hi all,
> I am the one trying to run the Apache Beam example on a cluster.
> I used the following command, with my input in a folder named "harrypotter":
>
> #!/bin/bash
>
> HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/tmp/beam/jars/*" hadoop jar \
>     /tmp/beam/jars/beam-examples-java-2.1.0-SNAPSHOT.jar \
>     org.apache.beam.examples.complete.TfIdf --runner=ApexRunner \
>     --embeddedExecution=false --output=apexrunnertfidf \
>     --input=/tmp/beam/harrypotter/
>
> java -cp /homes/org.apache.beam.examples.complete.TfIdf
> --------------------------------------------------------------------------
>
> However, some configuration seems to go wrong:
>
> Exception in thread "main" java.lang.RuntimeException: Failed to launch the application on YARN.
>         at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:204)
>         at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:82)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
>         at org.apache.beam.examples.complete.TfIdf.main(TfIdf.java:442)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> Caused by: java.io.FileNotFoundException: hadoop/client/dfs.include (No such file or directory)
>         at java.io.FileInputStream.open0(Native Method)
>         at java.io.FileInputStream.open(FileInputStream.java:195)
>         at java.io.FileInputStream.<init>(FileInputStream.java:138)
>         at org.apache.commons.io.FileUtils.copyFile(FileUtils.java:1112)
>         at org.apache.beam.runners.apex.ApexYarnLauncher$2.visitFile(ApexYarnLauncher.java:277)
>         at org.apache.beam.runners.apex.ApexYarnLauncher$2.visitFile(ApexYarnLauncher.java:253)
>         at java.nio.file.Files.walkFileTree(Files.java:2670)
>         at java.nio.file.Files.walkFileTree(Files.java:2742)
>         at org.apache.beam.runners.apex.ApexYarnLauncher.createJar(ApexYarnLauncher.java:253)
>         at org.apache.beam.runners.apex.ApexYarnLauncher.launchApp(ApexYarnLauncher.java:90)
>         at org.apache.beam.runners.apex.ApexRunner.run(ApexRunner.java:201)
>
> I checked the hadoop/client/ folder and found that dfs.include actually exists.
> Could any of you suggest a solution to this?
>
> Claire
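For reference, the basic word-count run Thomas asks about is launched through the Maven quickstart rather than `hadoop jar`. A rough sketch of that invocation is below; the `apex-runner` profile and the example input/output paths follow the Beam quickstart conventions and may differ between Beam versions, so treat this as an assumption to be checked against the linked runner page:

```shell
# Hypothetical sketch of the documented word-count launch for the Apex
# runner (Maven profile name and paths assumed from the Beam quickstart;
# verify against https://beam.apache.org/documentation/runners/apex/).
mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.WordCount \
    -Dexec.args="--inputFile=/tmp/input.txt \
                 --output=/tmp/counts \
                 --runner=ApexRunner" \
    -Papex-runner
```

If this documented path works but the hand-built `hadoop jar` invocation does not, that narrows the problem to how the classpath and dependency jars are assembled in the manual launch.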