You should try adding your NameNode host and port in the URL.

On Mon, Mar 27, 2017 at 11:03 AM, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> It's quite obvious your HDFS URL is not complete; please look at the
> exception: your HDFS URI doesn't have a host or port. Normally a short
> name would be fine if HDFS were your default FS.
>
> I think the problem is that you're running on HDI, where the default FS
> is wasb. So a short name without host:port leads to an error. This looks
> like an HDI-specific issue; you'd better ask HDI.
>
> Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no
> host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2791)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2825)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>
> On Fri, Mar 24, 2017 at 9:18 PM, Yong Zhang <java8...@hotmail.com> wrote:
>
>> Of course it is possible.
>>
>> You can always set any configuration in your application through the
>> API, instead of passing it in through the CLI:
>>
>>     val sparkConf = new SparkConf()
>>       .setAppName(properties.get("appName"))
>>       .setMaster(properties.get("master"))
>>       .set(xxx, properties.get("xxx"))
>>
>> Your error is an environment problem.
>>
>> Yong
>> ------------------------------
>> *From:* Roy <rp...@njit.edu>
>> *Sent:* Friday, March 24, 2017 7:38 AM
>> *To:* user
>> *Subject:* spark-submit config via file
>>
>> Hi,
>>
>> I am trying to deploy a Spark job with spark-submit, which takes a
>> bunch of parameters, like this:
>>
>>     spark-submit --class StreamingEventWriterDriver --master yarn \
>>       --deploy-mode cluster --executor-memory 3072m --executor-cores 4 \
>>       --files streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar \
>>       -conf "streaming.conf"
>>
>> I was looking for a way to put all these flags in a file to pass to
>> spark-submit, to make my spark-submit command simpler, like this:
>>
>>     spark-submit --class StreamingEventWriterDriver --master yarn \
>>       --deploy-mode cluster --properties-file properties.conf \
>>       --files streaming.conf spark_streaming_2.11-assembly-1.0-SNAPSHOT.jar \
>>       -conf "streaming.conf"
>>
>> properties.conf has the following contents:
>>
>>     spark.executor.memory 3072m
>>     spark.executor.cores 4
>>
>> But I am getting the following error:
>>
>> 17/03/24 11:36:26 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
>> 17/03/24 11:36:26 WARN AzureFileSystemThreadPoolExecutor: Disabling threads for Delete operation as thread count 0 is <= 1
>> 17/03/24 11:36:26 INFO AzureFileSystemThreadPoolExecutor: Time taken for Delete operation is: 1 ms with threads: 0
>> 17/03/24 11:36:27 INFO Client: Deleted staging directory wasb://a...@abc.blob.core.windows.net/user/sshuser/.sparkStaging/application_1488402758319_0492
>> Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:154)
>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2791)
>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2825)
>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2807)
>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
>>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>>     at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:364)
>>     at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:480)
>>     at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:552)
>>     at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:881)
>>     at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:170)
>>     at org.apache.spark.deploy.yarn.Client.run(Client.scala:1218)
>>     at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1277)
>>     at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> 17/03/24 11:36:27 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...
>>
>> Does anyone know if this is even possible?
>>
>> Thanks...
>>
>> Roy

--
*Regards*
*Sandeep Nemuri*
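
[Editor's note] The suggested fix above — a fully qualified URI for the archive — can be sketched as a properties-file (or spark-defaults.conf) entry. The NameNode host and port below are placeholders, not values from the thread; substitute your cluster's NameNode address, or the HA nameservice name if the cluster uses NameNode HA:

```
spark.yarn.archive hdfs://<namenode-host>:8020/hdp/apps/2.6.0.0-403/spark2/spark2-hdp-yarn-archive.tar.gz
```

With the authority present, FileSystem.get no longer throws "Incomplete HDFS URI, no host" when the default FS is wasb rather than HDFS.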
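
[Editor's note] On the original question — yes, --properties-file is the intended mechanism. spark-submit reads that file with java.util.Properties, which accepts whitespace as the key/value separator, so the `spark.executor.memory 3072m` format Roy used parses fine. A minimal Scala sketch of that parsing behavior (loadSparkProps is an illustrative helper, not a Spark API):

```scala
import java.io.StringReader
import java.util.Properties

// java.util.Properties treats the first unescaped '=', ':', or whitespace
// as the key/value separator, so "key value" lines work as-is.
def loadSparkProps(text: String): Properties = {
  val props = new Properties()
  props.load(new StringReader(text))
  props
}

val props = loadSparkProps(
  """spark.executor.memory 3072m
    |spark.executor.cores 4
    |""".stripMargin)

println(props.getProperty("spark.executor.memory")) // prints 3072m
println(props.getProperty("spark.executor.cores"))  // prints 4
```

Note the values are plain strings; spark-submit applies them exactly as if they had been passed with repeated --conf flags, with explicit --conf flags taking precedence over the file.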