Hi,
I don't understand why counting the file fails the first time, but when I run it in spark-shell a second time it gives correct results ("stanley" is a nameservice, not a real host; the hdfs-site.xml with this config is on the classpath).
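For reference, the nameservice is defined in hdfs-site.xml with the standard HDFS HA properties, roughly like this (the second namenode host "hadoop-ha-2" and the RPC port are illustrative, not copied from my config):

```xml
<!-- hdfs-site.xml: logical nameservice "stanley" (hosts/ports illustrative) -->
<property>
  <name>dfs.nameservices</name>
  <value>stanley</value>
</property>
<property>
  <name>dfs.ha.namenodes.stanley</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.stanley.nn1</name>
  <value>hadoop-ha-1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.stanley.nn2</name>
  <value>hadoop-ha-2:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.stanley</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```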
Below is the full log from spark-shell:
14/02/15 03:04:22 INFO :
initialize(tachyon://hadoop-ha-1:19998/tmp/proxy.txt, Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
hdfs-default.xml, hdfs-site.xml). Connecting to Tachyon:
tachyon://hadoop-ha-1:19998/tmp/proxy.txt
14/02/15 03:04:22 INFO : Trying to connect master @
hadoop-ha-1/14.255.247.81:19998
14/02/15 03:04:22 INFO : User registered at the master
hadoop-ha-1/14.255.247.81:19998 got UserId 15
14/02/15 03:04:22 INFO : Trying to get local worker host : hadoop-ha-1
14/02/15 03:04:22 INFO : No local worker on hadoop-ha-1
14/02/15 03:04:22 INFO : Connecting remote worker @
hadoop-worker-6/14.255.247.53:29998
14/02/15 03:04:22 INFO : tachyon://hadoop-ha-1:19998
tachyon://hadoop-ha-1:19998 hdfs://stanley
14/02/15 03:04:22 INFO : getFileStatus(/tmp/proxy.txt): HDFS Path:
hdfs://stanley/tmp/proxy.txt TPath:
tachyon://hadoop-ha-1:19998/tmp/proxy.txt
14/02/15 03:04:22 INFO mapred.FileInputFormat: Total input paths to process : 1
...
14/02/15 03:04:23 WARN scheduler.TaskSetManager: Loss was due to java.lang.IllegalArgumentException
java.lang.IllegalArgumentException: java.net.UnknownHostException: stanley
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:391)
        at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:391)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:111)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:111)
        at scala.Option.map(Option.scala:145)
...
org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times (most recent failure: Exception failure: java.lang.IllegalArgumentException: java.net.UnknownHostException: stanley)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
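Since the exception happens inside a task, in case it helps narrow this down, here is a diagnostic sketch (untested, run from the same spark-shell) to check whether the HA properties for "stanley" are actually visible from inside an executor task, i.e. whether hdfs-site.xml is on the worker classpath too:

```scala
// Run one task on an executor and report what a freshly-loaded Hadoop
// Configuration sees there. If hdfs-site.xml is only on the driver's
// classpath, dfs.nameservices will come back null on the executor side.
val seen = sc.parallelize(Seq(1), 1).map { _ =>
  val conf = new org.apache.hadoop.conf.Configuration()
  (conf.get("dfs.nameservices"), conf.get("dfs.ha.namenodes.stanley"))
}.collect()
println(seen.mkString(", "))
```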
scala> val s = sc.textFile("tachyon://hadoop-ha-1:19998/tmp/proxy.txt")
14/02/15 03:04:27 INFO storage.MemoryStore: ensureFreeSpace(45012) called
with curMem=80405, maxMem=309225062
14/02/15 03:04:27 INFO storage.MemoryStore: Block broadcast_1 stored as
values to memory (estimated size 44.0 KB, free 294.8 MB)
s: org.apache.spark.rdd.RDD[String] = MappedRDD[3] at textFile at
<console>:12
scala> s.count()
14/02/15 03:04:29 INFO : getFileStatus(/tmp/proxy.txt): HDFS Path:
hdfs://stanley/tmp/proxy.txt TPath:
tachyon://hadoop-ha-1:19998/tmp/proxy.txt
14/02/15 03:04:29 INFO mapred.FileInputFormat: Total input paths to process : 1
14/02/15 03:04:29 INFO spark.SparkContext: Starting job: count at
<console>:15
14/02/15 03:04:29 INFO scheduler.DAGScheduler: Got job 1 (count at
<console>:15) with 2 output partitions (allowLocal=false)
14/02/15 03:04:29 INFO spark.SparkContext: Job finished: count at
<console>:15, took 0.466730364 s
res1: Long = 5
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-behavior-when-running-spark-on-top-of-tachyon-on-top-of-HDFS-HA-tp1544.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.