Hello,

I was unable to run the following commands from the Spark shell with CDH 5.0 and Spark 0.9.0; see below.
Once I removed the property below from core-site.xml on the node, the Spark commands worked. Is there a specific setup I am missing?

  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <final>true</final>
  </property>

scala> var log = sc.textFile("hdfs://jobs-ab-hnn1//input/core-site.xml")
14/04/30 22:43:16 INFO MemoryStore: ensureFreeSpace(78800) called with curMem=150115, maxMem=308713881
14/04/30 22:43:16 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 77.0 KB, free 294.2 MB)
14/04/30 22:43:16 WARN Configuration: mapred-default.xml:an attempt to override final parameter: mapreduce.tasktracker.cache.local.size; Ignoring.
14/04/30 22:43:16 WARN Configuration: yarn-site.xml:an attempt to override final parameter: mapreduce.output.fileoutputformat.compress.type; Ignoring.
14/04/30 22:43:16 WARN Configuration: hdfs-site.xml:an attempt to override final parameter: mapreduce.map.output.compress.codec; Ignoring.
log: org.apache.spark.rdd.RDD[String] = MappedRDD[3] at textFile at <console>:12

scala> log.count()
14/04/30 22:43:03 WARN JobConf: The variable mapred.child.ulimit is no longer used.
14/04/30 22:43:04 WARN Configuration: mapred-default.xml:an attempt to override final parameter: mapreduce.tasktracker.cache.local.size; Ignoring.
14/04/30 22:43:04 WARN Configuration: yarn-site.xml:an attempt to override final parameter: mapreduce.output.fileoutputformat.compress.type; Ignoring.
14/04/30 22:43:04 WARN Configuration: hdfs-site.xml:an attempt to override final parameter: mapreduce.map.output.compress.codec; Ignoring.
java.lang.IllegalArgumentException: java.net.UnknownHostException: jobs-a-hnn1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:902)
    at org.apache.spark.rdd.RDD.count(RDD.scala:720)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at $iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC.<init>(<console>:22)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:795)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:840)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:752)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:600)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:607)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:610)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:935)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:883)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:981)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.net.UnknownHostException: jobs-a-hnn1
    ... 59 more
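If it helps narrow this down, here are a couple of checks I can try from the same shell. This is just a sketch, assuming the default `sc` created by spark-shell and the CDH default NameNode port of 8020 (I have not confirmed either for this cluster):

scala> // Throws ClassNotFoundException if hadoop-lzo is not on Spark's classpath,
scala> // which would explain why the final LzoCodec property breaks things
scala> Class.forName("com.hadoop.compression.lzo.LzoCodec")

scala> // Show which default filesystem Spark's Hadoop configuration actually resolved
scala> sc.hadoopConfiguration.get("fs.defaultFS")

scala> // Retry with a fully qualified NameNode URI (explicit port, single slash in the path)
scala> val log = sc.textFile("hdfs://jobs-ab-hnn1:8020/input/core-site.xml")
scala> log.count()

If the Class.forName call fails, that would suggest the shell's classpath is missing the LZO jar rather than anything being wrong with core-site.xml itself.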