1. So the only way to use the local file again, without using Hadoop, would be to clear .../hadoop/bin from $PATH?
2. I can confirm that the HDFS port is 8020.
3. I recompiled Spark with "SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly" (see the version-check sketch below).
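To double-check point 3, here is a minimal sketch of my own (not code from the thread), assuming the rebuilt spark-shell is used; pasted there, it prints the Hadoop client version that the Spark assembly actually carries on its classpath:

    // Part of hadoop-common, which is bundled into the Spark assembly.
    import org.apache.hadoop.util.VersionInfo

    // Should print 2.2.0 if the SPARK_HADOOP_VERSION=2.2.0 rebuild took effect.
    // An older value would explain the EOFException below, because a Hadoop 1.x
    // client cannot speak the protobuf RPC protocol of a 2.2.0 NameNode.
    println("Hadoop client version: " + VersionInfo.getVersion)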
HDFS seems to be running OK, and I made the following test:
- on master: copied a file from local storage to HDFS
- on slave: copied the file from HDFS to a local path
As a result, I got the file on both machines.

Thanks

On Tuesday, January 28, 2014 7:32 PM, 尹绪森 <[email protected]> wrote:

1. The textFile() function uses HadoopRDD behind the scenes, so a regular path will have the same problem.
2. Could you confirm that the HDFS port is 8020?
3. Was your Spark compiled with the right version of your Hadoop?

On 2014-1-28 10:37 PM, "Kal El" <[email protected]> wrote:

I see that if I replace the HDFS path with a regular (local) one I get the same error ...

> On Tuesday, January 28, 2014 3:37 PM, Kal El <[email protected]> wrote:
>
> I am trying to read a file from HDFS. (HDFS is working fine: I have uploaded the file from one machine, the master, and downloaded it on a slave to make sure that everything is OK.) But I am receiving this error:
>
> (This is how my reading line looks:
> val lines = sc.textFile("hdfs://10.237.114.143:8020/files/fisier_16mil_30D_R10k.txt")
> )
>
> java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "SparkTwo/127.0.0.1"; destination host is: "10.237.114.143":8020;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>         at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
>         at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:141)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
>         at scala.Option.getOrElse(Option.scala:108)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
>         at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
>         at scala.Option.getOrElse(Option.scala:108)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
>         at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:26)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:201)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:199)
>         at scala.Option.getOrElse(Option.scala:108)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:199)
>         at org.apache.spark.SparkContext.runJob(SparkContext.scala:886)
>         at org.apache.spark.rdd.RDD.count(RDD.scala:698)
>         at org.apache.spark.rdd.RDD.takeSample(RDD.scala:323)
>         at SparkKMeans$.main(SparkKMeans.scala:66)
>         at SparkKMeans.main(SparkKMeans.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
>         at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
>         at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
>         at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
>         at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
>         at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
>         at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
>         at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
>         at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
>         at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
>         at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
>
> Any thoughts on this?
>
> Thanks
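For context, the failing read in the quoted message boils down to something like the standalone sketch below. This is my own reduction, not code from the thread: the master URL and object name are placeholders, the HDFS URI is the one from the mail, and the Spark 0.8/0.9-era SparkContext constructor is assumed. If the assembly is linked against the wrong Hadoop client, the count() call is where the EOFException above surfaces:

    import org.apache.spark.SparkContext

    object HdfsReadCheck {
      def main(args: Array[String]): Unit = {
        // Placeholder master; in the thread this would be the standalone cluster URL.
        val sc = new SparkContext("local[2]", "HdfsReadCheck")

        // Same read as in the quoted mail; textFile() builds a HadoopRDD underneath,
        // so the Hadoop client version on the classpath must match the HDFS server.
        val lines = sc.textFile("hdfs://10.237.114.143:8020/files/fisier_16mil_30D_R10k.txt")

        // Running an action computes the partitions via FileInputFormat.getSplits(),
        // which is where the RPC to the NameNode happens and the stack trace above starts.
        println("line count: " + lines.count())
        sc.stop()
      }
    }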
