Hi 

I have installed Hadoop on a local virtual machine, following the steps from
this URL:

https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-on-ubuntu-13-10

On my local machine I wrote a small Spark application in Java that reads a
file from the Hadoop instance running in the virtual machine.
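
For reference, the HDFS endpoint the client should talk to is whatever
core-site.xml in the VM declares as the default filesystem. A sketch of that
section (the key is fs.defaultFS in Hadoop 2.x, fs.default.name in older
configs; the localhost:9000 value here is the common tutorial default, not
copied from my VM):

```xml
<!-- core-site.xml (sketch; host/port are assumptions, check your own file) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```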

The code is below:

    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("Spark Count").setMaster("local"));

        JavaRDD<String> lines = sc.textFile("hdfs://10.62.57.141:50070/tmp/lines.txt");
        JavaRDD<Integer> lengths = lines.flatMap(new FlatMapFunction<String, Integer>() {
            @Override
            public Iterable<Integer> call(String t) throws Exception {
                return Arrays.asList(t.length());
            }
        });
        List<Integer> collect = lengths.collect();
        int totalLength = lengths.reduce(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        });
        System.out.println(totalLength);
    }
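
For clarity, the job itself just computes the length of each line and sums
them. A plain-Java (no Spark) sketch of the same computation, using made-up
sample data:

```java
import java.util.Arrays;
import java.util.List;

public class LineLengthSum {
    public static void main(String[] args) {
        // Sample lines standing in for the contents of /tmp/lines.txt.
        List<String> lines = Arrays.asList("hello", "world!", "");
        // Length of each line, then the sum -- same result the Spark job prints.
        int totalLength = lines.stream().mapToInt(String::length).sum();
        System.out.println(totalLength); // prints 11
    }
}
```

(In the Spark version, map would also work in place of flatMap, since each
line produces exactly one number; flatMap is not wrong, just more general.)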


The application throws this exception:

    Exception in thread "main" java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "TOSHIBA-PC/192.168.56.1"; destination host is: "10.62.57.141":50070;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
        at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
        at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1701)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1647)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:222)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:220)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:220)
        at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:222)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:220)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:220)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1367)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:797)
        at org.apache.spark.api.java.JavaRDDLike$class.collect(JavaRDDLike.scala:309)
        at org.apache.spark.api.java.JavaRDD.collect(JavaRDD.scala:32)
        at org.css.RaiSpark.RaiSparkApp.main(RaiSparkApp.java:25)
    Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
        at com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
        at com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
        at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
        at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
        at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:2364)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:996)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)



What does this exception mean, and how can I fix it?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/InvalidProtocolBufferException-Protocol-message-end-group-tag-did-not-match-expected-tag-tp21777.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
