No luck, I'm afraid. After giving the namenode 16 GB of RAM, I am still getting an out-of-memory exception, though a slightly different one:

15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1351)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
    at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
    at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
    at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
    at org.apache.hadoop.fs.Globber.listStatus(Globber.java:69)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:217)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:292)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.spark.input.StreamFileInputFormat.setMinPartitions(PortableDataStream.scala:47)
    at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:43)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)


And on Spark's second retry, a similar exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.protobuf.LiteralByteString.toString(LiteralByteString.java:148)
    at com.google.protobuf.ByteString.toStringUtf8(ByteString.java:572)
    at org.apache.hadoop.hdfs.protocol.proto.HdfsProtos$HdfsFileStatusProto.getOwner(HdfsProtos.java:21558)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
    at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
    at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
    at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
    at org.apache.hadoop.fs.Globber.listStatus(Globber.java:69)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:217)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:292)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
    at org.apache.spark.input.StreamFileInputFormat.setMinPartitions(PortableDataStream.scala:47)
    at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:43)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)


Any ideas which part of Hadoop is running out of memory?
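
For reference, going by the BinaryFileRDD / StreamFileInputFormat.setMinPartitions frames, the call that triggers the listing should be of roughly this shape (a minimal sketch; the path and names are placeholders, not my actual job):

    import org.apache.spark.{SparkConf, SparkContext}

    object BinaryFilesListing {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("binary-files-listing"))

        // sc.binaryFiles builds a BinaryFileRDD; computing its partitions goes through
        // StreamFileInputFormat.setMinPartitions -> FileInputFormat.listStatus, which
        // enumerates every matching file via DFSClient.listPaths. That enumeration runs
        // in this JVM (the driver, i.e. inside the YARN application master in cluster
        // mode), which is the process throwing the GC-overhead error above.
        val files = sc.binaryFiles("hdfs:///path/with/many/small/files/*") // placeholder path

        println(files.count())
        sc.stop()
      }
    }

If that reading is right, the heap that matters would be the driver's (spark.driver.memory, since the trace shows the user class running inside the YARN application master) rather than the namenode's, but I may well be misreading the trace.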

