Just try dropping in that JAR. Hadoop core ships with an out-of-date Guava JAR 
to avoid breaking old code downstream, but 2.7.x is designed to work with later 
versions too (i.e. it has moved off any of the now-removed methods). See 
https://issues.apache.org/jira/browse/HADOOP-10101 for the specifics.
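
If swapping in the newer JAR on the cluster isn't practical, another route is 
to shade Guava into your application assembly so your own classes never bind 
to Hadoop's copy. A minimal sketch for build.sbt, assuming the sbt-assembly 
plugin and an arbitrary "myshaded" prefix:

    // build.sbt -- relocate Guava classes into a private package inside the
    // assembly, so code in your jar ignores the old guava.jar on the cluster
    libraryDependencies += "com.google.guava" % "guava" % "14.0.1"

    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.common.**" -> "myshaded.guava.@1").inAll
    )

Note that this only protects references from your own code; when the missing 
method is hit inside Spark itself, as in the trace below, replacing the 
cluster-side JAR is still the relevant fix.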

On 23 Oct 2015, at 12:10, jinhong lu 
<lujinho...@gmail.com> wrote:

Hi, I run Spark to write data to HBase, but hit a NoSuchMethodError:

15/10/23 18:45:21 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, dn18-formal.i.nease.net):
java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;

I found the guava.jar in the hadoop/hbase dir and its version is 12.0, but 
com.google.common.io.ByteStreams.limit only exists since Guava 14.0, so the 
NoSuchMethodError occurs.
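
A quick way to confirm which Guava the executors actually resolved is to ask 
the class for its code source (standard JVM reflection; sc here is assumed to 
be the SparkContext):

    // run a single task and print the jar each executor loaded ByteStreams from
    sc.parallelize(Seq(1), 1).map { _ =>
      classOf[com.google.common.io.ByteStreams]
        .getProtectionDomain.getCodeSource.getLocation.toString
    }.collect().foreach(println)

If this prints the path of the Guava 12.0 jar under the hadoop/hbase dir, that 
jar is shadowing the newer one on the executor classpath.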

I tried running spark-submit with --jars, but got the same error. I also tried adding
      configuration.set("spark.executor.extraClassPath", "/home/ljh")
      configuration.set("spark.driver.userClassPathFirst","true");
to my code; still the same.

How can I solve this? How can I remove the guava.jar in hadoop/hbase from the 
classpath? Why doesn't it use the guava.jar in the Spark dir?
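
(Note: spark.executor.extraClassPath and spark.driver.userClassPathFirst are 
Spark properties, so setting them on an HBaseConfiguration inside the closure 
has no effect. A minimal sketch, assuming Spark 1.4+: set them on the SparkConf 
before the SparkContext is created, or pass them with --conf on the 
spark-submit command line.)

    import org.apache.spark.{SparkConf, SparkContext}

    // Spark settings must be in place before the context starts; set them
    // here (or via --conf) rather than on a Hadoop Configuration object
    val sparkConf = new SparkConf()
      .set("spark.executor.userClassPathFirst", "true") // prefer jars passed via --jars
      .set("spark.driver.userClassPathFirst", "true")
    val sc = new SparkContext(sparkConf)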

Here is my code:

rdd.foreach({ res =>
      val configuration = HBaseConfiguration.create();

      configuration.set("hbase.zookeeper.property.clientPort", "2181");
      configuration.set("hbase.zookeeper.quorum", “ip.66");
      configuration.set("hbase.master", “ip:60000");
      configuration.set("spark.executor.extraClassPath", "/home/ljh")
      configuration.set("spark.driver.userClassPathFirst","true");
      val hadmin = new HBaseAdmin(configuration);
      configuration.clear();
      configuration.addResource("/home/hadoop/conf/core-default.xml")
      configuration.addResource("/home/hadoop/conf/core-site.xml")
      configuration.addResource("/home/hadoop/conf/mapred-default.xml")
      configuration.addResource("/home/hadoop/conf/mapred-site.xml")
      configuration.addResource("/home/hadoop/conf/yarn-default.xml")
      configuration.addResource("/home/hadoop/conf/yarn-site.xml")
      configuration.addResource("/home/hadoop/conf/hdfs-default.xml")
      configuration.addResource("/home/hadoop/conf/hdfs-site.xml")
      configuration.addResource("/home/hadoop/conf/hbase-default.xml")
      configuration.addResource("/home/ljhn1829/hbase-site.xml")
      val table = new HTable(configuration, "ljh_test2");
      var put = new Put(Bytes.toBytes(res.toKey()));
      put.add(Bytes.toBytes("basic"), Bytes.toBytes("name"), 
Bytes.toBytes(res.totalCount + "\t" + res.positiveCount));
      table.put(put);
      table.flushCommits()
    })

and the error message:


15/10/23 19:06:42 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, gdc-dn126-formal.i.nease.net):
java.lang.NoSuchMethodError: com.google.common.io.ByteStreams.limit(Ljava/io/InputStream;J)Ljava/io/InputStream;
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.nextBatchStream(ExternalAppendOnlyMap.scala:420)
at org.apache.spark.util.collection.ExternalAppendOnlyMap$DiskMapIterator.<init>(ExternalAppendOnlyMap.scala:392)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:207)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:63)
at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:83)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.maybeSpill(ExternalAppendOnlyMap.scala:63)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:60)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:46)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

15/10/23 19:06:42 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, gdc-dn166-formal.i.nease.net, PROCESS_LOCAL, 1277 bytes)
15/10/23 19:06:42 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on gdc-dn166-formal.i.nease.net:3838 (size: 3.2 KB, free: 1060.3 MB)
15/10/23 19:06:42 ERROR YarnScheduler: Lost executor 1 on gdc-dn126-formal.i.nease.net: remote Rpc client disassociated
15/10/23 19:06:42 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkexecu...@gdc-dn126-formal.i.nease.net:1656] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/10/23 19:06:42 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 1.0
15/10/23 19:06:42 INFO DAGScheduler: Executor lost: 1 (epoch 1)
15/10/23 19:06:42 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
15/10/23 19:06:42 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, gdc-dn126-formal.i.nease.net, 44635)
15/10/23 19:06:42 INFO BlockManagerMaster: Removed 1 successfully in removeExecutor
15/10/23 19:06:42 INFO ShuffleMapStage: ShuffleMapStage 0 is now unavailable on executor 1 (0/1, false)
15/10/23 19:06:42 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to gdc-dn166-formal.i.nease.net:28595
15/10/23 19:06:42 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 1 is 84 bytes
15/10/23 19:06:42 WARN TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2, gdc-dn166-formal.i.nease.net): FetchFailed(null, shuffleId=1, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 1
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:389)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:386)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:385)
at org.apache.spark.MapOutputTracker.getServerStatuses(MapOutputTracker.scala:172)
at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:42)
at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:90)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

