The XXX class is named Block; below is part of its code.

The deserialization code looks like this:

public static Block deserializeFrom(byte[] bytes) {
    try {
        Block b = SerializationUtils.deserialize(bytes);
        System.out.println("b="+b);
        return b;
    } catch (ClassCastException e) {
        System.out.println("ClassCastException");
        e.printStackTrace();
    } catch (IllegalArgumentException e) {
        System.out.println("IllegalArgumentException");
        e.printStackTrace();
    } catch (SerializationException e) {
        System.out.println("SerializationException");
        e.printStackTrace();
    }
    return null;
}


The Spark code is:

val fis = spark.sparkContext.binaryFiles("/folder/abc*.file")
val RDD = fis.map(x => {
  val content = x._2.toArray()
  val b = Block.deserializeFrom(content)
  ...
})


All the code above runs successfully in Spark local mode, but when it runs in
YARN cluster mode, the error happens.
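
To narrow this down, here is a minimal check I can run in the same application
(a sketch only; "com.XXX.Block" below is a placeholder for the real
fully-qualified class name). It asks each executor whether it can load the
Block class at all, independent of SerializationUtils:

// Sketch: check on every partition whether the Block class can be loaded
// on the executor JVMs, without involving SerializationUtils.
// "com.XXX.Block" is a placeholder for the real fully-qualified class name.
val className = "com.XXX.Block"
val loadable = spark.sparkContext
  .parallelize(1 to 100, numSlices = 10)
  .mapPartitions { _ =>
    val ok =
      try { Class.forName(className); true }
      catch { case _: ClassNotFoundException => false }
    Iterator(ok)
  }
  .collect()
println(s"Block class loadable on all partitions: ${loadable.forall(identity)}")

If this comes back false on YARN, my understanding is that the jar containing
the class is not being shipped to the executors (for example, it is not part of
the application jar and not passed via --jars), which would explain why local
mode works while YARN cluster mode fails.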

On 2019/6/27 9:49 AM, Tomo Suzuki wrote:

I'm afraid I don't have enough information to troubleshoot the problem in
com.XXX.XXX. It would be great if you could create a minimal example project
that reproduces the same issue.

Regards,
Tomo

On Wed, Jun 26, 2019 at 9:20 PM big data <[email protected]> wrote:



Hi,

Actually, the class com.XXX.XXX is called normally earlier in the Spark
code, and this exception happens in one static method of that class.

So a jar dependency problem can be excluded.

On 2019/6/26 10:23 PM, Tomo Suzuki wrote:


Hi Big data,

I don't use SerializationUtils, but if I interpret the error message:

   ClassNotFoundException: com.XXXX.XXXX

, this says com.XXXX.XXXX is not available in the class path of the JVM
(which your Spark is running on). I would verify that you can instantiate
com.XXXX.XXXX in Spark/Scala *without* SerializationUtils.
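
For example, something along these lines (just a sketch; "Block" and its
no-argument constructor stand in for your actual com.XXXX.XXXX class and
however you normally construct it):

// Sketch: construct the class directly inside a Spark task, with no
// SerializationUtils involved. If this already fails with
// ClassNotFoundException / NoClassDefFoundError on YARN, the jar containing
// the class is not reaching the executors' classpath.
// "Block" and new Block() are placeholders for the real class and constructor.
val instantiated = spark.sparkContext
  .parallelize(1 to 10, numSlices = 5)
  .map { _ =>
    val b = new Block()
    b != null
  }
  .reduce(_ && _)
println(s"instantiated on executors: $instantiated")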

Regards,
Tomo



On Wed, Jun 26, 2019 at 4:12 AM big data <[email protected]> wrote:





I use Apache Commons Lang3's SerializationUtils in the code:
SerializationUtils.serialize() to store a customized class to disk as files,
and SerializationUtils.deserialize(byte[]) to restore them again.
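
Roughly, the round trip looks like this (a sketch; the path and the
no-argument Block constructor are placeholders for what the real code does):

import java.nio.file.{Files, Paths}
import org.apache.commons.lang3.SerializationUtils

// Sketch of the round trip; the path and the Block constructor are placeholders.
val block = new Block()
val bytes: Array[Byte] = SerializationUtils.serialize(block)   // Block implements Serializable
Files.write(Paths.get("/tmp/abc1.file"), bytes)                // store to local disk

val restored: Block =
  SerializationUtils.deserialize(Files.readAllBytes(Paths.get("/tmp/abc1.file")))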

In the local environment (Mac OS), all serialized files can be deserialized
normally and no error happens. But when I copy these serialized files to HDFS
and read them back from HDFS using Spark/Scala, a SerializationException
happens.

The Apache Commons Lang3 version is:

     <dependency>
         <groupId>org.apache.commons</groupId>
         <artifactId>commons-lang3</artifactId>
         <version>3.9</version>
     </dependency>


The stack trace is below:

org.apache.commons.lang3.SerializationException: java.lang.ClassNotFoundException: com.XXXX.XXXX
    at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:227)
    at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:265)
    at com.com.XXXX.XXXX.deserializeFrom(XXX.java:81)
    at com.XXX.FFFF$$anonfun$3.apply(BXXXX.scala:157)
    at com.XXX.FFFF$$anonfun$3.apply(BXXXX.scala:153)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
    at scala.collection.AbstractIterator.to(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:945)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.XXXX.XXXX
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:686)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.commons.lang3.SerializationUtils.deserialize(SerializationUtils.java:223)



I've checked the loaded byte[] lengths; they are the same from local disk and
from HDFS. So why can the data not be deserialized when read from HDFS?
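
One more check that may help (a sketch; both paths are placeholders): compare
one local file with its HDFS copy byte for byte, not only by length:

import java.nio.file.{Files, Paths}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: compare the local copy of one serialized file with the HDFS copy
// byte for byte. Both paths below are placeholders.
val localBytes = Files.readAllBytes(Paths.get("/tmp/abc1.file"))

val fs = FileSystem.get(new Configuration())
val hdfsPath = new Path("/folder/abc1.file")
val len = fs.getFileStatus(hdfsPath).getLen.toInt
val hdfsBytes = new Array[Byte](len)
val in = fs.open(hdfsPath)
try in.readFully(hdfsBytes) finally in.close()

println(s"identical content: ${java.util.Arrays.equals(localBytes, hdfsBytes)}")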