Hi David,

I assume you are running with the latest Crail master. We just pushed a change to the CrailConfiguration initialization which we have not yet adapted in the shuffle plugin (should be a one-line fix). @Adrian, can you take a look?
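The plugin presumably still calls the CrailConfiguration constructor directly, which the new master no longer exposes publicly (hence the IllegalAccessError below). A rough sketch of what the one-line fix in CrailDispatcher.scala might look like, assuming the replacement is the static factory method `CrailConfiguration.createConfigurationFromFile()` on current master:

```scala
// In CrailDispatcher.init() -- hypothetical sketch, not the committed fix.
// Old code; fails with IllegalAccessError now that the no-arg
// constructor is no longer accessible from outside the package:
//   val crailConf = new CrailConfiguration()

// Likely replacement, assuming the new factory method reads
// $CRAIL_HOME/conf/crail-site.conf as the constructor used to:
val crailConf = CrailConfiguration.createConfigurationFromFile()
```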

Regards,
Jonas

 On Tue, 18 Jun 2019 23:24:48 +0000
 David Crespi <david.cre...@storedgesystems.com> wrote:
Hi,
I’m getting what looks to be a configuration error when trying to use the CrailShuffleManager (spark.shuffle.manager org.apache.spark.shuffle.crail.CrailShuffleManager).

It seems like a basic error, but everything else runs fine until I add the line above to my spark-defaults.conf file.
I have the environment variable for crail home set, as well as the library path for the disni libs:
LD_LIBRARY_PATH=/usr/local/lib
$ ls -l /usr/local/lib/
total 156
-rwxr-xr-x 1 root root       947 Jun 18 08:11 libdisni.la
lrwxrwxrwx 1 root root        17 Jun 18 08:11 libdisni.so -> libdisni.so.0.0.0
lrwxrwxrwx 1 root root        17 Jun 18 08:11 libdisni.so.0 -> libdisni.so.0.0.0
-rwxr-xr-x 1 root root  149784 Jun 18 08:11 libdisni.so.0.0.0

I also have an environment variable for the classpath set:
CLASSPATH=/disni/target/*:/jNVMf/target/*:/crail/jars/*

Could the classpath variable be the issue?

19/06/18 15:59:47 DEBUG Client: getting client out of cache: org.apache.hadoop.ipc.Client@7bebcd65
19/06/18 15:59:47 DEBUG PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
19/06/18 15:59:47 DEBUG DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 288.9 KB, free 366.0 MB)
19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0 locally took 123 ms
19/06/18 15:59:48 DEBUG BlockManager: Putting block broadcast_0 without replication took 125 ms
19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
19/06/18 15:59:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on master:34103 (size: 23.8 KB, free: 366.3 MB)
19/06/18 15:59:48 DEBUG BlockManagerMaster: Updated info of block broadcast_0_piece0
19/06/18 15:59:48 DEBUG BlockManager: Told master about block broadcast_0_piece0
19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0_piece0 locally took 7 ms
19/06/18 15:59:48 DEBUG BlockManager: Putting block broadcast_0_piece0 without replication took 8 ms
19/06/18 15:59:48 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at TeraSort.scala:60
19/06/18 15:59:48 DEBUG Client: The ping interval is 60000 ms.
19/06/18 15:59:48 DEBUG Client: Connecting to NameNode-1/192.168.3.7:54310
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser: starting, having connections 1
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser sending #0
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser got value #0
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getFileInfo took 56ms
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser sending #1
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser got value #1
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getListing took 3ms
19/06/18 15:59:48 DEBUG FileInputFormat: Time taken to get FileStatuses: 142
19/06/18 15:59:48 INFO FileInputFormat: Total input paths to process : 2
19/06/18 15:59:48 DEBUG FileInputFormat: Total # of splits generated by getSplits: 2, TimeTaken: 145
19/06/18 15:59:48 DEBUG FileCommitProtocol: Creating committer org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1; output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
19/06/18 15:59:48 DEBUG FileCommitProtocol: Using (String, String, Boolean) constructor
19/06/18 15:59:48 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
19/06/18 15:59:48 DEBUG DFSClient: /tmp/data_sort/_temporary/0: masked=rwxr-xr-x
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser sending #2
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to NameNode-1/192.168.3.7:54310 from hduser got value #2
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
19/06/18 15:59:48 DEBUG ClosureCleaner: Cleaning lambda: $anonfun$write$1
19/06/18 15:59:48 DEBUG ClosureCleaner: +++ Lambda closure ($anonfun$write$1) is now cleaned +++
19/06/18 15:59:48 INFO SparkContext: Starting job: runJob at SparkHadoopWriter.scala:78
19/06/18 15:59:48 INFO CrailDispatcher: CrailStore starting version 400
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteonclose false
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteOnStart true
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.preallocate 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.writeAhead 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.debug false
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.serializer org.apache.spark.serializer.CrailSparkSerializer
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.affinity true
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.outstanding 1
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.storageclass 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.broadcast.storageclass 0
Exception in thread "dag-scheduler-event-loop" java.lang.IllegalAccessError: tried to access method org.apache.crail.conf.CrailConfiguration.<init>()V from class org.apache.spark.storage.CrailDispatcher
       at org.apache.spark.storage.CrailDispatcher.org$apache$spark$storage$CrailDispatcher$$init(CrailDispatcher.scala:119)
       at org.apache.spark.storage.CrailDispatcher$.get(CrailDispatcher.scala:662)
       at org.apache.spark.shuffle.crail.CrailShuffleManager.registerShuffle(CrailShuffleManager.scala:52)
       at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
       at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
       at org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:240)
       at scala.Option.getOrElse(Option.scala:138)
       at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)
       at org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)
       at org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)
       at org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)
       at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
       at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)

Regards,

          David



