Hi David,
I assume you are running the latest Crail master. We just pushed a change
to the CrailConfiguration initialization that we have not yet adapted in
the shuffle plugin (it should be a one-line fix). @Adrian, can you take a look?
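For readers following along: the IllegalAccessError in the trace below typically means the shuffle plugin's bytecode still invokes a CrailConfiguration constructor that the new Crail master no longer exposes publicly. A minimal sketch of that failure mode and the shape of the one-line fix, using hypothetical stand-in classes (Conf and its factory method are illustrative names, not the real Crail API):

```java
import java.util.HashMap;
import java.util.Map;

class Conf {
    final Map<String, String> settings = new HashMap<>();

    // The constructor is hidden. Old plugin bytecode that was compiled
    // against a public `new Conf()` and then run against this version
    // fails at link time with java.lang.IllegalAccessError.
    private Conf() {}

    // Callers must now go through a static factory instead.
    static Conf createConfiguration() {
        return new Conf();
    }
}

class IllegalAccessErrorSketch {
    public static void main(String[] args) {
        // The one-line fix in the plugin would be the analogue of
        // replacing `new CrailConfiguration()` with the factory call:
        Conf conf = Conf.createConfiguration();
        conf.settings.put("spark.shuffle.manager",
                "org.apache.spark.shuffle.crail.CrailShuffleManager");
        System.out.println(conf.settings.get("spark.shuffle.manager"));
    }
}
```

Until the plugin is rebuilt against the new API, recompiling user code does not help: the error is raised by the JVM linker when the plugin class itself is loaded.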
Regards,
Jonas
On Tue, 18 Jun 2019 23:24:48 +0000
David Crespi <david.cre...@storedgesystems.com> wrote:
Hi,
I’m getting what looks to be a configuration error when trying to
use the CrailShuffleManager.
(spark.shuffle.manager
org.apache.spark.shuffle.crail.CrailShuffleManager)
It seems like a basic error, but other things run fine until I add the
line above into my spark-defaults.conf file.
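For reference, the relevant spark-defaults.conf fragment: the first line is the one quoted above; the serializer value is taken from what CrailDispatcher echoes in the log below, though whether it must be set explicitly (rather than defaulted) is an assumption:

```
spark.shuffle.manager    org.apache.spark.shuffle.crail.CrailShuffleManager
spark.crail.serializer   org.apache.spark.serializer.CrailSparkSerializer
```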
I have my environment variable for the Crail home set, as well as for
the DiSNI libs, using:
LD_LIBRARY_PATH=/usr/local/lib
$ ls -l /usr/local/lib/
total 156
-rwxr-xr-x 1 root root 947 Jun 18 08:11 libdisni.la
lrwxrwxrwx 1 root root 17 Jun 18 08:11 libdisni.so ->
libdisni.so.0.0.0
lrwxrwxrwx 1 root root 17 Jun 18 08:11 libdisni.so.0 ->
libdisni.so.0.0.0
-rwxr-xr-x 1 root root 149784 Jun 18 08:11 libdisni.so.0.0.0
I also have an environment variable for the classpath set:
CLASSPATH=/disni/target/*:/jNVMf/target/*:/crail/jars/*
Could the classpath variable be the issue?
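One quick way to rule out environment problems is to print what the JVM itself sees, independent of Crail or Spark. A small self-contained check:

```java
class EnvCheck {
    public static void main(String[] args) {
        // java.library.path is where the JVM searches for libdisni.so;
        // it should include /usr/local/lib if LD_LIBRARY_PATH was
        // exported before the JVM started.
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
        String ld = System.getenv("LD_LIBRARY_PATH");
        System.out.println("LD_LIBRARY_PATH   = "
                + (ld == null ? "<unset>" : ld));
        String cp = System.getenv("CLASSPATH");
        System.out.println("CLASSPATH         = "
                + (cp == null ? "<unset>" : cp));
    }
}
```

Note that Spark executors usually take their classpath from settings such as spark.executor.extraClassPath or --jars rather than the CLASSPATH environment variable, so a correct CLASSPATH in the shell does not by itself guarantee the jars reach the executors. That said, the IllegalAccessError below is a binary-compatibility problem, not a missing-class problem, so the classpath variable is unlikely to be the cause here.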
19/06/18 15:59:47 DEBUG Client: getting client out of cache:
org.apache.hadoop.ipc.Client@7bebcd65
19/06/18 15:59:47 DEBUG PerformanceAdvisory: Both short-circuit
local reads and UNIX domain socket are disabled.
19/06/18 15:59:47 DEBUG DataTransferSaslUtil: DataTransferProtocol
not using SaslPropertiesResolver, no QOP found in configuration for
dfs.data.transfer.protection
19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0 stored as
values in memory (estimated size 288.9 KB, free 366.0 MB)
19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0 locally
took 123 ms
19/06/18 15:59:48 DEBUG BlockManager: Putting block broadcast_0
without replication took 125 ms
19/06/18 15:59:48 INFO MemoryStore: Block broadcast_0_piece0 stored
as bytes in memory (estimated size 23.8 KB, free 366.0 MB)
19/06/18 15:59:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on master:34103 (size: 23.8 KB, free: 366.3 MB)
19/06/18 15:59:48 DEBUG BlockManagerMaster: Updated info of block
broadcast_0_piece0
19/06/18 15:59:48 DEBUG BlockManager: Told master about block
broadcast_0_piece0
19/06/18 15:59:48 DEBUG BlockManager: Put block broadcast_0_piece0
locally took 7 ms
19/06/18 15:59:48 DEBUG BlockManager: Putting block
broadcast_0_piece0 without replication took 8 ms
19/06/18 15:59:48 INFO SparkContext: Created broadcast 0 from
newAPIHadoopFile at TeraSort.scala:60
19/06/18 15:59:48 DEBUG Client: The ping interval is 60000 ms.
19/06/18 15:59:48 DEBUG Client: Connecting to
NameNode-1/192.168.3.7:54310
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser: starting, having
connections 1
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser sending #0
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser got value #0
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getFileInfo took
56ms
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser sending #1
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser got value #1
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: getListing took 3ms
19/06/18 15:59:48 DEBUG FileInputFormat: Time taken to get
FileStatuses: 142
19/06/18 15:59:48 INFO FileInputFormat: Total input paths to process
: 2
19/06/18 15:59:48 DEBUG FileInputFormat: Total # of splits generated
by getSplits: 2, TimeTaken: 145
19/06/18 15:59:48 DEBUG FileCommitProtocol: Creating committer
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol; job 1;
output=hdfs://NameNode-1:54310/tmp/data_sort; dynamic=false
19/06/18 15:59:48 DEBUG FileCommitProtocol: Using (String, String,
Boolean) constructor
19/06/18 15:59:48 INFO FileOutputCommitter: File Output Committer
Algorithm version is 1
19/06/18 15:59:48 DEBUG DFSClient: /tmp/data_sort/_temporary/0:
masked=rwxr-xr-x
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser sending #2
19/06/18 15:59:48 DEBUG Client: IPC Client (199041063) connection to
NameNode-1/192.168.3.7:54310 from hduser got value #2
19/06/18 15:59:48 DEBUG ProtobufRpcEngine: Call: mkdirs took 3ms
19/06/18 15:59:48 DEBUG ClosureCleaner: Cleaning lambda:
$anonfun$write$1
19/06/18 15:59:48 DEBUG ClosureCleaner: +++ Lambda closure
($anonfun$write$1) is now cleaned +++
19/06/18 15:59:48 INFO SparkContext: Starting job: runJob at
SparkHadoopWriter.scala:78
19/06/18 15:59:48 INFO CrailDispatcher: CrailStore starting version
400
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteonclose
false
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.deleteOnStart
true
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.preallocate 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.writeAhead 0
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.debug false
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.serializer
org.apache.spark.serializer.CrailSparkSerializer
19/06/18 15:59:48 INFO CrailDispatcher: spark.crail.shuffle.affinity
true
19/06/18 15:59:48 INFO CrailDispatcher:
spark.crail.shuffle.outstanding 1
19/06/18 15:59:48 INFO CrailDispatcher:
spark.crail.shuffle.storageclass 0
19/06/18 15:59:48 INFO CrailDispatcher:
spark.crail.broadcast.storageclass 0
Exception in thread "dag-scheduler-event-loop"
java.lang.IllegalAccessError: tried to access method
org.apache.crail.conf.CrailConfiguration.<init>()V from class
org.apache.spark.storage.CrailDispatcher
at
org.apache.spark.storage.CrailDispatcher.org$apache$spark$storage$CrailDispatcher$$init(CrailDispatcher.scala:119)
at
org.apache.spark.storage.CrailDispatcher$.get(CrailDispatcher.scala:662)
at
org.apache.spark.shuffle.crail.CrailShuffleManager.registerShuffle(CrailShuffleManager.scala:52)
at
org.apache.spark.ShuffleDependency.<init>(Dependency.scala:94)
at
org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:87)
at
org.apache.spark.rdd.RDD.$anonfun$dependencies$2(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.rdd.RDD.dependencies(RDD.scala:238)
at
org.apache.spark.scheduler.DAGScheduler.getShuffleDependencies(DAGScheduler.scala:512)
at
org.apache.spark.scheduler.DAGScheduler.getOrCreateParentStages(DAGScheduler.scala:461)
at
org.apache.spark.scheduler.DAGScheduler.createResultStage(DAGScheduler.scala:448)
at
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:962)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Regards,
David