Re: GPU Acceleration for spark-3.0.0
Bobby,

Thanks for your answer. It seems I misunderstood this paragraph on the website:

*"GPU-accelerate your Apache Spark 3.0 data science pipelines—without code changes—and speed up data processing and model training while substantially lowering infrastructure costs."*

So if I want to use the GPU in a job running on Spark, I still need to write the map and reduce functions in CUDA or C++ and then invoke them through JNI or something like GPUEnabler. Is that right?

Thanks,
Charles

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
GPU Acceleration for spark-3.0.0
Hi,

I have configured GPU scheduling for spark-3.0.0 on YARN following the official documentation, but the job does not seem to be running on the GPU. Do I need to modify my code to invoke CUDA? Is there a tutorial that can be shared?

running logs:

    ...
    2020-06-13 10:58:01,938 INFO spark.SparkContext: Running Spark version 3.0.0-preview2
    2020-06-13 10:58:04,101 INFO resource.ResourceUtils: ==
    2020-06-13 10:58:04,105 INFO resource.ResourceUtils: Resources for spark.driver: gpu -> [name: gpu, addresses: 0]

spark-default.conf:

    ...
    spark.executor.resource.gpu.amount 1
    spark.worker.resource.gpu.amount1
    spark.driver.resource.gpu.amount1
    spark.driver.resource.gpu.discoveryScript /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
    spark.worker.resource.gpu.discoveryScript /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh

nodemanager log:

    ...
    2020-06-13 10:55:07,702 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.ResourcePluginManager: Found Resource plugins from configuration: [yarn.io/gpu]
    2020-06-13 10:55:07,745 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Trying to discover GPU information ...
    2020-06-13 10:55:10,601 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer: Discovered GPU information:
    === GPUs in the system ===
        Driver Version:440.82
        ProductName=GeForce GTX 950M, MinorNumber=0, TotalMemory=2004MiB, Utilization=2.0%

Thanks,
Charles
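Editor's note: in the spark-default.conf listing above, two entries appear as `...gpu.amount1` with no separator before the value. This may only be an artifact of how the mail was formatted, but if the file really looks like that, a properties-style parser would read `spark.worker.resource.gpu.amount1` as a key with an empty value. The following is a minimal sketch, not Spark's own parser, that flags such entries. The function names and the simplified parsing rules are assumptions made for illustration.

```python
import re

# Simplified assumption: a key ends at the first run of whitespace, '=' or ':'.
SEPARATOR = re.compile(r"[\s=:]+")

# GPU resource keys documented for Spark 3.0 (driver/executor/worker scope).
VALID_GPU_KEY = re.compile(
    r"^spark\.(driver|executor|worker)\.resource\.gpu\.(amount|discoveryScript|vendor)$"
)
GPU_PREFIX = re.compile(r"^spark\.(driver|executor|worker)\.resource\.gpu\.")


def parse_properties(text):
    """Parse spark-defaults-style text into a dict of key -> value."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = SEPARATOR.split(line, maxsplit=1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def find_suspect_gpu_keys(props):
    """Return GPU resource keys that are misspelled or have no value."""
    suspects = []
    for key, value in props.items():
        if GPU_PREFIX.match(key) and (not VALID_GPU_KEY.match(key) or value == ""):
            suspects.append(key)
    return sorted(suspects)


sample = """\
spark.executor.resource.gpu.amount 1
spark.worker.resource.gpu.amount1
spark.driver.resource.gpu.amount1
spark.driver.resource.gpu.discoveryScript /usr/local/spark-3.0.0/examples/src/main/scripts/getGpusResources.sh
"""

print(find_suspect_gpu_keys(parse_properties(sample)))
```

Run against the sample, this reports the two `amount1` keys and leaves the well-formed entries alone. It only checks the shape of the configuration; it says nothing about whether the job itself contains GPU code.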
Re: NoClassDefFoundError: scala/Product$class
org.bdgenomics.adam is one of the components of GATK, and I simply downloaded the release version from its GitHub page. However, when I build a new Docker image with Spark 2.4.5 and Scala 2.12.4, it works well, which confuses me.

    root@master2:~# pyspark
    Python 2.7.17 (default, Apr 15 2020, 17:20:14)
    [GCC 7.5.0] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    20/06/08 01:44:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.4.5
          /_/

    Using Python version 2.7.17 (default, Apr 15 2020 17:20:14)
    SparkSession available as 'spark'.

    root@master2:~# scala -version
    Scala code runner version 2.12.4 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.
Re: NoClassDefFoundError: scala/Product$class
Hi Pol,

Thanks for your suggestion. I am going to use Spark 3.0.0 for GPU acceleration, so I updated Scala to *version 2.12.11* and then to the latest *2.13*, but the error is still there. By the way, the Spark version is *spark-3.0.0-preview2-bin-without-hadoop*.

    Caused by: java.lang.ClassNotFoundException: scala.Product$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

Charles Cai
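Editor's note: classes named like `scala.Product$class` were produced by the trait encoding used up to Scala 2.11. From Scala 2.12 on, traits compile to interfaces with default methods and those `$class` classes no longer exist. A missing `scala/Product$class` therefore usually points to a library compiled against Scala 2.11 being loaded on a 2.12 runtime, and changing the `scala` runner installed on the machine would not affect that, since Spark ships its own Scala library. One rough way to spot such a mix is to look at the `_2.xx` suffix that many Scala artifacts carry in their jar names. The sketch below assumes that naming convention; the jar names in the example are illustrative, not taken from the actual installation, and a fat jar that bundles its dependencies would not reveal them this way.

```python
import re
from collections import defaultdict

# Assumed convention: Scala artifacts embed the binary version in the
# file name, e.g. "spark-core_2.12-3.0.0.jar".
SCALA_SUFFIX = re.compile(r"_(2\.\d+)(?=[-.])")


def group_by_scala_version(jar_names):
    """Group jar file names by the Scala binary version in their suffix."""
    groups = defaultdict(list)
    for name in jar_names:
        match = SCALA_SUFFIX.search(name)
        if match:  # jars without a Scala suffix (plain Java libs) are skipped
            groups[match.group(1)].append(name)
    return dict(groups)


# Hypothetical classpath listing, for illustration only.
jars = [
    "spark-core_2.12-3.0.0.jar",
    "spark-sql_2.12-3.0.0.jar",
    "adam-core-spark2_2.11-0.20.0.jar",
    "kryo-shaded-4.0.2.jar",
]

groups = group_by_scala_version(jars)
for version in sorted(groups):
    print(version, groups[version])
if len(groups) > 1:
    print("Mixed Scala binary versions:", sorted(groups))
```

In a real setup the list could come from `os.listdir` on the Spark `jars` directory plus any extra jars passed to the job.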
NoClassDefFoundError: scala/Product$class
Hi,

I ran GATK MarkDuplicates in Spark mode and it throws a *NoClassDefFoundError: scala/Product$class*. I tried GATK versions 4.1.7 and 4.0.0; the environment is spark-3.0.0 with scala-2.11.12.

*GATK commands:*

    gatk MarkDuplicatesSpark \
      -I hdfs://master2:9000/Drosophila/output/Drosophila.sorted.bam \
      -O hdfs://master2:9000/Drosophila/output/Drosophila.sorted.markdup.bam \
      -M hdfs://master2:9000/Drosophila/output/Drosophila.sorted.markdup_metrics.txt \
      -- \
      --spark-runner SPARK --spark-master spark://master2:7077

*error logs:*

    Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
        at org.bdgenomics.adam.serialization.InputStreamWithDecoder.<init>(ADAMKryoRegistrator.scala:35)
        at org.bdgenomics.adam.serialization.AvroSerializer.<init>(ADAMKryoRegistrator.scala:45)
        at org.bdgenomics.adam.models.VariantContextSerializer.<init>(VariantContext.scala:94)
        at org.bdgenomics.adam.serialization.ADAMKryoRegistrator.registerClasses(ADAMKryoRegistrator.scala:179)
        at org.broadinstitute.hellbender.engine.spark.GATKRegistrator.registerClasses(GATKRegistrator.java:78)
        at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$8(KryoSerializer.scala:170)
        at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$8$adapted(KryoSerializer.scala:170)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.serializer.KryoSerializer.$anonfun$newKryo$5(KryoSerializer.scala:170)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:221)
        at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:161)
        at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
        at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
        at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
        at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:336)
        at org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:256)
        at org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:422)
        at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:309)
        at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:137)
        at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:91)
        at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:35)
        at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:77)
        at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1494)
        at org.apache.spark.rdd.NewHadoopRDD.<init>(NewHadoopRDD.scala:80)
        at org.apache.spark.SparkContext.$anonfun$newAPIHadoopFile$2(SparkContext.scala:1235)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.SparkContext.withScope(SparkContext.scala:771)
        at org.apache.spark.SparkContext.newAPIHadoopFile(SparkContext.scala:1221)
        at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopFile(JavaSparkContext.scala:484)
        at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getParallelReads(ReadsSparkSource.java:112)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.getUnfilteredReads(GATKSparkTool.java:254)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.getReads(GATKSparkTool.java:220)
        at org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark.runTool(MarkDuplicatesSpark.java:72)
        at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:387)
        at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:30)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
        at org.broadinstitute.hellbender.Main.main(Main.java:275)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:4