That's right. But the serialization would be happening in Spark 2.3 as well, so why don't we see this error there?
On Mon, 2 Jan 2023 at 9:09 PM, Sean Owen <sro...@gmail.com> wrote:

> Oh, it's because you are defining "spark" within your driver object, and
> then it's getting serialized because you are trying to use TestMain methods
> in your program. This was never correct, but now it's an explicit error in
> Spark 3. The session should not be a member variable.
>
> On Mon, Jan 2, 2023 at 9:24 AM Shrikant Prasad <shrikant....@gmail.com> wrote:
>
>> Please see these logs. The error is thrown in the executor:
>>
>> 23/01/02 15:14:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
>> java.lang.ExceptionInInitializerError
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1274)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2196)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2093)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1655)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2405)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2329)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2187)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1667)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
>>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
>>     at org.apache.spark.scheduler.Task.run(Task.scala:127)
>>     at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:385)
>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2574)
>>     at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:934)
>>     at scala.Option.getOrElse(Option.scala:189)
>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:928)
>>     at TestMain$.<init>(TestMain.scala:12)
>>     at TestMain$.<clinit>(TestMain.scala)
>>
>> On Mon, 2 Jan 2023 at 8:29 PM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> It's not running on the executor; that's not the issue. See your stack
>>> trace, where it clearly happens in the driver.
>>>
>>> On Mon, Jan 2, 2023 at 8:58 AM Shrikant Prasad <shrikant....@gmail.com> wrote:
>>>
>>>> Even if I set the master as yarn, it will not have access to the rest
>>>> of the Spark confs. It will need spark.yarn.app.id.
>>>>
>>>> The main issue is: if it works as-is in Spark 2.3, why is it not
>>>> working in Spark 3, i.e. why is the session getting created on the
>>>> executor? Another thing we tried is removing the df-to-rdd conversion
>>>> just for debugging, and then it works in Spark 3.
>>>>
>>>> So it might be something to do with the df-to-rdd conversion, or a
>>>> serialization behavior change from Spark 2.3 to Spark 3.0 if there is
>>>> any. But we couldn't find the root cause.
>>>>
>>>> Regards,
>>>> Shrikant
>>>>
>>>> On Mon, 2 Jan 2023 at 7:54 PM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> So call .setMaster("yarn"), per the error.
>>>>>
>>>>> On Mon, Jan 2, 2023 at 8:20 AM Shrikant Prasad <shrikant....@gmail.com> wrote:
>>>>>
>>>>>> We are running it in cluster deploy mode with yarn.
>>>>>>
>>>>>> Regards,
>>>>>> Shrikant
>>>>>>
>>>>>> On Mon, 2 Jan 2023 at 6:15 PM, Stelios Philippou <stevo...@gmail.com> wrote:
>>>>>>
>>>>>>> Can we see your Spark configuration parameters?
>>>>>>>
>>>>>>> The master URL refers to, as per Java,
>>>>>>> new SparkConf()....setMaster("local[*]")
>>>>>>> according to where you want to run this.
>>>>>>>
>>>>>>> On Mon, 2 Jan 2023 at 14:38, Shrikant Prasad <shrikant....@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am trying to migrate one Spark application from Spark 2.3 to 3.0.1.
>>>>>>>>
>>>>>>>> The issue can be reproduced using the below sample code:
>>>>>>>>
>>>>>>>> object TestMain {
>>>>>>>>
>>>>>>>>   val session = SparkSession.builder().appName("test").enableHiveSupport().getOrCreate()
>>>>>>>>
>>>>>>>>   def main(args: Array[String]): Unit = {
>>>>>>>>     import session.implicits._
>>>>>>>>     val a = session.sparkContext.parallelize(Array(("A",1),("B",2)))
>>>>>>>>       .toDF("_c1","_c2").rdd.map(x => x(0).toString).collect()
>>>>>>>>     println(a.mkString("|"))
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> It runs successfully in Spark 2.3 but fails with Spark 3.0.1 with the below exception:
>>>>>>>>
>>>>>>>> Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
>>>>>>>>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:394)
>>>>>>>>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2690)
>>>>>>>>     at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
>>>>>>>>     at scala.Option.getOrElse(Option.scala:189)
>>>>>>>>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
>>>>>>>>     at TestMain$.<init>(TestMain.scala:7)
>>>>>>>>     at TestMain$.<clinit>(TestMain.scala)
>>>>>>>>
>>>>>>>> From the exception it appears that it tries to create the Spark
>>>>>>>> session on the executor as well in Spark 3, whereas it is not
>>>>>>>> created again on the executor in Spark 2.3.
>>>>>>>>
>>>>>>>> Can anyone help in identifying why there is this change in behavior?
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Shrikant

--
Regards,
Shrikant Prasad
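For reference, a minimal sketch of the fix Sean suggests: build the SparkSession as a local value inside main rather than as a member of the object, so that loading TestMain$ on an executor (the SerializedLambda.readResolve -> TestMain$.<clinit> chain in the trace above) no longer calls getOrCreate there. This mirrors the names in the original reproducer; the session.stop() at the end is an optional addition, not part of the original code.

import org.apache.spark.sql.SparkSession

object TestMain {

  def main(args: Array[String]): Unit = {
    // Create the session here, not in the object body: the object's
    // static initializer then does nothing Spark-related, so class
    // loading during task deserialization on executors is harmless.
    val session = SparkSession.builder()
      .appName("test")
      .enableHiveSupport()
      .getOrCreate()

    import session.implicits._

    // Same pipeline as the original reproducer.
    val a = session.sparkContext
      .parallelize(Array(("A", 1), ("B", 2)))
      .toDF("_c1", "_c2")
      .rdd
      .map(x => x(0).toString)
      .collect()

    println(a.mkString("|"))

    session.stop() // optional: shut down cleanly when done
  }
}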