Under the GCS directory "gs://test_dd1/abc/", what do you see?
    gsutil ls gs://test_dd1/abc

And the same for "gs://test_dd1/":

    gsutil ls gs://test_dd1

I suspect you need a folder for the multiple ORC slices!

Mich Talebzadeh,
Solutions Architect/Engineering Lead
London, United Kingdom

https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
https://en.everybodywiki.com/Mich_Talebzadeh


On Sat, 19 Aug 2023 at 21:36, Dipayan Dev <dev.dipaya...@gmail.com> wrote:

> Hi Everyone,
>
> I'm stuck with one problem, where I need to provide a custom GCS location
> for a Hive table from Spark. The code fails while doing an 'insert into'
> whenever my Hive table has a flat, bucket-root GCS location like
> gs://<bucket_name>, but works for nested locations like
> gs://bucket_name/blob_name.
>
> Is anyone aware whether this is an issue on the Spark side, or of any
> config I need to pass for it?
>
> The issue happens in both 2.x and 3.x.
>
> Config used:
>
>     spark.conf.set("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
>     spark.conf.set("spark.hadoop.hive.exec.dynamic.partition", true)
>     spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
>     spark.conf.set("hive.exec.dynamic.partition", true)
>
> Case 1: FAILS
>
>     val DF = Seq(("test1", 123)).toDF("name", "num")
>     val partKey = List("num").map(x => x)
>
>     DF.write.option("path", "gs://test_dd1/")
>       .mode(SaveMode.Overwrite)
>       .partitionBy(partKey: _*)
>       .format("orc")
>       .saveAsTable("us_wm_supply_chain_otif_stg.test_tb1")
>
>     val DF1 = Seq(("test2", 125)).toDF("name", "num")
>     DF1.write.mode(SaveMode.Overwrite).format("orc")
>       .insertInto("us_wm_supply_chain_otif_stg.test_tb1")
>
>     java.lang.NullPointerException
>       at org.apache.hadoop.fs.Path.<init>(Path.java:141)
>       at org.apache.hadoop.fs.Path.<init>(Path.java:120)
>       at org.apache.hadoop.fs.Path.suffix(Path.java:441)
>       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
>
> Case 2: Succeeds
>
>     val DF = Seq(("test1", 123)).toDF("name", "num")
>     val partKey = List("num").map(x => x)
>
>     DF.write.option("path", "gs://test_dd1/abc/")
>       .mode(SaveMode.Overwrite)
>       .partitionBy(partKey: _*)
>       .format("orc")
>       .saveAsTable("us_wm_supply_chain_otif_stg.test_tb2")
>
>     val DF1 = Seq(("test2", 125)).toDF("name", "num")
>     DF1.write.mode(SaveMode.Overwrite).format("orc")
>       .insertInto("us_wm_supply_chain_otif_stg.test_tb2")
>
> With Best Regards,
>
> Dipayan Dev
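
A closer look at the stack trace suggests why the bucket root fails: Hadoop's Path.suffix(String) is implemented as new Path(getParent(), getName() + suffix), and for a root path such as gs://test_dd1/ getParent() returns null, so the Path constructor throws the NullPointerException seen above when Spark computes the custom partition locations. Below is a minimal sketch that reproduces this outside Spark; it only needs hadoop-common on the classpath (no GCS connector is required just to construct paths), and the bucket and partition names are illustrative, not from the original report.

    import org.apache.hadoop.fs.Path

    object PathSuffixRepro {
      def main(args: Array[String]): Unit = {
        // Nested location: it has a parent, so suffix() builds the
        // partition path without trouble.
        val nested = new Path("gs://test_dd1/abc")
        println(nested.suffix("/num=123")) // gs://test_dd1/abc/num=123

        // Bucket root: this is the filesystem root, so getParent() is null.
        val root = new Path("gs://test_dd1/")
        println(root.getParent)            // null

        // suffix() does new Path(getParent(), getName() + suffix); with a
        // null parent the Path constructor throws, matching the trace above.
        root.suffix("/num=123")            // java.lang.NullPointerException
      }
    }

This is consistent with the gsutil suggestion above: pointing the table at a sub-directory such as gs://test_dd1/abc/ gives the output path a non-null parent, so the insert in Case 2 succeeds.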