Under the GCS directory

"gs://test_dd1/abc/"

what do you see when you run

gsutil ls gs://test_dd1/abc

and the same for the bucket root

gs://test_dd1/

gsutil ls gs://test_dd1

I suspect you need a folder (a prefix under the bucket root) for the multiple ORC slices!
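To cross-check from the Spark side as well, something along these lines may help (a minimal sketch, assuming the table names from your two cases and an existing SparkSession named spark, e.g. in spark-shell):

// Ask the metastore what location it has recorded for each table.
// For the working table (Case 2) the Location should be a prefix under the
// bucket, gs://test_dd1/abc, whereas the failing one (Case 1) points at the
// bare bucket root gs://test_dd1/.
spark.sql("DESCRIBE FORMATTED us_wm_supply_chain_otif_stg.test_tb1").show(100, false)
spark.sql("DESCRIBE FORMATTED us_wm_supply_chain_otif_stg.test_tb2").show(100, false)

If the failing table's Location is indeed the bare bucket, I would first try recreating it with a prefix under the bucket (gs://test_dd1/<some_folder>/), as in your Case 2.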



Mich Talebzadeh,
Solutions Architect/Engineering Lead
London
United Kingdom


   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 19 Aug 2023 at 21:36, Dipayan Dev <dev.dipaya...@gmail.com> wrote:

> Hi Everyone,
>
> I'm stuck with one problem, where I need to provide a custom GCS location
> for the Hive table from Spark. The code fails while doing an *'insert
> into'* whenever my Hive table has a flat GCS location like
> gs://<bucket_name>, but works for nested locations like
> gs://bucket_name/blob_name.
>
> Is anyone aware whether this is an issue on the Spark side, or is there any
> config I need to pass for it?
>
> *The issue happens in both Spark 2.x and 3.x.*
>
> Config using:
>
> spark.conf.set("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
> spark.conf.set("spark.hadoop.hive.exec.dynamic.partition", true)
> spark.conf.set("hive.exec.dynamic.partition.mode","nonstrict")
> spark.conf.set("hive.exec.dynamic.partition", true)
>
>
> *Case 1: FAILS*
>
> val DF = Seq(("test1", 123)).toDF("name", "num")
> val partKey = List("num").map(x => x)
>
> DF.write
>   .option("path", "gs://test_dd1/")
>   .mode(SaveMode.Overwrite)
>   .partitionBy(partKey: _*)
>   .format("orc")
>   .saveAsTable("us_wm_supply_chain_otif_stg.test_tb1")
>
> val DF1 = Seq(("test2", 125)).toDF("name", "num")
> DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb1")
>
>
>
>
>
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:141)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:120)
>   at org.apache.hadoop.fs.Path.suffix(Path.java:441)
>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
>
>
> *Case 2: Succeeds*
>
> val DF = Seq(("test1", 123)).toDF("name", "num")
> val partKey = List("num").map(x => x)
>
> DF.write
>   .option("path", "gs://test_dd1/abc/")
>   .mode(SaveMode.Overwrite)
>   .partitionBy(partKey: _*)
>   .format("orc")
>   .saveAsTable("us_wm_supply_chain_otif_stg.test_tb2")
>
> val DF1 = Seq(("test2", 125)).toDF("name", "num")
>
> DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("us_wm_supply_chain_otif_stg.test_tb2")
>
>
> With Best Regards,
>
> Dipayan Dev
>
