Could you do something like this prior to calling the action?

import org.apache.hadoop.fs.{FileSystem, Path}

// Create FileSystem object from the Hadoop Configuration
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// This method returns a Boolean (true if the file exists, false if it doesn't)
val fileExists = fs.exists(new Path("<path_to_file>"))
if (fileExists) println("File exists!") else println("File doesn't exist!")
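For example, wired in before your read, it might look something like this (an untested sketch, using the path from your message below):

import org.apache.hadoop.fs.{FileSystem, Path}

val xmlPath = new Path("/tmp/broadcast.xml")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

if (fs.exists(xmlPath)) {
  // Only define the DataFrame once we know the file is there
  val df = spark.read.
    format("com.databricks.spark.xml").
    option("rootTag", "hierarchy").
    option("rowTag", "sms_request").
    load(xmlPath.toString)
} else {
  println(s"$xmlPath does not exist")
  sys.exit(1)
}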
Not sure that will help you or not, just a thought.

-Todd

On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Brandon!
>
> I should have remembered that.
>
> Basically the code exits with sys.exit(1) if it cannot find the file.
>
> I guess there is no easy way of validating a DF except actioning it with
> show(1,0) etc. and checking that it works?
>
> Regards,
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Tue, 5 May 2020 at 16:41, Brandon Geise <brandonge...@gmail.com> wrote:
>
>> You could use the Hadoop API and check if the file exists.
>>
>> From: Mich Talebzadeh <mich.talebza...@gmail.com>
>> Date: Tuesday, May 5, 2020 at 11:25 AM
>> To: "user @spark" <user@spark.apache.org>
>> Subject: Exception handling in Spark
>>
>> Hi,
>>
>> As I understand it, exception handling in Spark only makes sense if one
>> attempts an action, as opposed to lazy transformations?
>>
>> Let us assume that I am reading an XML file from an HDFS directory and
>> creating a dataframe DF on it:
>>
>> import org.apache.spark.sql.functions.lit
>> import java.sql.SQLException
>>
>> val broadcastValue = "123456789" // I assume this will be sent as a
>> constant for the batch
>>
>> // Create a DF on top of the XML
>> val df = spark.read.
>>   format("com.databricks.spark.xml").
>>   option("rootTag", "hierarchy").
>>   option("rowTag", "sms_request").
>>   load("/tmp/broadcast.xml")
>>
>> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>>
>> newDF.createOrReplaceTempView("tmp")
>>
>> // Put data in the Hive table
>> //
>> val sqltext = """
>> INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>> (broadcastid="123456", brand)
>> SELECT
>>     ocis_party_id AS partyId
>>   , target_mobile_no AS phoneNumber
>>   , brand
>>   , broadcastid
>> FROM tmp
>> """
>> //
>> // Here I am performing a collection
>>
>> try {
>>   spark.sql(sqltext)
>> } catch {
>>   case e: SQLException =>
>>     e.printStackTrace
>>     sys.exit()
>> }
>>
>> Now the issue I have is: what if the XML file /tmp/broadcast.xml does
>> not exist or has been deleted? I won't be able to catch the error until
>> the Hive table is populated. Of course I can write a shell script to
>> check that the file exists before running the job, or run a small action
>> like df.show(1,0). Are there more general alternatives?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
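On Mich's broader question of validating the DataFrame without actioning it: for file-based sources Spark typically resolves the input path when the DataFrame is defined, so the read itself can be wrapped in scala.util.Try and a missing file handled there, with no need for a show(1,0). A minimal sketch along those lines (untested; the path and options are the ones from the thread):

import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.AnalysisException

// load() resolves the input path eagerly for file-based sources, so a
// missing file surfaces here as an AnalysisException rather than later,
// when the Hive insert is actioned.
val maybeDF = Try {
  spark.read.
    format("com.databricks.spark.xml").
    option("rootTag", "hierarchy").
    option("rowTag", "sms_request").
    load("/tmp/broadcast.xml")
}

maybeDF match {
  case Success(df) =>
    df.createOrReplaceTempView("tmp") // carry on with the INSERT
  case Failure(e: AnalysisException) =>
    println(s"Cannot read /tmp/broadcast.xml: ${e.getMessage}")
    sys.exit(1)
  case Failure(e) => throw e
}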