The following assertion appears to work, on the assumption that the result
set can be either empty (count = 0) or non-empty (count > 0):

assert df2.count() >= 0
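
Note that count() never returns a negative number, so this assertion can only fail if the job itself raises an error; it passes for an empty result set too. A minimal sketch of a stricter check follows, assuming the test knows which columns to expect and whether rows should be present. The names check_result and FakeDF are hypothetical; FakeDF is only a stand-in so the snippet runs without a Spark session.

```python
class FakeDF:
    """Hypothetical stand-in exposing the two DataFrame members used below."""

    def __init__(self, columns, rows):
        self.columns = columns
        self._rows = rows

    def count(self):
        return len(self._rows)


def check_result(df, expected_columns, expect_rows=True):
    # A schema check catches a wrongly composed select().
    assert df.columns == expected_columns, f"unexpected columns: {df.columns}"
    # count() >= 0 is always true, so assert the condition actually expected.
    n = df.count()
    if expect_rows:
        assert n > 0, "expected at least one row"
    return n
```

With a real DataFrame the call would look like check_result(df2, ['YEAR', 'REGIONNAME', ...]), with the column list filled in by the test author.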

However, suppose I want to write to a JDBC database from PySpark through a
function (already defined in another module), as below:

def writeTableToOracle(dataFrame, mode, dataset, tableName):
    # oracle_url and config are defined at module level
    try:
        dataFrame. \
            write. \
            format("jdbc"). \
            option("url", oracle_url). \
            option("dbtable", tableName). \
            option("user", config['OracleVariables']['oracle_user']). \
            option("password", config['OracleVariables']['oracle_password']). \
            option("driver", config['OracleVariables']['oracle_driver']). \
            mode(mode). \
            save()
    except Exception as e:
        # any failure is printed, not re-raised
        print(f"""{e}, quitting""")


and call it in the program

from sparkutils import sparkstuff as s


How can one assert its validity in PyTest?
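
One possible way to do this, as a minimal sketch: pass a mock in place of the DataFrame and assert that the function builds the expected JDBC write, so no Oracle instance is needed. The function is reproduced here only so that the snippet runs on its own; the URL, user, password, driver, dataset and table values are hypothetical placeholders, not the real settings.

```python
from unittest.mock import MagicMock

# Placeholder settings so the sketch is self-contained (all hypothetical).
oracle_url = "jdbc:oracle:thin:@//dbhost:1521/service"
config = {"OracleVariables": {
    "oracle_user": "test_user",
    "oracle_password": "test_password",
    "oracle_driver": "oracle.jdbc.OracleDriver",
}}


def writeTableToOracle(dataFrame, mode, dataset, tableName):
    # Same call chain as the function in the message, copied for the test.
    try:
        (dataFrame.write
            .format("jdbc")
            .option("url", oracle_url)
            .option("dbtable", tableName)
            .option("user", config["OracleVariables"]["oracle_user"])
            .option("password", config["OracleVariables"]["oracle_password"])
            .option("driver", config["OracleVariables"]["oracle_driver"])
            .mode(mode)
            .save())
    except Exception as e:
        print(f"{e}, quitting")


def test_writeTableToOracle_builds_jdbc_write():
    df = MagicMock(name="dataFrame")
    # Each builder call returns the same writer mock, so every call in the
    # chain is recorded on one object.
    writer = df.write
    writer.format.return_value = writer
    writer.option.return_value = writer
    writer.mode.return_value = writer

    writeTableToOracle(df, "overwrite", "some_dataset", "some_schema.some_table")

    writer.format.assert_called_once_with("jdbc")
    writer.option.assert_any_call("url", oracle_url)
    writer.option.assert_any_call("dbtable", "some_schema.some_table")
    writer.mode.assert_called_once_with("overwrite")
    writer.save.assert_called_once_with()
```

This only verifies how the write is composed, not that rows reach Oracle. Also, because the function catches every exception and just prints it, a failed write would not fail a test; re-raising after the print would be needed for pytest to see the failure.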

Thanks again

On Wed, 3 Feb 2021 at 15:12, Mich Talebzadeh wrote:

> Hi,
> In pytest you want to ensure that the composed DataFrame returns the
> correct result. Example:
>     df2 = house_df. \
>         select( \
>         F.date_format('datetaken', 'yyyy').cast("Integer").alias('YEAR') \
>         , 'REGIONNAME' \
>         , round(F.avg('averageprice').over(wSpecY)).alias('AVGPRICEPERYEAR') \
>         , round(F.avg('flatprice').over(wSpecY)).alias('AVGFLATPRICEPERYEAR') \
>         , round(F.avg('TerracedPrice').over(wSpecY)).alias('AVGTERRACEDPRICEPERYEAR') \
>         , round(F.avg('SemiDetachedPrice').over(wSpecY)).alias('AVGSDPRICEPRICEPERYEAR') \
>         , round(F.avg('DetachedPrice').over(wSpecY)).alias('AVGDETACHEDPRICEPERYEAR')). \
>         distinct().orderBy('datetaken', ascending=True)
> Will it be enough to run just this command?
>   assert not []
> I believe that may be flawed because any error will be assumed to be NOT
> Thanks
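
On the `assert not []` check in the quoted message: an empty list is falsy in Python, so `not []` is always True and the statement passes regardless of what df2 contains. A small sketch illustrating this, plus one alternative, assuming the intent is to assert that a collected result is non-empty. Here has_rows is a hypothetical helper and rows is a hypothetical stand-in for something like the output of df2.collect().

```python
# An empty list is falsy, so this assertion can never fail.
assert not []


def has_rows(rows):
    """Return True when the collected result contains at least one row."""
    return len(rows) > 0


rows = [(2020, "R1")]  # hypothetical stand-in for collected rows
assert has_rows(rows)
assert not has_rows([])
```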
