Thanks Marco.

This is one approach:

    # Create df2, the dataframe to write to Oracle, as defined earlier
    df2 = house_df. \
        select( \
        F.date_format('datetaken', 'yyyy').cast("Integer").alias('YEAR') \
        , 'REGIONNAME' \
        , round(F.avg('averageprice').over(wSpecY)).alias('AVGPRICEPERYEAR') \
        , round(F.avg('flatprice').over(wSpecY)).alias('AVGFLATPRICEPERYEAR') \
        , round(F.avg('TerracedPrice').over(wSpecY)).alias('AVGTERRACEDPRICEPERYEAR') \
        , round(F.avg('SemiDetachedPrice').over(wSpecY)).alias('AVGSDPRICEPRICEPERYEAR') \
        , round(F.avg('DetachedPrice').over(wSpecY)).alias('AVGDETACHEDPRICEPERYEAR')). \
        distinct().orderBy('datetaken', ascending=True)
    assert df2.count() >= 0
    # Write to the Oracle table from the df2 dataframe with the overwrite option
    s.writeTableToOracle(df2, "overwrite",
                         config['OracleVariables']['dbschema'],
                         config['OracleVariables']['yearlyAveragePricesAllTable'])
    # Read the data just written to the Oracle table into the read_df dataframe
    fullyQualifiedTableName = config['OracleVariables']['dbschema'] + '.' + \
                              config['OracleVariables']['yearlyAveragePricesAllTable']
    read_df = s.loadTableFromOracleJDBC(spark, fullyQualifiedTableName)
    # Test that the data read back matches what was written
    assert df2.subtract(read_df).count() == 0
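One caveat: subtract is one-directional, so that last assert alone would still pass if read_df contained extra rows. A stricter round trip checks both directions plus the counts. A minimal sketch of the pattern, using plain-Python sets of hypothetical row tuples as stand-ins for the two dataframes (no Spark or Oracle needed to illustrate it):

```python
# Round-trip check pattern: everything written must come back, and
# nothing extra must appear. The row tuples are hypothetical stand-ins
# for the collected rows of df2 and read_df.
written = {("Kensington", 2020, 1500000.0), ("Camden", 2020, 900000.0)}
read_back = {("Camden", 2020, 900000.0), ("Kensington", 2020, 1500000.0)}

assert len(written - read_back) == 0   # like df2.subtract(read_df)
assert len(read_back - written) == 0   # like read_df.subtract(df2)
assert len(written) == len(read_back)  # count comparison
print("round trip ok")
```

In the Spark version that would mean also asserting read_df.subtract(df2).count() == 0 and comparing df2.count() with read_df.count(), since subtract has set (EXCEPT DISTINCT) semantics and would not notice dropped duplicates.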


This passes the test.



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 3 Feb 2021 at 18:12, Sofia’s World <mmistr...@gmail.com> wrote:

> Hello
>  my 2 cents:
> Well, that would be an integration test writing to a 'dev' database (which
> you might pre-populate and clean up after your runs, so you have
> repeatable data).
> Then either you
> 1 - use plain SQL and assert that the values you store in your dataframe
> are the same as what you get from your SQL, or
> 2 - surely, as there is a dataframe.write, there would also be a
> dataframe.read that you can use?
>
>
> hth
>  Marco
>
>
>
> On Wed, Feb 3, 2021 at 4:51 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> It appears that the following assertion works, assuming that the result
>> set can be = 0 (no data) or > 0 (there is data):
>>
>> assert df2.count() >= 0
>>
>> However, if I wanted to write to a JDBC database from PySpark through a
>> function (already defined in another module) as below
>>
>>
>> def writeTableToOracle(dataFrame, mode, dataset, tableName):
>>     try:
>>         dataFrame. \
>>             write. \
>>             format("jdbc"). \
>>             option("url", oracle_url). \
>>             option("dbtable", tableName). \
>>             option("user", config['OracleVariables']['oracle_user']). \
>>             option("password", config['OracleVariables']['oracle_password']). \
>>             option("driver", config['OracleVariables']['oracle_driver']). \
>>             mode(mode). \
>>             save()
>>     except Exception as e:
>>         print(f"""{e}, quitting""")
>>         sys.exit(1)
>>
>>
>> and call it in the program
>>
>>
>> from sparkutils import sparkstuff as s
>>
>>
>> s.writeTableToOracle(df2, "overwrite",
>>                      config['OracleVariables']['dbschema'],
>>                      config['OracleVariables']['yearlyAveragePricesAllTable'])
>>
>>
>> How can one assert its validity in PyTest?
>>
>>
>> Thanks again
>>
>> On Wed, 3 Feb 2021 at 15:12, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> In pytest you want to ensure that the composed DF returns the correct
>>> result.
>>>
>>> Example
>>>
>>>     df2 = house_df. \
>>>         select( \
>>>         F.date_format('datetaken', 'yyyy').cast("Integer").alias('YEAR') \
>>>         , 'REGIONNAME' \
>>>         , round(F.avg('averageprice').over(wSpecY)).alias('AVGPRICEPERYEAR') \
>>>         , round(F.avg('flatprice').over(wSpecY)).alias('AVGFLATPRICEPERYEAR') \
>>>         , round(F.avg('TerracedPrice').over(wSpecY)).alias('AVGTERRACEDPRICEPERYEAR') \
>>>         , round(F.avg('SemiDetachedPrice').over(wSpecY)).alias('AVGSDPRICEPRICEPERYEAR') \
>>>         , round(F.avg('DetachedPrice').over(wSpecY)).alias('AVGDETACHEDPRICEPERYEAR')). \
>>>         distinct().orderBy('datetaken', ascending=True)
>>>
>>> Will that be enough to run just this command
>>>
>>>   assert not []
>>>
>>> I believe that may be flawed because any error will be assumed to be NOT
>>> NULL?
>>>
>>> Thanks
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
