Hey, they are good libraries to get you started. I have used both of them. Unfortunately, as far as I saw when I started using them, only a few people maintain them. But you can get pointers out of them for writing tests. The code below can get you started. What you'll need is:
- a method to create a DataFrame on the fly, perhaps from a string; you can have a look at pandas, it has methods for that
- a method to test DataFrame equality; you can use df1.subtract(df2)

I am assuming you are working with DataFrames rather than RDDs, for which the two packages you mention should have everything you need.

hth
marco

import pytest
from pyspark.sql import SparkSession


@pytest.fixture
def spark_session():
    return SparkSession.builder \
        .master('local[1]') \
        .appName('SparkByExamples.com') \
        .getOrCreate()


def test_create_table(spark_session):
    df = spark_session.createDataFrame([['one', 'two']]).toDF('first', 'second')
    df.show()
    df2 = spark_session.createDataFrame([['one', 'two']]).toDF('first', 'second')
    # subtract returns the rows of df that are not in df2;
    # an empty result means df2 contains every row of df
    assert df.subtract(df2).count() == 0

On Thu, Nov 19, 2020 at 6:38 AM Sachit Murarka <connectsac...@gmail.com> wrote:
> Hi Users,
>
> I have to write Unit Test cases for PySpark.
> I think pytest-spark and "spark testing base" are good test libraries.
>
> Can anyone please provide full reference for writing the test cases in
> Python using these?
>
> Kind Regards,
> Sachit Murarka