We recently open-sourced mockrdd, a library for testing PySpark code.
github.com/LiveRamp/mockrdd

The mockrdd.MockRDD class offers behavior similar to pyspark.RDD, with the
following extra benefits:
* Extensive sanity checks to identify invalid inputs
* More meaningful error messages for debugging issues
* Straightforward to run within pdb
* Removes Spark dependencies from development and testing environments
* No Spark overhead when running through a large test suite
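
To illustrate the general idea behind these points, here is a minimal sketch
of an in-memory stand-in for an RDD. It is not mockrdd's implementation or
API: TinyMockRDD and double_and_keep_large are hypothetical names made up for
this example, and only map, filter, and collect are covered. See the
repository above for the real MockRDD interface.

```python
class TinyMockRDD:
    """Hypothetical in-memory stand-in for an RDD (not the mockrdd API)."""

    def __init__(self, items):
        # Evaluate eagerly into a plain list; no Spark context is involved.
        self._items = list(items)

    def map(self, f):
        # Sanity check: fail at the call site with a readable message.
        if not callable(f):
            raise TypeError("map expects a callable, got %r" % (f,))
        return TinyMockRDD(f(x) for x in self._items)

    def filter(self, f):
        if not callable(f):
            raise TypeError("filter expects a callable, got %r" % (f,))
        return TinyMockRDD(x for x in self._items if f(x))

    def collect(self):
        # Return a copy so callers cannot mutate internal state.
        return list(self._items)


def double_and_keep_large(rdd):
    """Job under test: uses only map/filter, so any RDD-like object works."""
    return rdd.map(lambda x: x * 2).filter(lambda x: x > 3)


result = double_and_keep_large(TinyMockRDD([1, 2, 3])).collect()
```

Because the stand-in runs eagerly in plain Python, a bad argument raises
immediately in the test process, and the job can be stepped through in pdb
like any other function.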

More details in this blog post:
liveramp.com/engineering/introducing-mockrdd-for-testing-pyspark-code

Would anyone find this useful? What other features would make this more
useful? Are there benefits to using PySpark in local mode for testing that
we're not considering?

Thanks!
