[PySpark] Sharing testing library and requesting feedback

Matt Hagy Fri, 26 Oct 2018 06:32:06 -0700

We recently open sourced mockrdd, a library for testing PySpark code.
github.com/LiveRamp/mockrdd


The mockrdd.MockRDD class offers similar behavior to pyspark.RDD with the
following extra benefits:
* Extensive sanity checks to identify invalid inputs
* More meaningful error messages for debugging issues
* Straightforward to run within pdb
* Removes Spark dependencies from development and testing environments
* No Spark overhead when running through a large test suite
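
To illustrate the general idea behind the benefits listed above, here is a
minimal sketch of an in-memory stand-in for a few pyspark.RDD methods. It is
not mockrdd's implementation or API: the TinyRDD class and the word_count
function are hypothetical names made up for this example. The point is that
a job written against RDD-style methods can be tested with plain Python
objects, without a Spark installation, and can fail early with a readable
message on invalid input.

```python
class TinyRDD:
    """Hypothetical in-memory stand-in for a few pyspark.RDD methods.

    Illustration only; this is not the mockrdd.MockRDD class.
    """

    def __init__(self, items):
        self._items = list(items)

    def map(self, f):
        return TinyRDD(f(x) for x in self._items)

    def filter(self, f):
        return TinyRDD(x for x in self._items if f(x))

    def flatMap(self, f):
        return TinyRDD(y for x in self._items for y in f(x))

    def reduceByKey(self, f):
        acc = {}
        for item in self._items:
            # Sanity check: fail early with a message that names the bad item.
            if not (isinstance(item, tuple) and len(item) == 2):
                raise ValueError(
                    "reduceByKey expects (key, value) pairs, got %r" % (item,)
                )
            key, value = item
            acc[key] = f(acc[key], value) if key in acc else value
        return TinyRDD(acc.items())

    def collect(self):
        return list(self._items)


def word_count(rdd):
    """Job under test; uses only the RDD-style methods defined above."""
    return (
        rdd.flatMap(lambda line: line.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )


if __name__ == "__main__":
    counts = dict(word_count(TinyRDD(["a b a", "b c"])).collect())
    print(counts)
```

Because everything runs in the same Python process, a breakpoint set inside
word_count can be stepped through with pdb like any other function.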

More details in this blog post:
liveramp.com/engineering/introducing-mockrdd-for-testing-pyspark-code

Would anyone find this useful? What other features would make this more
useful? Are there benefits to using PySpark in local mode for testing that
we're not considering?

Thanks!
