I don't think these will blow anyone's minds, but:
1) Row counts. Most of our jobs 'recompute the world' nightly so we can
expect to see fairly predictable row variances.
2) Rolling snapshots. We can also expect that for some critical datasets
we can compute a rolling average for important metrics.
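The two checks above can be sketched as plain functions, independent of Spark itself (the function names and the 10% default tolerance are illustrative assumptions, not anything from this thread):

```python
def within_expected_variance(current_count, previous_count, tolerance=0.1):
    """Check 1: tonight's row count should be within `tolerance`
    (as a fraction) of the previous run's count, since nightly
    'recompute the world' jobs have fairly predictable row variances."""
    if previous_count == 0:
        return current_count == 0
    return abs(current_count - previous_count) / previous_count <= tolerance

def rolling_average(counts, window=7):
    """Check 2 helper: rolling average over the last `window` runs,
    to validate a critical dataset against recent history rather
    than a single previous run."""
    recent = counts[-window:]
    return sum(recent) / len(recent)

# Example: compare tonight's count against the 7-run rolling average.
history = [100, 104, 98, 101, 103, 99, 102]
tonight = 97
ok = within_expected_variance(tonight, rolling_average(history))
```

In practice the counts themselves would come from something like `df.count()` or a Spark accumulator, and a failed check would block publishing the output rather than just logging.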
Hi Folks,
I'm working on updating a talk, and I was wondering if anyone in the
community wanted to share their best practices for validating Spark
jobs. Are there any counters you've found useful for
monitoring/validating your Spark jobs?
Cheers,
Holden :)