Hi community,

I built a simple count-and-sum Spark application which uses the combineByKey transformation [1], and I would like to monitor the throughput in and out of this transformation and the latency that combineByKey spends pre-aggregating tuples. Ideally, I would like to track the latency with a histogram over the last 30 seconds and report its average and 99th percentile.
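For context, the pre-aggregation is roughly the following (a simplified sketch with made-up sample data and a local master, not the exact code in [1]):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("count-and-sum").setMaster("local[*]"))

// made-up sample data: (taxi id, value)
val rides = sc.parallelize(Seq(("taxi-1", 10.0), ("taxi-2", 5.0), ("taxi-1", 7.5)))

// count and sum of the values per key
val countSumPerKey = rides.combineByKey(
  (value: Double) => (1L, value),                                        // createCombiner
  (acc: (Long, Double), value: Double) => (acc._1 + 1L, acc._2 + value), // mergeValue
  (a: (Long, Double), b: (Long, Double)) => (a._1 + b._1, a._2 + b._2)   // mergeCombiners
)

countSumPerKey.collect().foreach(println)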
I was imagining adding a Dropwizard metric [2] to the combiner function (the createCombiner above) that I pass to combineByKey, but I am confused because there are two more functions (mergeValue and mergeCombiners) that I must pass to combineByKey. How would you suggest I implement this monitoring strategy? A rough sketch of what I have in mind is at the end of this mail.

Thanks,
Felipe

[1] https://github.com/felipegutierrez/explore-spark/blob/master/src/main/scala/org/sense/spark/app/combiners/TaxiRideCountCombineByKey.scala#L40
[2] https://metrics.dropwizard.io/4.1.2/getting-started.html

--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez
-- https://felipeogutierrez.blogspot.com
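Here is the rough sketch I mentioned above. The CombinerMetrics holder object, the metric names, and the SlidingTimeWindowReservoir are just my guesses at how Dropwizard could be wired in; they are not in [1], and I have not figured out yet how the metrics should be reported from the executors.

import java.util.concurrent.TimeUnit
import com.codahale.metrics.{Histogram, MetricRegistry, SlidingTimeWindowReservoir}
import org.apache.spark.rdd.RDD

// The functions passed to combineByKey run on the executors, so each executor
// JVM lazily builds its own copy of these metrics through this holder object.
object CombinerMetrics {
  val registry = new MetricRegistry()
  val recordsIn = registry.meter("combiner.records.in")        // tuples entering the combiner
  val combinersOut = registry.meter("combiner.combiners.out")  // combiners created
  // latency histogram over a 30-second sliding time window, in nanoseconds
  val latency: Histogram = registry.register(
    "combiner.latency.nanos",
    new Histogram(new SlidingTimeWindowReservoir(30, TimeUnit.SECONDS)))
}

def instrumentedCountAndSum(rides: RDD[(String, Double)]): RDD[(String, (Long, Double))] =
  rides.combineByKey(
    (value: Double) => {                       // createCombiner
      val start = System.nanoTime()
      CombinerMetrics.recordsIn.mark()
      CombinerMetrics.combinersOut.mark()
      val combiner = (1L, value)
      CombinerMetrics.latency.update(System.nanoTime() - start)
      combiner
    },
    (acc: (Long, Double), value: Double) => {  // mergeValue
      val start = System.nanoTime()
      CombinerMetrics.recordsIn.mark()
      val merged = (acc._1 + 1L, acc._2 + value)
      CombinerMetrics.latency.update(System.nanoTime() - start)
      merged
    },
    (a: (Long, Double), b: (Long, Double)) =>  // mergeCombiners
      (a._1 + b._1, a._2 + b._2)
  )

// e.g. CombinerMetrics.latency.getSnapshot.get99thPercentile() for the p99
// and CombinerMetrics.recordsIn.getOneMinuteRate for the input throughput.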