Hi, could anyone please share their thoughts on the topics below?
- How can I achieve high availability with a Spark cluster? I have referred to https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/exercises/spark-exercise-standalone-master-ha.html . Is there any other way to do this in cluster mode?
- How can I achieve high availability for the Spark driver? From the documentation, I understand that this is achieved through a checkpoint directory. Is there any other way?
- How can I find out the number of messages that have been consumed by the consumer? Is there any way to track the number of messages consumed in Spark Streaming?
- I also want to save data from Spark Streaming periodically and run aggregations on it. For example, save the data every hour or every day and then aggregate over it.

Thanks,
Asmath.
