Hi,

I have started using Spark Streaming with Kafka, and it appears to offer a number of good opportunities. I am considering using it for Complex Event Processing (CEP) by building CEP adaptors and Spark transformers.

Before going there, I would like to know whether anyone has benchmarked Spark Streaming, specifically with regard to the volume and velocity of the data being streamed in.

To clarify, one of the parameters I have come across is calibrating Spark Streaming by setting StreamingContext(sparkConf, Seconds(nn)), where nn is the batch interval in seconds, to an optimum value based on how frequently data streams in. For example, if data is streaming in at a rate of one tick every 60 seconds, would you set nn = 55?

Also, what are the dependencies on the volume of data coming in? Are there particular parameters for scaling the performance of Spark Streaming? And how is caching going to improve performance, i.e.

    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topic)
    messages.cache()

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
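
P.S. For anyone wanting to reproduce the setup in question, a minimal sketch might look like the following. It assumes Spark 1.x with the Kafka 0.8 direct-stream integration (the spark-streaming-kafka artifact). The application name, broker address, topic name, and the value 55 are placeholders chosen for illustration, not values from a real deployment or a recommended setting.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object BatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical application name; the master is expected to come from spark-submit.
    val sparkConf = new SparkConf().setAppName("BatchIntervalSketch")

    // The "nn" being asked about: the batch interval in seconds (placeholder value).
    val batchIntervalSeconds = 55
    val ssc = new StreamingContext(sparkConf, Seconds(batchIntervalSeconds))

    // Assumed broker address and topic name; replace with real values.
    val kafkaParams = Map[String, String]("metadata.broker.list" -> "localhost:9092")
    val topics = Set("ticks")

    // The direct stream takes a Set of topic names and yields (key, value) pairs.
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Persist each batch's RDD so that several operations on it do not re-read from Kafka.
    messages.cache()

    // Two independent outputs over the same cached batches.
    messages.count().print()       // number of records in each batch
    messages.map(_._2).print()     // first few message values in each batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Caching would only be expected to matter when the same stream feeds more than one output operation, as in the two print() calls here; with a single pass over each batch there is nothing to reuse.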