Hi,

Having started using Spark Streaming with Kafka, I find that it offers a
number of good opportunities. I am considering using it for Complex Event
Processing (CEP) by building CEP adaptors and Spark transformers.

Before going there, I would like to know whether anyone has done benchmarks
on Spark Streaming, specifically with regard to the volume and velocity of
the data being streamed in.

To clarify, one of the parameters I have found is the batch interval, set
through StreamingContext(sparkConf, Seconds(nn)), where nn is the number of
seconds. I assume it should be tuned to an optimum value based on how
frequently data streams in. For example, if the data arrives at a rate of
one tick every 60 seconds, would you set nn = 55 seconds?
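For concreteness, here is a minimal sketch of the kind of setup I mean, written against the Spark 1.x Scala API. The application name, the 60-second interval and the rate limit are placeholders I picked for illustration, not measured or recommended values. The two rate-related settings are standard Spark configuration keys, but whether they are the right knobs here is part of what I am asking.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder application name; the master is expected to come from spark-submit.
    val sparkConf = new SparkConf()
      .setAppName("BatchIntervalSketch")
      // Let Spark adapt the ingestion rate to the processing rate (Spark 1.5+).
      .set("spark.streaming.backpressure.enabled", "true")
      // Upper bound on records read per Kafka partition per second (direct stream).
      // The value 1000 is an arbitrary example.
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")

    // Batch interval in seconds, i.e. the nn in StreamingContext(sparkConf, Seconds(nn)).
    // 60 is only an example value.
    val batchIntervalSeconds = 60
    val ssc = new StreamingContext(sparkConf, Seconds(batchIntervalSeconds))

    // Input streams and output operations would be defined here, followed by:
    // ssc.start()
    // ssc.awaitTermination()
  }
}
```

Keeping the interval in a single named value should make it easy to rerun the same job with different settings and compare the batch processing times reported in the Streaming tab of the Spark UI.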

Also, what are the dependencies on the volume of data coming in? Are there
certain parameters to scale the performance of Spark Streaming, and how is
caching, i.e.

val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
  StringDecoder](ssc, kafkaParams, topic)

messages.cache()

going to improve the performance?
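My current understanding, which may be wrong, is that cache() should only make a difference when the same stream is consumed by more than one operation per batch. The sketch below shows the pattern I have in mind. The object and method names are hypothetical, and `messages` is assumed to be the (key, value) stream returned by createDirectStream.

```scala
import org.apache.spark.streaming.dstream.DStream

object CacheSketch {
  // Hypothetical helper: registers two output operations on the same stream.
  def countAndSample(messages: DStream[(String, String)]): Unit = {
    // Persist each batch's RDD so the second operation can reuse it
    // instead of recomputing it from the source.
    messages.cache()

    // First use: print the number of records in each batch.
    messages.count().print()

    // Second use: print up to five message values from the same batch.
    messages.map(_._2).print(5)
  }
}
```

If the stream fed only a single output operation, I would not expect the cache() call to help, since each batch would be computed once either way.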

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
