The requirements look like my previous smart-metering project. We ultimately built a custom solution without Spark, Hadoop, or Kafka, but that was four years ago, when I had no experience with these technologies (some did not exist yet or were not mature).
If you do not need any relational processing of your messages (based on historical data, or joins with other messages) and message processing is fairly independent, Kafka plus Spark Streaming could be overkill. It is best to check whether your data has a natural index, such as the timestamp in metering data that arrives at a fixed frequency (every second), and use it to access your cache and disk. For the cache, Alluxio looks the most promising to me.

BR,
Arkadiusz Bicz

On Tue, Apr 19, 2016 at 6:01 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Hi all,
> I am looking for an architecture to ingest 10 mils of messages in the micro
> batches of seconds.
> If anyone has worked on similar kind of architecture, can you please point
> me to any documentation around the same like what should be the architecture,
> which all components/big data ecosystem tools should i consider etc.
> The messages has to be in xml/json format, a preprocessor engine or message
> enhancer and then finally a processor.
> I thought about using data cache as well for serving the data.
> The data cache should have the capability to serve the historical data in
> milliseconds (may be upto 30 days of data)
> --
> Thanks
> Deepak
> www.bigdatabig.com

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
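The timestamp-keyed access pattern suggested above can be sketched roughly as follows. This is a minimal, framework-free illustration in Python (not Spark or Alluxio code); the one-second bucket size follows the metering frequency mentioned in the thread, and the `TimestampCache` class and its methods are hypothetical names standing in for a real cache layer:

```python
from collections import defaultdict

# Assumption from the thread: readings arrive every second, so a one-second
# bucket derived from the timestamp serves as the natural index.
BUCKET_SECONDS = 1

def bucket_key(epoch_ts: float) -> int:
    """Truncate a timestamp to its one-second bucket (the natural index)."""
    return int(epoch_ts) // BUCKET_SECONDS

class TimestampCache:
    """Illustrative in-memory layer; in the architecture discussed, a miss
    here would fall through to a distributed cache (e.g. Alluxio) or disk."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def put(self, ts: float, reading: dict) -> None:
        # Group each reading under its timestamp bucket.
        self._buckets[bucket_key(ts)].append(reading)

    def get(self, ts: float) -> list:
        # Direct keyed lookup instead of a scan or relational join.
        return self._buckets.get(bucket_key(ts), [])

cache = TimestampCache()
cache.put(1461045661.25, {"meter": "m1", "kwh": 0.4})
cache.put(1461045661.75, {"meter": "m2", "kwh": 0.7})
print(len(cache.get(1461045661.0)))  # both readings share the 1-second bucket
```

Because each lookup is a direct keyed access on the natural index, historical reads stay constant-time regardless of retention window, which is the property that makes a plain cache-plus-disk design competitive with a full streaming stack for independent per-message processing.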