Here are some differences between the two:

- KafkaStreams is a library, whereas Samza is a framework, which makes the learning curve of KafkaStreams a bit easier.
- Sources - KafkaStreams works with Kafka alone, while Samza can also be configured with Kinesis, Elasticsearch, HDFS and others.
- Deployment - Samza works closely with YARN (although it is not a must), whereas KafkaStreams can be run and deployed as a simple Java library, where running more instances of it will cause an automatic load balance between the processes. No cluster is required for KafkaStreams.
- State management - both have local state. In KafkaStreams the common stateful operations (e.g. join, aggregation) are made simpler: you just call the function and the state is managed behind the scenes, without having to be defined explicitly.
- Configuration - in Samza there's a configuration file, whereas in KafkaStreams it's all inside your class.
- Code unification with batch jobs - Samza code can be written once for both ongoing stream processing and batch jobs, since Samza jobs can also run on a Hadoop cluster.
- Samza supports host affinity: after a job restarts, it is allocated the same machine (the one that has the local state stored), which avoids the startup latency of reloading the state.
- Samza supports an async I/O model, which can significantly improve the performance of jobs bottlenecked on remote I/O.
- Samza has a REST API to query its processing streams and to start and stop jobs.
- Samza is a bit more mature (KafkaStreams is the new kid on the block).
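To make the Configuration point a bit more concrete, here is a minimal sketch of what "it's all inside your class" can look like on the KafkaStreams side. It uses only java.util.Properties so it runs without any Kafka dependency. The class name, the application id "wordcount-app" and the broker address "localhost:9092" are made up for illustration; "application.id" and "bootstrap.servers" are the two required KafkaStreams settings.

```java
import java.util.Properties;

class StreamsAppConfig {

    // Builds the settings a KafkaStreams application is started with.
    // Both arguments are hypothetical values chosen by the caller.
    static Properties buildConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // Instances started with the same application.id form one group
        // and share the input partitions between them.
        props.put("application.id", applicationId);
        // Where to find the Kafka brokers.
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig("wordcount-app", "localhost:9092");
        // With kafka-streams on the classpath, this object would be passed
        // to the KafkaStreams constructor together with the topology,
        // instead of being read from a separate job configuration file.
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Since the configuration is just an object built in code, starting a second instance with the same application id is all it takes to get the load balancing mentioned under Deployment.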

*Ofir Sharony*
BackEnd Tech Lead
MyHeritage Ltd. | www.myheritage.com

On Mon, Dec 26, 2016 at 8:07 AM, 황보동규 <hwangb...@gmail.com> wrote:

> Hi there!
>
> I'm a newbie on Kafka.
> I have an interest in streaming services, especially Kafka streaming. But I
> have no idea what the difference is between Kafka streaming and Samza.
> Both have a similar architecture and functionality, I think.
> What's the main difference? What are the pros and cons? A kind explanation
> would be really helpful. Any helpful documentation related to my question
> is also welcome.
>
> Thanks,
> Dongkyu