Russell is correct here. Micro-batching means that Spark Streaming processes the data received within a time interval as one group. In general, there are three parameters involved.
Batch interval: this is the basic interval at which the system will receive the data in batches. It is the interval set when creating a StreamingContext. For example, if you set the batch interval to 30 seconds, then any input DStream will generate RDDs of received data at 30-second intervals.

Within Spark Streaming you also have what is called a "window operator", which is defined by two parameters:

- WindowDuration / WindowLength: the length of the window
- SlideDuration / SlidingInterval: the interval at which the window slides, or moves forward

Both must be multiples of the batch interval.

Example:

batch interval = 30 seconds
window length = 10 minutes
sliding interval = 5 minutes

In that case, you would produce an output every 5 minutes, aggregating the data collected every 30 seconds over the previous 10 minutes. Each output therefore covers 20 batches, and consecutive windows overlap by 5 minutes.

In general, depending on what you are doing, you can tighten the above parameters. For example, if you are using Spark Streaming for anti-fraud detection, you may stream data in at a 2-second batch interval, keep your window length at 4 seconds and your sliding interval at 2 seconds, which gives you fairly tight streaming: each output aggregates the data collected over the last two batch intervals.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 23 August 2016 at 17:34, Russell Spitzer <russell.spit...@gmail.com> wrote:

> Spark Streaming does not process one event at a time, which is, in general, I
> think, what people call "streaming". It instead processes groups of events.
> Each group is a "MicroBatch" that gets processed at the same time.
>
> Streaming theoretically always has better latency because each event is
> processed as soon as it arrives, while in micro-batching the latency of all
> the events in the batch can be no better than that of the last element to arrive.
>
> Streaming theoretically has worse performance because events cannot be
> processed in bulk.
>
> In practice, throughput and latency are very implementation-dependent.
>
> On Tue, Aug 23, 2016 at 8:41 AM Aseem Bansal <asmbans...@gmail.com> wrote:
>
>> I was reading this article https://www.inovex.de/blog/storm-in-a-teacup/
>> and it mentioned that Spark Streaming is actually mini-batching, not actual
>> streaming.
>>
>> I have not used streaming and I am not sure what the difference between the
>> two terms is. Hence I could not make a judgement myself.
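A minimal sketch of how the batch interval, window length and sliding interval interact, written as plain Python so it runs without a Spark cluster. The function `windowed_sums` and the per-batch counts are hypothetical illustrations, not part of the Spark API: it keeps only the most recent window length's worth of batches and emits a sum every sliding interval, with early windows only partially filled. In a real PySpark job the corresponding settings would be the batch duration passed to `StreamingContext(sc, 30)` and a call such as `dstream.window(600, 300)`, with durations in seconds.

```python
from collections import deque


def windowed_sums(batch_counts, batch_interval, window_length, slide_interval):
    """Simulate a sliding-window sum over micro-batches (hypothetical helper).

    batch_counts: per-batch event counts, one entry per batch interval.
    All durations are in seconds. As in Spark Streaming, the window length
    and sliding interval must be multiples of the batch interval.
    Returns (window_end_time, sum_over_window) pairs, one per slide.
    """
    if window_length % batch_interval or slide_interval % batch_interval:
        raise ValueError("window and slide must be multiples of the batch interval")
    batches_per_window = window_length // batch_interval
    batches_per_slide = slide_interval // batch_interval
    # A bounded deque silently drops the oldest batch once the window is full.
    window = deque(maxlen=batches_per_window)
    outputs = []
    for i, count in enumerate(batch_counts, start=1):
        window.append(count)
        if i % batches_per_slide == 0:
            outputs.append((i * batch_interval, sum(window)))
    return outputs


# Mich's example: 30 s batches, 10 min window, 5 min slide,
# one event per batch for one hour (120 batches).
hourly = windowed_sums([1] * 120, batch_interval=30, window_length=600, slide_interval=300)
print(hourly[:3])  # [(300, 10), (600, 20), (900, 20)]
```

With these numbers an hour of data yields 12 outputs; after the first partial window, each one sums 20 batches. The tighter anti-fraud settings (2 s batches, 4 s window, 2 s slide) emit a result on every batch, covering the last two batches.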
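To make Russell's latency point concrete, here is a rough sketch in plain Python, using a hypothetical helper `microbatch_wait`, of how long each event sits waiting for its micro-batch to close. It assumes batches are aligned to time zero, that an event arriving exactly on a boundary belongs to the next batch, and it ignores processing time entirely. Under the same idealisation, a one-event-at-a-time streaming engine would have no such wait.

```python
def microbatch_wait(arrival_times, batch_interval):
    """Return how long each event waits until its micro-batch closes.

    Hypothetical helper: batches are the half-open intervals
    [k * batch_interval, (k + 1) * batch_interval), aligned to t = 0,
    and each batch is assumed to be processed the instant it closes.
    """
    waits = []
    for t in arrival_times:
        # End time of the batch that the event arriving at t falls into.
        batch_end = (t // batch_interval + 1) * batch_interval
        waits.append(batch_end - t)
    return waits


# Events arriving at these times (seconds) with a 30-second batch interval.
print(microbatch_wait([0, 10, 29, 30, 45], 30))  # [30, 20, 1, 30, 15]
```

Events that arrive early in a batch wait almost a full batch interval, which is one reason a shorter batch interval, as in the 2-second anti-fraud example, tightens latency while giving each batch less data to process in bulk.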