Hi Adrian,

On every timestep of execution, we receive new data, then report updated word counts for that new data plus the past 30 seconds. The latency here is about how quickly you get these updated counts once the new batch of data comes in. It’s true that the count also reflects some data from 30 seconds ago, but that doesn’t mean the overall processing latency is 30 seconds.
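To make the distinction concrete, here is a minimal sketch in plain Python (not the actual Spark Streaming API) of an incrementally maintained sliding-window count, assuming one batch per second and a 30-batch window; the class and method names are illustrative only:

```python
from collections import Counter, deque

class SlidingWordCount:
    """Illustrative sliding-window word count over the last N batches."""

    def __init__(self, window=30):
        self.window = window
        self.batches = deque()   # per-batch Counters, oldest first
        self.totals = Counter()  # running counts over the current window

    def on_batch(self, words):
        """Ingest one batch and return updated window counts.

        The work per timestep is proportional to the new batch (plus the
        one batch falling out of the window), so updated results appear
        within one batch interval -- the 30 s window length does not add
        to this latency.
        """
        new = Counter(words)
        self.batches.append(new)
        self.totals += new
        if len(self.batches) > self.window:
            expired = self.batches.popleft()
            self.totals -= expired  # drop data older than the window
        return dict(self.totals)
```

Each call to `on_batch` returns counts incorporating the batch that just arrived, which is the sense in which the results have sub-second latency even though they cover 30 seconds of data:

```python
wc = SlidingWordCount(window=3)
wc.on_batch(["a", "b"])  # {"a": 1, "b": 1}
wc.on_batch(["a"])       # {"a": 2, "b": 1} -- updated immediately
```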
Matei

On Mar 20, 2014, at 1:36 PM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> I looked over the specs on page 9 from
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf
> The first paragraph mentions the window size is 30 seconds: “Word-Count, which
> performs a sliding window count over 30s; and TopKCount, which finds the k
> most frequent words over the past 30s.”
>
> The second paragraph mentions subsecond latency.
>
> Putting these two together, is the paper saying that in the 30 sec window the
> tuples are delayed at most 1 second?
>
> The paper explains: “By ‘end-to-end latency,’ we mean the time from when
> records are sent to the system to when results incorporating them appear.”
> This leads me to conclude that end-to-end latency for a 30 sec window should
> be at least 30 seconds, because results won’t be incorporated until the entire
> window is completed, i.e. 30 sec. At the same time the paper claims latency is
> sub-second, so clearly I’m misunderstanding something.
>
> -Adrian