Re: Sliding Window Join (without duplicates)

2015-11-24 Thread Aljoscha Krettek
Hi, I’m not sure this is a problem. If a user specifies sliding windows then one element can (and will) end up in several windows. If these are joined then there will be multiple results. If the user does not want multiple windows then tumbling windows should be used. IMHO, this is quite

Re: Sliding Window Join (without duplicates)

2015-11-24 Thread Matthias J. Sax
Stephan is right. A tumbling window does not help. For example, the last tuple of window n and the first tuple of window n+1 are "close" to each other and should be joined. From a SQL-like point of view, this is a very common case, expressed as: SELECT * FROM s1,s2 WHERE s1.key = s2.key AND |s1.ts

Re: Sliding Window Join (without duplicates)

2015-11-24 Thread Stephan Ewen
Since sessions are built per key, they have groups of keys that are close enough together in time. They will, however, treat the closeness transitively... On Tue, Nov 24, 2015 at 11:33 AM, Matthias J. Sax wrote: > Stephan is right. A tumbling window does not help. The last

Re: Sliding Window Join (without duplicates)

2015-11-24 Thread Stephan Ewen
I understand Matthias' point. You want to join elements that occur within a time range of each other. A tumbling window has strict boundaries, so a pair of elements that arrives with one element before the boundary and one after it will not be joined. Hence the sliding windows.
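The boundary effect described above can be sketched in a few lines of plain Python. This is only an illustration, not Flink code: the function names `tumbling_join` and `range_join`, the `(key, ts)` tuple layout, and the parameters `size` and `max_gap` are all assumptions made for the example.

```python
from itertools import product

# Elements are modelled as (key, timestamp) tuples; this layout is an assumption.

def tumbling_join(left, right, size):
    """Join elements that share a key and fall into the same tumbling window."""
    return [
        (l, r)
        for l, r in product(left, right)
        if l[0] == r[0] and l[1] // size == r[1] // size
    ]

def range_join(left, right, max_gap):
    """Join elements that share a key and lie at most max_gap apart in time."""
    return [
        (l, r)
        for l, r in product(left, right)
        if l[0] == r[0] and abs(l[1] - r[1]) <= max_gap
    ]

# Two elements one time unit apart, on either side of the boundary at ts = 10.
stream_a = [("a", 9)]
stream_b = [("a", 10)]

missed = tumbling_join(stream_a, stream_b, size=10)
found = range_join(stream_a, stream_b, max_gap=2)
```

Here `missed` is empty because the two elements land in different tumbling windows, while `found` contains the pair, which is the behaviour a "within a time range" join is after.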

Sliding Window Join (without duplicates)

2015-11-23 Thread Matthias J. Sax
Hi, it seems that a join on data streams with an overlapping sliding window produces duplicates in the output. The default implementation internally just uses two nested loops over both windows to compute the result. How can duplicates be avoided? Is there any way to do so right now? If not,
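To make the duplicate problem concrete, here is a minimal Python model of a nested-loop join over overlapping sliding windows. It is a sketch under stated assumptions, not the Flink implementation: `sliding_windows`, `sliding_join`, and the `dedupe` flag are hypothetical names, and windows are assumed to be half-open intervals `[start, start + size)` aligned to multiples of `slide`. The de-duplication rule shown (emit a pair only in the latest window that contains both elements) is one possible approach, not one proposed in this thread.

```python
def sliding_windows(ts, size, slide):
    """Return the start of every window [start, start + size) that contains ts."""
    starts = []
    start = ts - (ts % slide)  # latest window start that is <= ts
    while start > ts - size:
        starts.append(start)
        start -= slide
    return starts

def sliding_join(left, right, size, slide, dedupe=False):
    """Nested-loop join of (key, ts) elements, evaluated per sliding window."""
    windows = {}  # window start -> (left elements, right elements)
    for side, stream in ((0, left), (1, right)):
        for elem in stream:
            for start in sliding_windows(elem[1], size, slide):
                windows.setdefault(start, ([], []))[side].append(elem)

    out = []
    for start in sorted(windows):
        lefts, rights = windows[start]
        for l in lefts:
            for r in rights:
                if l[0] != r[0]:
                    continue
                # Optional rule: emit the pair only in the latest window that
                # contains both elements, so each pair appears exactly once.
                latest_shared = (min(l[1], r[1]) // slide) * slide
                if dedupe and start != latest_shared:
                    continue
                out.append((l, r))
    return out

s1 = [("a", 2)]
s2 = [("a", 3)]

with_duplicates = sliding_join(s1, s2, size=4, slide=2)
without_duplicates = sliding_join(s1, s2, size=4, slide=2, dedupe=True)
```

With `size=4` and `slide=2`, both elements fall into the windows starting at 0 and at 2, so the plain join emits the same pair twice, while the `dedupe=True` variant emits it once. Unlike collecting results into a set, this rule does not drop pairs that are genuinely repeated in the input.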