Re: Duplicates in self join

2018-10-09 Thread Eric L Goodman
; based on its timestamp and the join window interval. > > Best, > Fabian > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html#interval-join > > On Mon, Oct 8, 2018 at 4:44 PM Eric L Goodman < > eric.good...@colo
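Fabian's point is that an interval join emits each matching pair once, based on the two elements' timestamps, rather than once per overlapping window. In Flink 1.6 this is the `keyBy(...).intervalJoin(...).between(lowerBound, upperBound).process(...)` API from [1], where an element b of the second stream joins element a of the first when b's timestamp lies in [a.timestamp + lowerBound, a.timestamp + upperBound]. The following is only a plain-Java sketch of those semantics for the triad pattern (no Flink dependency); `TimedEdge`, `IntervalJoinSketch` and `triads` are hypothetical names, and the O(n²) loop stands in for Flink's keyed, buffered implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical edge type: src -> dst observed at event time ts. */
final class TimedEdge {
    final String src;
    final String dst;
    final long ts;

    TimedEdge(String src, String dst, long ts) {
        this.src = src;
        this.dst = dst;
        this.ts = ts;
    }

    @Override
    public String toString() {
        return src + "->" + dst + "@" + ts;
    }
}

/** Hypothetical in-memory illustration of interval-join semantics. */
final class IntervalJoinSketch {
    /**
     * Pairs e1 = A->B with e2 = B->C when e1.dst equals e2.src (the join key)
     * and e1.ts + lower <= e2.ts <= e1.ts + upper (both bounds inclusive,
     * as in Flink's interval join by default). An edge is never paired with
     * itself. Every qualifying pair is emitted exactly once.
     */
    static List<TimedEdge[]> triads(List<TimedEdge> edges, long lower, long upper) {
        List<TimedEdge[]> out = new ArrayList<>();
        for (TimedEdge e1 : edges) {
            for (TimedEdge e2 : edges) {
                if (e1 == e2 || !e1.dst.equals(e2.src)) {
                    continue; // same edge, or keys do not match
                }
                if (e2.ts >= e1.ts + lower && e2.ts <= e1.ts + upper) {
                    out.add(new TimedEdge[] {e1, e2});
                }
            }
        }
        return out;
    }
}
```

For example, with edges a->b@1, b->c@3 and b->d@20 and bounds [1, 10], `triads` returns the single pair (a->b@1, b->c@3); b->d@20 falls outside the interval [2, 11]. Because the result depends only on the two timestamps, there are no overlapping windows to produce repeats.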

Re: Duplicates in self join

2018-10-09 Thread Eric L Goodman
assumption is correct, you can add a ProcessFunction after the join > to do distinct. > > Best, Hequn > > On Mon, Oct 8, 2018 at 10:37 PM Eric L Goodman > wrote: > >> If I change it to a Tumbling window some of the results will be lost >> since the pattern I'm mat
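Hequn's suggestion of a ProcessFunction after the join amounts to remembering which results have already been emitted. In Flink this would typically be a `KeyedProcessFunction` keyed by the result's identity, holding a "seen" flag in keyed state and registering a timer to clear it. The sketch below is a plain-Java stand-in under those assumptions: a `HashMap` plays the role of keyed state, `onWatermark` plays the role of an event-time timer, and `DistinctSketch`, `retention` and `resultKey` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java stand-in for a "distinct" step placed after the join.
 * seenUntil mimics keyed state; onWatermark mimics an event-time timer that
 * clears state so it does not grow without bound.
 */
final class DistinctSketch {
    private final long retention; // event-time span for which a result key is remembered
    private final Map<String, Long> seenUntil = new HashMap<>();
    private final List<String> emitted = new ArrayList<>();

    DistinctSketch(long retention) {
        this.retention = retention;
    }

    /** Forwards a join result only if its key is not currently remembered. */
    void process(String resultKey, long timestamp) {
        if (!seenUntil.containsKey(resultKey)) {
            seenUntil.put(resultKey, timestamp + retention);
            emitted.add(resultKey);
        }
    }

    /** Forgets every key whose retention has expired at this watermark. */
    void onWatermark(long watermark) {
        seenUntil.values().removeIf(expiry -> expiry <= watermark);
    }

    List<String> emitted() {
        return emitted;
    }
}
```

With a retention of 10, feeding "a->b->c" at times 5 and 7 and "x->y->z" at time 8 emits each key once. After a watermark of 15, "a->b->c" has been forgotten, so a copy arriving at time 20 is emitted again, while "x->y->z" (remembered until 18) is still suppressed. The retention therefore has to cover the whole period over which duplicates can arrive; for overlapping windows that is roughly the window size.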

Re: Duplicates in self join

2018-10-08 Thread Eric L Goodman
nk/flink-docs-release-1.6/api/java/org/apache/flink/streaming/api/windowing/evictors/Evictor.html >> >> Best Regards, >> Dominik. >> >> On Mon, Oct 8, 2018 at 08:00 Eric L Goodman >> wrote: >> >>> What is the best way to avoid or remove duplica

Duplicates in self join

2018-10-07 Thread Eric L Goodman
What is the best way to avoid or remove duplicates when joining a stream with itself? I'm performing a streaming temporal triangle computation and the first part is to find triads of two edges of the form vertexA->vertexB and vertexB->vertexC (and there are temporal constraints where the first edg
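Assuming the self-join uses sliding event-time windows (the later reply about switching to a tumbling window suggests it does), a pair of edges is emitted once for every window that contains both of them, which is where the duplicates come from. The sketch below counts those windows; it assumes non-negative timestamps and windows aligned to multiples of the slide with zero offset (Flink's default alignment), and `SlidingWindowDuplicates` / `windowsContainingBoth` are hypothetical names.

```java
/** Hypothetical helper: how many sliding windows would emit the same joined pair. */
final class SlidingWindowDuplicates {
    /**
     * Counts windows [start, start + size), with start a multiple of slide,
     * that contain both timestamps t1 and t2. Assumes t1, t2 >= 0.
     */
    static int windowsContainingBoth(long t1, long t2, long size, long slide) {
        long lo = Math.min(t1, t2);
        long hi = Math.max(t1, t2);
        int count = 0;
        // Start from the last aligned window start <= lo (so the window contains lo)
        // and walk backwards while the window still reaches past hi.
        for (long start = lo - (lo % slide); start > hi - size; start -= slide) {
            count++;
        }
        return count;
    }
}
```

With size 10 and slide 5, edges at times 6 and 8 share the windows [0, 10) and [5, 15), so the pair is joined twice, while edges at times 3 and 7 share only [0, 10). Setting slide equal to size (a tumbling window) caps the count at one, but a pair that straddles a boundary, such as times 8 and 12 with size 10, then falls in no common window at all, which matches the lost results mentioned in the later reply.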

multiple input streams

2018-08-31 Thread Eric L Goodman
If I have a standalone cluster running Flink, what is the best way to ingest multiple streams of the same type of data? For example, if I open a socket text stream, does the socket only get opened on the master node and then the stream is partitioned out to the worker nodes? DataStream<String> text = env