If the data does not have a key (or you do not care about it) you can also use a FlatMapFunction (or ProcessFunction) with Operator State. Operator State is not bound to a key but to a parallel operator instance. Have a look at the ListCheckpointed interface and its JavaDocs.
2017-06-23 18:27 GMT+02:00 Edward <egb...@hotmail.com>: > So there is no way to do a countWindow(100) and preserve data locality? > > My use case is this: augment a data stream with new fields from DynamoDB > lookup. DynamoDB allows batch get's of up to 100 records, so I am trying to > collect 100 records before making that call. I have no other reason to do a > repartitioning, so I am hoping to avoid incurring the cost of shipping all > the data across the network to do this. > > If I use countWindowAll, I am limited to parallelism = 1, so all data gets > repartitioned twice. And if I use keyBy().countWindow(), then it gets > repartitioned by key. So in both cases I lose locality. > > Am I missing any other options? > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Strange-behavior-of-DataStream- > countWindow-tp7482p13981.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >