Hi Ken, FYI: we just received a pull request for FLIP-12 [1].
Best, Fabian [1] https://github.com/apache/flink/pull/2629 2016-10-11 9:35 GMT+02:00 Fabian Hueske <fhue...@gmail.com>: > Hi Ken, > > I think your solution should work. > You need to make sure though, that you properly manage the state of your > function, i.e., memorize all records which have been received but haven't > be emitted yet. > Otherwise records might get lost in case of a failure. > > Alternatively, you can implement this as a custom operators. This would > give you full access but you would need to take care of organizing > checkpoints and other low-level issues yourself. This would also be > basically the same as implementing FLIP-12 (or a subset of it). > > Best, Fabian > > > 2016-10-09 3:31 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>: > >> Hi all, >> >> I’ve been watching the FLIP-12 >> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673> >> design >> discussion, and it looks like a promising solution for the issues we’ve got >> with needing to make asynchronous multi-threaded requests in a Flink >> operator. >> >> What’s the best workaround with current releases of Flink? >> >> One option is to have a special tickler source that broadcasts a Tuple0 >> every X milliseconds, which gets connected to the real stream that feeds a >> CoFlatMap. Inside of this I’ve got queues for incoming and generated >> tuples, with a thread pool to pull from the incoming and write to the >> generated queues. When I get one of the “tickle” Tuple0s, I emit all of the >> generated tuples. >> >> There are issues with needing to bound the size of the queues, and all of >> the usual fun with thread pools, but it seems to work. >> >> Is there a better/simpler approach? >> >> Thanks, >> >> — Ken >> >> -------------------------- >> Ken Krugler >> +1 530-210-6378 >> http://www.scaleunlimited.com >> custom big data solutions & training >> Hadoop, Cascading, Cassandra & Solr >> >> >