Re: Current alternatives for async I/O

Fabian Hueske Thu, 13 Oct 2016 04:36:57 -0700

Hi Ken,

FYI: we just received a pull request for FLIP-12 [1].


Best, Fabian

[1] https://github.com/apache/flink/pull/2629

2016-10-11 9:35 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:

> Hi Ken,
>
> I think your solution should work.
> You need to make sure though, that you properly manage the state of your
> function, i.e., memorize all records which have been received but haven't
> be emitted yet.
> Otherwise records might get lost in case of a failure.
>
> Alternatively, you can implement this as a custom operators. This would
> give you full access but you would need to take care of organizing
> checkpoints and other low-level issues yourself. This would also be
> basically the same as implementing FLIP-12 (or a subset of it).
>
> Best, Fabian
>
>
> 2016-10-09 3:31 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>:
>
>> Hi all,
>>
>> I’ve been watching the FLIP-12
>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673> 
>> design
>> discussion, and it looks like a promising solution for the issues we’ve got
>> with needing to make asynchronous multi-threaded requests in a Flink
>> operator.
>>
>> What’s the best workaround with current releases of Flink?
>>
>> One option is to have a special tickler source that broadcasts a Tuple0
>> every X milliseconds, which gets connected to the real stream that feeds a
>> CoFlatMap. Inside of this I’ve got queues for incoming and generated
>> tuples, with a thread pool to pull from the incoming and write to the
>> generated queues. When I get one of the “tickle” Tuple0s, I emit all of the
>> generated tuples.
>>
>> There are issues with needing to bound the size of the queues, and all of
>> the usual fun with thread pools, but it seems to work.
>>
>> Is there a better/simpler approach?
>>
>> Thanks,
>>
>> — Ken
>>
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>>
>>
>

Re: Current alternatives for async I/O

Reply via email to