Thanks for the tip, Elias!

On Wed, Jan 2, 2019 at 9:44 PM Elias Levy <fearsome.lucid...@gmail.com> wrote:
> One thing you must be careful of, if you are using event time
> processing and the control stream only receives messages
> sporadically, is that event time will stop moving forward in the operator
> joining the streams while the control stream is idle. You can get around
> this by using a periodic watermark extractor on the control stream that
> bounds the event time delay to processing time, or by defining your own
> low-level operator that ignores watermarks from the control stream.
>
> On Wed, Jan 2, 2019 at 8:42 AM Avi Levi <avi.l...@bluevoyant.com> wrote:
>
>> Thanks Till, I will definitely check it out. Just to make sure that I
>> understood you correctly: you are suggesting that the list I want to
>> broadcast will be sent via the control stream and then kept in the
>> relevant operator state, correct? And updates (CRUD) on that list will be
>> performed via the control stream, correct?
>> BR
>> Avi
>>
>> On Wed, Jan 2, 2019 at 4:28 PM Till Rohrmann <trohrm...@apache.org>
>> wrote:
>>
>>> Hi Avi,
>>>
>>> you could use Flink's broadcast state pattern [1]. You would need to use
>>> the DataStream API, but it allows you to have two streams (input and control
>>> stream) where the control stream is broadcast to all subtasks. So by
>>> ingesting messages into the control stream you can send model updates to
>>> all subtasks.
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Jan 1, 2019 at 6:49 PM miki haiat <miko5...@gmail.com> wrote:
>>>
>>>> I'm trying to understand your use case.
>>>> What is the source of the data? FS, Kafka, something else?
>>>>
>>>> On Tue, Jan 1, 2019 at 6:29 PM Avi Levi <avi.l...@bluevoyant.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I have a list (a couple of thousand text lines) that I need to use in
>>>>> my map function.
>>>>> I read this article about broadcast variables
>>>>> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/#broadcast-variables>
>>>>> or using the distributed cache
>>>>> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/#distributed-cache>.
>>>>> However, I need to update this list from time to time, and if I understood
>>>>> correctly that is not possible with broadcast variables or the cache
>>>>> without restarting the job. Is there an idiomatic way to achieve this?
>>>>> A DB seems to be overkill for that, and I want to keep I/O and network
>>>>> calls to a minimum.
>>>>>
>>>>> Cheers
>>>>> Avi
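
To illustrate Elias's point about an idle control stream: an operator with two inputs advances its event-time clock to the minimum of the watermarks seen on its inputs, so a control stream that stops emitting watermarks holds the whole operator back. What follows is a minimal plain-Python model of that rule and of the two workarounds he mentions. It is not Flink API: `TwoInputClock`, `bounded_control_watermark`, the `ignore_control` flag, and all millisecond values are hypothetical names and numbers chosen for illustration, and the first workaround assumes event timestamps roughly track wall-clock time.

```python
# Plain-Python model of a two-input operator's event-time clock.
# This is NOT Flink code; every name below is hypothetical.

class TwoInputClock:
    """Tracks the watermark of an operator joining a data and a control input."""

    def __init__(self, ignore_control=False):
        self.data_wm = float("-inf")     # highest watermark seen on the data input
        self.control_wm = float("-inf")  # highest watermark seen on the control input
        self.ignore_control = ignore_control

    def on_data_watermark(self, ts):
        self.data_wm = max(self.data_wm, ts)

    def on_control_watermark(self, ts):
        self.control_wm = max(self.control_wm, ts)

    def current_watermark(self):
        if self.ignore_control:
            # Workaround 2: a custom operator that disregards control watermarks.
            return self.data_wm
        # Default rule: event time is the minimum over all inputs.
        return min(self.data_wm, self.control_wm)


def bounded_control_watermark(processing_time_ms, max_delay_ms):
    """Workaround 1: a periodically emitted control watermark that trails
    processing time by at most max_delay_ms, even when no messages arrive."""
    return processing_time_ms - max_delay_ms


# The control stream emits once, then goes idle while data keeps flowing.
clock = TwoInputClock()
clock.on_control_watermark(1_000)
for ts in (2_000, 3_000, 4_000):
    clock.on_data_watermark(ts)
stalled_at = clock.current_watermark()   # held back by the idle control input

# Workaround 1: a periodic watermark on the control stream, tied to processing time.
clock.on_control_watermark(bounded_control_watermark(4_500, 500))
resumed_at = clock.current_watermark()

# Workaround 2: same inputs, but control watermarks are ignored.
ignoring = TwoInputClock(ignore_control=True)
ignoring.on_control_watermark(1_000)
ignoring.on_data_watermark(4_000)

print(stalled_at, resumed_at, ignoring.current_watermark())
```

In this model the clock stays at the control stream's last watermark until one of the two workarounds is applied. In a real job, the first would live in the watermark assigner attached to the control stream and the second in a custom operator.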