Hi Fabian, please share the workarounds; they would be helpful for my case as well.

Thank you,
Alex

On Mon, Apr 23, 2018 at 2:14 PM Fabian Hueske <fhue...@gmail.com> wrote:

Hi Miki,

Sorry for the late response.
There are basically two ways to implement an enrichment join as in your use case.

1) Keep the metadata in the database and implement a job that reads the stream from Kafka and queries the database in an AsyncIO operator for every stream record. This should be the easier implementation, but it will send one query to the DB for each streamed record.

2) Replicate the metadata into Flink state and join the streamed records with the state. This solution is more complex because you need to propagate updates of the metadata (if there are any) into the Flink state. At the moment, Flink lacks a few features for a good implementation of this approach, but there are some workarounds that help in certain cases.

Note that Flink's SQL support does not add advantages for either of the two approaches. You should use the DataStream API (and possibly ProcessFunctions).

I'd go for the first approach if one query per record is feasible.
Let me know if you need to tackle the second approach and I can give some details on the workarounds I mentioned.

Best, Fabian
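For reference, a minimal sketch of what approach 1 could look like with Flink's async I/O operator, assuming Flink 1.5+ (where the callback type is ResultFuture). Event, EnrichedEvent, the JDBC URL, and the query are illustrative placeholders, and the single shared JDBC connection stands in for a proper connection pool or non-blocking client:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class MetadataLookup extends RichAsyncFunction<Event, EnrichedEvent> {

    private transient Connection connection;
    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Placeholder URL; a real job would use a pooled or non-blocking client
        // instead of one shared connection.
        connection = DriverManager.getConnection("jdbc:sqlserver://dbhost;databaseName=meta");
        executor = Executors.newFixedThreadPool(10);
    }

    // Runs the blocking JDBC query on a side thread pool, which is the usual
    // way to adapt a synchronous client to Flink's async I/O operator.
    protected CompletableFuture<String> fetchMeta(String key) {
        return CompletableFuture.supplyAsync(() -> {
            try (PreparedStatement stmt = connection.prepareStatement(
                    "SELECT meta_value FROM metadata WHERE id = ?")) {
                stmt.setString(1, key);
                try (ResultSet rs = stmt.executeQuery()) {
                    return rs.next() ? rs.getString("meta_value") : null;
                }
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }, executor);
    }

    @Override
    public void asyncInvoke(Event event, ResultFuture<EnrichedEvent> resultFuture) {
        // Complete the future with exactly one enriched record per input,
        // and surface query failures instead of silently timing out.
        fetchMeta(event.getKey())
            .thenAccept(meta -> resultFuture.complete(
                    Collections.singleton(new EnrichedEvent(event, meta))))
            .exceptionally(t -> { resultFuture.completeExceptionally(t); return null; });
    }

    @Override
    public void close() throws Exception {
        if (connection != null) connection.close();
        if (executor != null) executor.shutdown();
    }
}

Wired into the job with a timeout and a cap on in-flight requests (AsyncDataStream and DataStream come from org.apache.flink.streaming.api.datastream):

    DataStream<EnrichedEvent> enriched = AsyncDataStream.unorderedWait(
            kafkaStream, new MetadataLookup(), 1, TimeUnit.SECONDS, 100);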
2018-04-16 20:38 GMT+02:00 Ken Krugler <kkrugler_li...@transpac.com>:

Hi Miki,

I haven’t tried mixing AsyncFunctions with SQL queries.

Normally I’d create a regular DataStream workflow that first reads from Kafka, then has an AsyncFunction to read from the SQL database.

If there are often duplicate keys in the Kafka-based stream, you could keyBy(key) before the AsyncFunction, and then cache the result of the SQL query.

— Ken

--------------------------------------------
http://about.me/kkrugler
+1 530-210-6378
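Ken's keyBy-plus-cache idea could look roughly like this, building on the MetadataLookup sketch above. The cache is kept per subtask, which is safe precisely because keyBy routes each key to a single subtask; the unbounded map is a placeholder for a real bounded cache with expiry (e.g. Guava or Caffeine), since expiry is also what keeps cached metadata from going permanently stale:

import java.util.Collections;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;

public class CachedMetadataLookup extends MetadataLookup {

    private transient ConcurrentMap<String, String> cache;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        cache = new ConcurrentHashMap<>();
    }

    @Override
    public void asyncInvoke(Event event, ResultFuture<EnrichedEvent> resultFuture) {
        String hit = cache.get(event.getKey());
        if (hit != null) {
            // Served from the per-subtask cache; no DB round trip.
            resultFuture.complete(Collections.singleton(new EnrichedEvent(event, hit)));
            return;
        }
        fetchMeta(event.getKey()).thenAccept(meta -> {
            if (meta != null) {
                cache.put(event.getKey(), meta);
            }
            resultFuture.complete(Collections.singleton(new EnrichedEvent(event, meta)));
        });
    }
}

The only wiring change is partitioning before the async operator, e.g. AsyncDataStream.unorderedWait(kafkaStream.keyBy(Event::getKey), new CachedMetadataLookup(), 1, TimeUnit.SECONDS, 100).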
On Apr 16, 2018, at 11:19 AM, miki haiat <miko5...@gmail.com> wrote:

Hi, thanks for the reply. I'll try to break your reply down into the flow execution order:

The first data stream will use AsyncIO to select from the table, the second stream will be Kafka, and then I can join the streams and map the result?

If that's the case, will I select the table only once, on load? How can I make sure that my streamed table is "fresh"?

I'm thinking to myself: is there a way to use Flink's state backend (RocksDB) and create a read/write-through mechanism?

Thanks,
miki

On Mon, Apr 16, 2018 at 2:45 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote:

If the SQL data is all (or mostly all) needed to join against the data from Kafka, then I might try a regular join.

Otherwise it sounds like you want to use an AsyncFunction to do ad hoc queries (in parallel) against your SQL DB.

https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/asyncio.html

— Ken

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr

On Apr 15, 2018, at 12:15 PM, miki haiat <miko5...@gmail.com> wrote:

Hi,

I have a case of metadata enrichment and I'm wondering if my approach is correct:

1. Input stream from Kafka.
2. Metadata in msSQL.
3. Map to a new POJO.

I need to extract a key from the Kafka stream and use it to select some values from the SQL table.

So I thought to use the Table/SQL API to select the metadata table, then convert the Kafka stream to a table and join the data by the stream key.

At the end I need to map the joined data to a new POJO and send it to Elasticsearch.

Any suggestions or different ways to solve this use case?

Thanks,
Miki
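Tying Fabian's approach 2 to miki's RocksDB question: one possible shape is to feed the metadata table into Flink as a second stream (a periodic snapshot or CDC feed of the msSQL table; how that feed is produced is an assumption here), key both streams by the join key, and hold the latest metadata row in keyed state. With the RocksDB state backend that state lives on local disk and is checkpointed, which is about as close to a read/write-through mechanism as the DataStream API gets. MetaUpdate, Event, and EnrichedEvent are illustrative placeholders:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class StateEnrichmentJoin extends CoProcessFunction<Event, MetaUpdate, EnrichedEvent> {

    private transient ValueState<MetaUpdate> metaState;

    @Override
    public void open(Configuration parameters) {
        metaState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("meta", MetaUpdate.class));
    }

    @Override
    public void processElement1(Event event, Context ctx, Collector<EnrichedEvent> out)
            throws Exception {
        // Enrich the Kafka record with the latest metadata seen for its key.
        // Events arriving before their metadata are the gap Fabian's
        // workarounds address: here they simply go out un-enriched, but they
        // could instead be buffered in state until the metadata shows up.
        MetaUpdate meta = metaState.value();
        out.collect(new EnrichedEvent(event, meta == null ? null : meta.getValue()));
    }

    @Override
    public void processElement2(MetaUpdate update, Context ctx, Collector<EnrichedEvent> out)
            throws Exception {
        // Changelog side: keep only the newest row per key, so the replicated
        // "table" in Flink state stays fresh as updates arrive.
        metaState.update(update);
    }
}

Wiring, with both streams keyed on the join key:

    events.keyBy(Event::getKey)
          .connect(metaUpdates.keyBy(MetaUpdate::getKey))
          .process(new StateEnrichmentJoin());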