Here is a complete solution. This is what you would need to do if you decide
to process records in real time with continuous queries in a
*fault-tolerant fashion* (shown in pseudo-code rather than actual Ignite APIs).

First. You need to add a flag field to the record's class that keeps the
current processing status, like this:

MyRecord {
    int id;
    Date created;
    byte status; // 0 - not processed, 1 - being processed within a
                 // continuous query filter, 2 - processed by the filter,
                 // i.e. all the logic completed successfully
}
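
The flag field above can be sketched as a plain-Java model class. This is an illustration only, not the actual Ignite API; the status constants are my own naming, and in a real Ignite model you would likely also annotate the fields with @QuerySqlField so the recovery SQL in the third step can see them:

```java
import java.util.Date;

// Plain-Java sketch of the record model with the processing-status flag.
class MyRecord {
    static final byte NOT_PROCESSED = 0; // inserted, logic not started
    static final byte IN_PROGRESS   = 1; // filter started the logic
    static final byte PROCESSED     = 2; // logic completed successfully

    int id;
    Date created;
    byte status = NOT_PROCESSED; // every new record starts unprocessed

    MyRecord(int id) {
        this.id = id;
        this.created = new Date();
    }
}
```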

Second. The continuous query filter (which will be executed on the nodes that
store a copy of the record) needs to have the following structure:

@IgniteAsyncCallback
filterMethod(MyRecord updatedRecord) {

    if (isThisNodePrimaryNodeForTheRecord(updatedRecord)) { // execute on the
                                                            // primary node only
        updatedRecord.status = 1; // flag that processing has started

        // execute your business logic

        updatedRecord.status = 2; // finished processing
    }
    return false; // you don't want a notification to be sent to the client
                  // application or another node that deployed the continuous query
}
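
The primary-only check plus the status flag is what makes a replay on a backup node safe. Here is a minimal plain-Java simulation of that idempotency (no actual Ignite APIs involved; `runBusinessLogic` and the `isPrimary` parameter are stand-ins for the real logic and the primary-node check):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simulates the filter body: run the business logic at most once per record,
// even if a backup node replays the event after a primary-node failure.
class FilterSimulation {
    static final byte NOT_PROCESSED = 0, IN_PROGRESS = 1, PROCESSED = 2;

    static class Record {
        byte status = NOT_PROCESSED;
    }

    static final AtomicInteger executions = new AtomicInteger();

    // Stand-in for the real business logic.
    static void runBusinessLogic(Record r) {
        executions.incrementAndGet();
    }

    // What the filter would do on the node that owns the record.
    static boolean filter(Record r, boolean isPrimary) {
        if (isPrimary && r.status != PROCESSED) {
            r.status = IN_PROGRESS; // mark start so a replay can detect it
            runBusinessLogic(r);
            r.status = PROCESSED;   // mark completion
        }
        return false; // no notification back to the client
    }
}
```

Note that a replay that arrives while the record is still IN_PROGRESS would re-run the logic from scratch, which matches the failover behavior discussed below: completion, not start, is what makes the record skippable.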

Third. If any node leaves the cluster, or the whole cluster is restarted,
then you need to execute your custom logic for all the records with status=0
or status=1. To do that, you can run a compute task collocated with the data:

// Application side

int[] idsOfUnprocessedRecords = "select id from MyRecord where status < 2;"

IgniteCompute.affinityRun(idsOfUnprocessedRecords, taskWithMyCustomLogic);
// the task will be executed only on the nodes that store the records

// Server side

taskWithMyCustomLogic() {
    updatedRecord.status = 1; // flag that processing has started

    // execute your business logic

    updatedRecord.status = 2; // finished processing
}


That's it. Note that the third step already requires you to have a compute
task that runs the calculation in case of failures. Thus, if the real-time
aspect of the processing is not crucial right now, you can start with the
batch-based approach by running that compute task once in a while, and then
introduce the continuous-queries-based improvement whenever it is needed. You
decide. Hope it helps.


-
Denis


On Wed, Oct 7, 2020 at 9:19 AM Denis Magda <dma...@apache.org> wrote:

> So, a lesson for the future, would the continuous query approach still be
>> preferable if the calculation involves the cache with continuous query and
>> say a look up table? For example, if I want to see the country in the cache
>> employee exists in the list of the countries that I am interested in.
>
>
> You can access other caches from within the filter but the logic has to be
> executed asynchronously to avoid deadlocks:
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/lang/IgniteAsyncCallback.html
>
> Also, what do I need to do if I want the filter for the continuous query
>> to execute on the cache on the local node only? Say, I have the continuous
>> query deployed as singleton service on each node, to capture certain
>> changes to a cache on the local node.
>
>
>
> The filter will be deployed on every server node, but it is executed only on
> the server node that owns the record being modified and passed into the
> filter. Hmm, it's also said that the filter can be executed on a backup
> node. Check if that's true; if so, you need to add a special check to the
> filter that allows executing the logic only if the node is the primary one:
>
> https://ignite.apache.org/docs/latest/key-value-api/continuous-queries#remote-filter
>
>
> -
> Denis
>
>
> On Wed, Oct 7, 2020 at 4:39 AM narges saleh <snarges...@gmail.com> wrote:
>
>> Also, what do I need to do if I want the filter for the continuous query
>> to execute on the cache on the local node only? Say, I have the continuous
>> query deployed as singleton service on each node, to capture certain
>> changes to a cache on the local node.
>>
>> On Wed, Oct 7, 2020 at 5:54 AM narges saleh <snarges...@gmail.com> wrote:
>>
>>> Thank you, Denis.
>>> So, a lesson for the future, would the continuous query approach still
>>> be preferable if the calculation involves the cache with continuous query
>>> and say a look up table? For example, if I want to see the country in the
>>> cache employee exists in the list of the countries that I am interested in.
>>>
>>> On Tue, Oct 6, 2020 at 4:11 PM Denis Magda <dma...@apache.org> wrote:
>>>
>>>> Thanks
>>>>
>>>> Then, I would consider the continuous queries based solution as long as
>>>> the records can be updated in real-time:
>>>>
>>>>    - You can process the records on the fly and don't need to come up
>>>>    with any batch task.
>>>>    - The continuous query filter will be executed once on a node that
>>>>    stores the record's primary copy. If the primary node fails in the 
>>>> middle
>>>>    of the filter's calculation execution, then the filter will be executed 
>>>> on
>>>>    a backup node. So, you will not lose any updates but might need to
>>>>    introduce some logic/flag that confirms the calculation is not executed
>>>>    twice for a single record (this can happen if the primary node failed in
>>>>    the middle of the calculation execution and then the backup node picked 
>>>> up
>>>>    and started executing the calculation from scratch).
>>>>    - Updates of other tables or records from within the continuous
>>>>    query filter must go through an async thread pool. You need to use
>>>>    IgniteAsyncCallback annotation for that:
>>>>    
>>>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/lang/IgniteAsyncCallback.html
>>>>
>>>> Alternatively, you can always run the calculation in the batch-fashion:
>>>>
>>>>    - Run a compute task once in a while
>>>>    - Read all the latest records that satisfy the requests with SQL or
>>>>    any other APIs
>>>>    - Complete the calculation, mark already processed records just in
>>>>    case if everything is failed in the middle and you need to run the
>>>>    calculation from scratch
>>>>
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Mon, Oct 5, 2020 at 8:33 PM narges saleh <snarges...@gmail.com>
>>>> wrote:
>>>>
>>>>> Denis
>>>>>  The calculation itself doesn't involve an update or read of another
>>>>> record, but based on the outcome of the calculation, the process might 
>>>>> make
>>>>> changes in some other tables.
>>>>>
>>>>> thanks.
>>>>>
>>>>> On Mon, Oct 5, 2020 at 7:04 PM Denis Magda <dma...@apache.org> wrote:
>>>>>
>>>>>> Good. Another clarification:
>>>>>>
>>>>>>    - Does that calculation change the state of the record (updates
>>>>>>    any fields)?
>>>>>>    - Does the calculation read or update any other records?
>>>>>>
>>>>>> -
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 3, 2020 at 1:34 PM narges saleh <snarges...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> The latter; the server needs to perform some calculations on the
>>>>>>> data without sending any notification to the app.
>>>>>>>
>>>>>>> On Fri, Oct 2, 2020 at 4:25 PM Denis Magda <dma...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> And after you detect a record that satisfies the condition, do you
>>>>>>>> need to send any notification to the application? Or is it more like a
>>>>>>>> server detects and does some calculation logically without updating 
>>>>>>>> the app.
>>>>>>>>
>>>>>>>> -
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Oct 2, 2020 at 11:22 AM narges saleh <snarges...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The detection should happen at most a couple of minutes after a
>>>>>>>>> record is inserted in the cache but all the detections are local to 
>>>>>>>>> the
>>>>>>>>> node. But some records with the current timestamp might show up in the
>>>>>>>>> system with big delays.
>>>>>>>>>
>>>>>>>>> On Fri, Oct 2, 2020 at 12:23 PM Denis Magda <dma...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> What are your requirements? Do you need to process the records as
>>>>>>>>>> soon as they are put into the cluster?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Friday, October 2, 2020, narges saleh <snarges...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you Dennis for the reply.
>>>>>>>>>>> From the perspective of performance/resource overhead and
>>>>>>>>>>> reliability, which approach is preferable? Does a continuous query 
>>>>>>>>>>> based
>>>>>>>>>>> approach impose a lot more overhead?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 2, 2020 at 9:52 AM Denis Magda <dma...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Narges,
>>>>>>>>>>>>
>>>>>>>>>>>> Use continuous queries if you need to be notified in real-time,
>>>>>>>>>>>> i.e. 1) a record is inserted, 2) the continuous filter confirms the
>>>>>>>>>>>> record's time satisfies your condition, 3) the continuous queries 
>>>>>>>>>>>> notifies
>>>>>>>>>>>> your application that does require processing.
>>>>>>>>>>>>
>>>>>>>>>>>> The jobs are better for a batching use case when it's ok to
>>>>>>>>>>>> process records together with some delay.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -
>>>>>>>>>>>> Denis
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 2, 2020 at 3:50 AM narges saleh <
>>>>>>>>>>>> snarges...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>  If I want to watch for a rolling timestamp pattern in all the
>>>>>>>>>>>>> records that get inserted to all my caches, is it more efficient 
>>>>>>>>>>>>> to use
>>>>>>>>>>>>> timer based jobs (that checks all the records in some interval) or
>>>>>>>>>>>>> continuous queries that locally filter on the pattern? These 
>>>>>>>>>>>>> records can
>>>>>>>>>>>>> get inserted in any order  and some can arrive with delays.
>>>>>>>>>>>>> An example is to watch for all the records whose timestamp
>>>>>>>>>>>>> ends in 50, if the timestamp is in the format yyyy-mm-dd hh:mi.
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -
>>>>>>>>>> Denis
>>>>>>>>>>
>>>>>>>>>>
