Thanks Raymond for the pointer. On Wed, Oct 7, 2020 at 4:39 PM Raymond Wilson <raymond_wil...@trimble.com> wrote:
> It's possible to configure the continuous query to return only keys in the > cache that are stored on the local node. > > On the C# client we do it like this: > > var _queryHandle = queueCache.QueryContinuous > (qry: new ContinuousQuery<[..KeyType..], > [...ValueType...]>(listener) {Local = true}, > initialQry: new ScanQuery< [..KeyType..], [...ValueType...] > > {Local = true}); > > On Thu, Oct 8, 2020 at 9:53 AM narges saleh <snarges...@gmail.com> wrote: > >> Thanks for the detailed explanation. >> >> I don't know how efficient it would be if you have to filter each record >> one by one and then update each record, three times, to keep track of the >> status, if you're dealing with millions of records each hour, even if the >> cache is partitioned. I guess I will need to benchmark this. thanks again. >> >> On Wed, Oct 7, 2020 at 12:00 PM Denis Magda <dma...@apache.org> wrote: >> >>> I recalled a complete solution. That's what you would need to do if >>> decide to process records in real-time with continuous queries in the >>> *fault-tolerant fashion* (using pseudo-code rather than actual Ignite APIs). >>> >>> First. You need to add a flag field to the record's class that keeps the >>> current processing status. Like that: >>> >>> MyRecord { >>> int id; >>> Date created; >>> byte status; //0 - not processed, 1 - being processed withing a >>> continuous query filter, 2 - processed by the filter, all the logic >>> successfully completed >>> } >>> >>> Second. The continuous query filter (that will be executed on nodes that >>> store a copy of a record) needs to have the following structure. >>> >>> @IgniteAsyncCallback >>> filterMethod(MyRecords updatedRecord) { >>> >>> if (isThisNodePrimaryNodeForTheRecord(updatedRecord)) { // execute on a >>> primary node only >>> updatedRecord.status = 1 // setting the flag to signify the >>> processing is started. >>> >>> //execute your business logic >>> >>> updatedRecord.status = 2 // finished processing >>> } >>> return false; //you don't want a notification to be sent to the client >>> application or another node that deployed the continuous query >>> } >>> >>> Third. If any node leaves the cluster or the whole cluster is restarted, >>> then you need to execute your custom for all the records with status=0 or >>> status=1. To do that you can broadcast a compute task: >>> >>> // Application side >>> >>> int[] unprocessedRecords = "select id from MyRecord where status < 2;" >>> >>> IgniteCompute.affinityRun(idsOfUnprocessedRecords, >>> taskWithMyCustomLogic); //the task will be executed only on the nodes that >>> store the records >>> >>> // Server side >>> >>> taskWithMyCustomLogic() { >>> updatedRecord.status = 1 // setting the flag to signify the >>> processing is started. >>> >>> //execute your business logic >>> >>> updatedRecord.status = 2 // finished processing >>> } >>> >>> >>> That's it. So, the third step already requires you to have a compute >>> task that would run the calculation in case of failures. Thus, if the >>> real-time aspect of the processing is not crucial right now, then you can >>> start with the batch-based approach by running a compute task once at a >>> time and then introduce the continuous queries-based improvement whenever >>> is needed. You decide. Hope it helps. >>> >>> >>> - >>> Denis >>> >>> >>> On Wed, Oct 7, 2020 at 9:19 AM Denis Magda <dma...@apache.org> wrote: >>> >>>> So, a lesson for the future, would the continuous query approach still >>>>> be preferable if the calculation involves the cache with continuous query >>>>> and say a look up table? For example, if I want to see the country in the >>>>> cache employee exists in the list of the countries that I am interested >>>>> in. >>>> >>>> >>>> You can access other caches from within the filter but the logic has to >>>> be executed asynchronously to avoid deadlocks: >>>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/lang/IgniteAsyncCallback.html >>>> >>>> Also, what do I need to do if I want the filter for the continuous >>>>> query to execute on the cache on the local node only? Say, I have the >>>>> continuous query deployed as singleton service on each node, to capture >>>>> certain changes to a cache on the local node. >>>> >>>> >>>> <https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/lang/IgniteAsyncCallback.html> >>>> >>>> The filter will be deployed and executed on every server node. The >>>> filter is executed only on a server node that owns a record that is being >>>> modified and passed into a filter. Hmm, it's also said that the filter can >>>> be executed on a backup node. Check if it's true, and then you need to add >>>> a special check into the filter that would allow executing the logic only >>>> if it's the primary node: >>>> >>>> https://ignite.apache.org/docs/latest/key-value-api/continuous-queries#remote-filter >>>> >>>> >>>> - >>>> Denis >>>> >>>> >>>> On Wed, Oct 7, 2020 at 4:39 AM narges saleh <snarges...@gmail.com> >>>> wrote: >>>> >>>>> Also, what do I need to do if I want the filter for the continuous >>>>> query to execute on the cache on the local node only? Say, I have the >>>>> continuous query deployed as singleton service on each node, to capture >>>>> certain changes to a cache on the local node. >>>>> >>>>> On Wed, Oct 7, 2020 at 5:54 AM narges saleh <snarges...@gmail.com> >>>>> wrote: >>>>> >>>>>> Thank you, Denis. >>>>>> So, a lesson for the future, would the continuous query approach >>>>>> still be preferable if the calculation involves the cache with continuous >>>>>> query and say a look up table? For example, if I want to see the country >>>>>> in >>>>>> the cache employee exists in the list of the countries that I am >>>>>> interested >>>>>> in. >>>>>> >>>>>> On Tue, Oct 6, 2020 at 4:11 PM Denis Magda <dma...@apache.org> wrote: >>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Then, I would consider the continuous queries based solution as long >>>>>>> as the records can be updated in real-time: >>>>>>> >>>>>>> - You can process the records on the fly and don't need to come >>>>>>> up with any batch task. >>>>>>> - The continuous query filter will be executed once on a node >>>>>>> that stores the record's primary copy. If the primary node fails in >>>>>>> the >>>>>>> middle of the filter's calculation execution, then the filter will be >>>>>>> executed on a backup node. So, you will not lose any updates but >>>>>>> might need >>>>>>> to introduce some logic/flag that confirms the calculation is not >>>>>>> executed >>>>>>> twice for a single record (this can happen if the primary node >>>>>>> failed in >>>>>>> the middle of the calculation execution and then the backup node >>>>>>> picked up >>>>>>> and started executing the calculation from scratch). >>>>>>> - Updates of other tables or records from within the continuous >>>>>>> query filter must go through an async thread pool. You need to use >>>>>>> IgniteAsyncCallback annotation for that: >>>>>>> >>>>>>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/lang/IgniteAsyncCallback.html >>>>>>> >>>>>>> Alternatively, you can always run the calculation in the >>>>>>> batch-fashion: >>>>>>> >>>>>>> - Run a compute task once in a while >>>>>>> - Read all the latest records that satisfy the requests with SQL >>>>>>> or any other APIs >>>>>>> - Complete the calculation, mark already processed records just >>>>>>> in case if everything is failed in the middle and you need to run the >>>>>>> calculation from scratch >>>>>>> >>>>>>> >>>>>>> - >>>>>>> Denis >>>>>>> >>>>>>> >>>>>>> On Mon, Oct 5, 2020 at 8:33 PM narges saleh <snarges...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Denis >>>>>>>> The calculation itself doesn't involve an update or read of >>>>>>>> another record, but based on the outcome of the calculation, the >>>>>>>> process >>>>>>>> might make changes in some other tables. >>>>>>>> >>>>>>>> thanks. >>>>>>>> >>>>>>>> On Mon, Oct 5, 2020 at 7:04 PM Denis Magda <dma...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Good. Another clarification: >>>>>>>>> >>>>>>>>> - Does that calculation change the state of the record >>>>>>>>> (updates any fields)? >>>>>>>>> - Does the calculation read or update any other records? >>>>>>>>> >>>>>>>>> - >>>>>>>>> Denis >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Oct 3, 2020 at 1:34 PM narges saleh <snarges...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> The latter; the server needs to perform some calculations on the >>>>>>>>>> data without sending any notification to the app. >>>>>>>>>> >>>>>>>>>> On Fri, Oct 2, 2020 at 4:25 PM Denis Magda <dma...@apache.org> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> And after you detect a record that satisfies the condition, do >>>>>>>>>>> you need to send any notification to the application? Or is it more >>>>>>>>>>> like a >>>>>>>>>>> server detects and does some calculation logically without updating >>>>>>>>>>> the app. >>>>>>>>>>> >>>>>>>>>>> - >>>>>>>>>>> Denis >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Fri, Oct 2, 2020 at 11:22 AM narges saleh < >>>>>>>>>>> snarges...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> The detection should happen at most a couple of minutes after a >>>>>>>>>>>> record is inserted in the cache but all the detections are local >>>>>>>>>>>> to the >>>>>>>>>>>> node. But some records with the current timestamp might show up in >>>>>>>>>>>> the >>>>>>>>>>>> system with big delays. >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Oct 2, 2020 at 12:23 PM Denis Magda <dma...@apache.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> What are your requirements? Do you need to process the records >>>>>>>>>>>>> as soon as they are put into the cluster? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Friday, October 2, 2020, narges saleh <snarges...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you Dennis for the reply. >>>>>>>>>>>>>> From the perspective of performance/resource overhead and >>>>>>>>>>>>>> reliability, which approach is preferable? Does a continuous >>>>>>>>>>>>>> query based >>>>>>>>>>>>>> approach impose a lot more overhead? >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Fri, Oct 2, 2020 at 9:52 AM Denis Magda <dma...@apache.org> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Narges, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Use continuous queries if you need to be notified in >>>>>>>>>>>>>>> real-time, i.e. 1) a record is inserted, 2) the continuous >>>>>>>>>>>>>>> filter confirms >>>>>>>>>>>>>>> the record's time satisfies your condition, 3) the continuous >>>>>>>>>>>>>>> queries >>>>>>>>>>>>>>> notifies your application that does require processing. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The jobs are better for a batching use case when it's ok to >>>>>>>>>>>>>>> process records together with some delay. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - >>>>>>>>>>>>>>> Denis >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Oct 2, 2020 at 3:50 AM narges saleh < >>>>>>>>>>>>>>> snarges...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>> If I want to watch for a rolling timestamp pattern in all >>>>>>>>>>>>>>>> the records that get inserted to all my caches, is it more >>>>>>>>>>>>>>>> efficient to use >>>>>>>>>>>>>>>> timer based jobs (that checks all the records in some >>>>>>>>>>>>>>>> interval) or >>>>>>>>>>>>>>>> continuous queries that locally filter on the pattern? These >>>>>>>>>>>>>>>> records can >>>>>>>>>>>>>>>> get inserted in any order and some can arrive with delays. >>>>>>>>>>>>>>>> An example is to watch for all the records whose timestamp >>>>>>>>>>>>>>>> ends in 50, if the timestamp is in the format yyyy-mm-dd hh:mi. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> thanks >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> - >>>>>>>>>>>>> Denis >>>>>>>>>>>>> >>>>>>>>>>>>> > > -- > <http://www.trimble.com/> > Raymond Wilson > Solution Architect, Civil Construction Software Systems (CCSS) > 11 Birmingham Drive | Christchurch, New Zealand > +64-21-2013317 Mobile > raymond_wil...@trimble.com > > > <https://worksos.trimble.com/?utm_source=Trimble&utm_medium=emailsign&utm_campaign=Launch> >