Hey Stanislav,

I was able to get into a situation like the one described before, but the server nodes are no longer under pressure. The cache state seems to stay corrupted. I restarted the client, and even then the client does not see the whole cache content through the cache API (although it does through the SQL query), and it gets "not found" errors when it tries to load objects whose keys were returned by the SQL query.

I can reproduce the situation if load is applied to the servers while one of them is inserting new objects with the LoadCache method. I switched to REPLICATED mode in the meantime and the problem still exists.

Current situation: I have 138285 objects (SQLLine: "SELECT COUNT(*) FROM [cache]"), which corresponds to the number of objects read from the database to fill the cache.

The SQL query executed on the client is:

"select count(*) FROM [cache] WHERE Field1 IN ('d1719313-4921-43c5-82b7-4924a6c3e6e3','bff55ebc-b353-4193-a827-703cd32b45f0','fbf4fea6-11f5-41bc-8018-e230904574b1') AND Field2 IN (3,1,4,7);"

This query consistently returns 120284 hits in SQLLine. On the client node I get the same result, at least most of the time. At the moment the client sometimes gets only 3753 as the result. The client runs the query every 10 to 15 seconds.

Now about the numbers that visor (and `cache.GetSize()` on the client) report: 72935 (split into 1866 on the first node and 71069 on the second). So there are 65350 objects that are seen by SQL (every time when using SQLLine, most of the time by the client) but not by visor or by `cache.GetSize()` executed on the client.

I will extract the configuration and some log data as soon as possible. I'll also try to replace LoadCache with a DataStreamer. Maybe that helps.

Cheers,
Dome

-----Original Message-----
From: Stanislav Lukyanov <stanlukya...@gmail.com>
Sent: 13 April 2018 18:26
To: firstname.lastname@example.org
Subject: Re: Discrepancy between cache.Size() / Visorcmd and SQL query resultset

Hi,

Could you please share your configuration files, cache configurations, logs from all nodes and the code snippets you use to do the queries (visor commands, SQL, etc)?

Thanks,
Stan

Bellenger, Dominique wrote
> Hey Igniters,
>
> I've the following setup (Ignite .NET 2.4): 2 Server nodes, 1 client
> node doing SQL-queries on the cache periodically (every 20 Seconds in my
> case).
> The cache is filled with 110_000 entries from a database, using
> "LoadCache" method. Key is a string representation of a number,
> nothing fancy here.
>
> Situation: Both server nodes are put under pressure by doing
> affinity-run compute jobs on both nodes, affecting all cache entries
> (read, change, put every entry).
>
> I made the following observations:
>
> 1. Visorcmd showed that the entries were distributed like 60_000 on
>    one node and 34_000 on the other. The same sum (94_000) was shown on
>    the client side on every periodic "tick" when calling "GetSize" on the
>    cache instance
>    (https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core/Cache/ICache.cs#L685).
>    * Why are there entries missing? Running SELECT Count(*) on the
>      Cache with SQLLine reports back 110_000 entries.
>    * Why are the entries not distributed 50/50 (or nearly 50/50)?
> 2. On the client, the SQL query invoked on every "tick" returned
>    sometimes 110_000 entries, sometimes 60_000 or 34_000. There was no
>    error or warning in the client or server log about failing SQL queries.
>    * In a partitioned cache both servers do a query and the results
>      are merged, if I understood correctly. It seems to me that one of the
>      servers sometimes returns an empty result set and therefore the client
>      gets a too small result set. Question is: why does this happen even
>      without a warning on the server nodes about a failing query?
> 3. In that situation the client is not able to load a specific
>    entry from the cache multiple times using TryGet(TK key, out TV value)
>    (https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core/Cache/ICache.cs#L297).
>    Those entries definitely are existing in the cache.
> 4. In that situation on one of both server nodes I get errors that
>    an entry could not be loaded (like in 3) but on the affinity-server node!).
>    In my understanding the compute jobs shall get executed on the primary
>    node for the given key. And this node is not able to load an entry by
>    that key (when under heavy CPU pressure)?
>
> Something is strange here. Any ideas?
>
> Cheers,
> Dome

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
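
As a quick sanity check of the figures quoted in this thread, here is a minimal Python sketch. It is not part of the original messages and does not talk to Ignite; the helper name `reconcile` and the variable names are hypothetical. It only subtracts the per-node sizes reported by visor from the SQL row count to show how many entries are visible to SQL but missing from `cache.GetSize()`.

```python
# Counts copied from the messages in this thread; names are illustrative only.
sql_total = 138_285               # SELECT COUNT(*) FROM [cache] via SQLLine
visor_per_node = [1_866, 71_069]  # per-node sizes reported by visor


def reconcile(sql_total, per_node_sizes):
    """Return (total size reported per node, entries seen by SQL but not in that total)."""
    reported = sum(per_node_sizes)
    return reported, sql_total - reported


reported, missing = reconcile(sql_total, visor_per_node)
print(f"visor / GetSize total: {reported}")
print(f"seen by SQL only:      {missing}")
```

Applied to the numbers from the first report (110_000 rows via SQL, 60_000 and 34_000 entries per node), the same helper gives a reported size of 94_000 and 16_000 entries that only SQL sees.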