I’d like to carry this conversation further, cross-posting to dev list:
I now have possible production use cases for accessing cache key [metadata].

As an example, suppose I want to scan all keys from a cache that may contain large amounts of data and perform some operation on a few of them, based on the value of the key itself. In this use case the IO bandwidth required for keys and data might be as much as 1,000 times the bandwidth required for keys alone, even when considering request parallelization and co-location.

I imagine that Ignite can internally scan cache keys as a part of its internal query operations. Is that correct? If so, would it be difficult to expose this kind of functionality in the Ignite API?

Thanks,
Raymond.

*From:* Raymond Wilson [mailto:raymond_wil...@trimble.com]
*Sent:* Monday, December 4, 2017 11:26 PM
*To:* 'u...@ignite.apache.org' <u...@ignite.apache.org>
*Subject:* RE: Obtaining metadata about items in the cache

Thanks Alexey.

This would certainly reduce the IO, but it does still require all the data to be read.

My use case is not really a production one: I want to iterate over all items in the cache to determine whether the page size for persistency was suitable. Reading all the data is not too painful, but a metadata scan would be much faster, especially if spread across the cluster as in your example below.

Raymond.

*From:* Alexey Kukushkin [mailto:kukushkinale...@gmail.com]
*Sent:* Monday, December 4, 2017 11:10 PM
*To:* u...@ignite.apache.org
*Subject:* Re: Obtaining metadata about items in the cache

Hi Raymond,

I do not think Ignite supports iterating over metadata, but you could minimise IO by:

- collocated processing (analyse entries locally without sending them over the network)
- working with the binary object representation directly (without serialisation/deserialisation)

You could send your analysis job to each partition and then execute a local scan query that would work with binary objects.
In the code below, note the affinityRun, withKeepBinary and setLocal methods, which you need to use to achieve the above optimizations:

    IgniteCompute compute = ignite.compute(ignite.cluster().forServers());

    // Send one job per partition to the node that owns that partition.
    for (int i = 0; i < ignite.affinity("CacheName").partitions(); ++i) {
        compute.affinityRun(Collections.singletonList("CacheName"), i, () -> {
            // Work with binary objects to avoid deserialisation.
            IgniteCache<BinaryObject, BinaryObject> cache = ignite.cache("CacheName").withKeepBinary();

            ScanQuery<BinaryObject, BinaryObject> qry = new ScanQuery<>((k, v) -> { ... });

            // Scan only the entries stored on the local node.
            qry.setLocal(true);

            QueryCursor<Cache.Entry<BinaryObject, BinaryObject>> cur = cache.query(qry);
            ...
        });
    }

On Mon, Dec 4, 2017 at 1:33 AM, Raymond Wilson <raymond_wil...@trimble.com> wrote:

Hi,

I’d like to be able to scan all the items in a cache where all I am interested in is the cache key and other metadata about the cached item (such as its size).

I can do this now by running a cache query that simply reads out all the cache items, but this is a lot of IO when I don’t care about the content of the items themselves.

Does anyone here do this?

Thanks,
Raymond.

--
Best regards,
Alexey
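
As a rough check on the "1,000 times" bandwidth estimate in the top message of this thread, the arithmetic can be sketched as below. The entry count, key size and value size are illustrative assumptions, not measurements from any real cache, and the class and method names are hypothetical. The sketch only compares the bytes moved when a scan ships keys and values against a scan that ships keys alone.

```java
public class KeyScanBandwidth {
    /** Bytes moved when every scanned entry ships both its key and its value. */
    static long fullScanBytes(long entries, long keyBytes, long valueBytes) {
        return entries * (keyBytes + valueBytes);
    }

    /** Bytes moved when every scanned entry ships only its key. */
    static long keyScanBytes(long entries, long keyBytes) {
        return entries * keyBytes;
    }

    /** How many times more data a full scan moves than a key-only scan. */
    static double ratio(long keyBytes, long valueBytes) {
        return (double) (keyBytes + valueBytes) / keyBytes;
    }

    public static void main(String[] args) {
        // Assumed sizes, chosen only for illustration.
        long entries = 1_000_000L;
        long keyBytes = 100L;
        long valueBytes = 100_000L;

        System.out.println("Full scan bytes: " + fullScanBytes(entries, keyBytes, valueBytes));
        System.out.println("Key scan bytes:  " + keyScanBytes(entries, keyBytes));
        System.out.println("Ratio:           " + ratio(keyBytes, valueBytes));
    }
}
```

Under these assumed sizes the full scan moves about 1,001 times as much data as the key-only scan. The ratio depends only on the per-entry key and value sizes, not on the number of entries.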