Re: cache size() calculation for MVCC
Yes. On Wed, Apr 25, 2018 at 10:28 PM, Dmitriy Setrakyanwrote: > On Wed, Apr 25, 2018 at 5:45 PM, Vladimir Ozerov > wrote: > > > Yakov, > > > > Thread-per-partition is hardly applicable for general SQL use case as > user > > operates on arbitrary data sets. But in general we may track size deltas > > for partitions on transaction level. If transaction span one or several > > partitions, we may hold this data in a single long or Map. If transaction > > spans a lot of partitions, we may store this data in array. > > > > What do you think? > > > > Vova, I think you are suggesting maintaining the size as you go. This does > sound like a right approach, assuming that it does not carry noticeable > overhead. > > D. >
Re: cache size() calculation for MVCC
On Wed, Apr 25, 2018 at 5:45 PM, Vladimir Ozerovwrote: > Yakov, > > Thread-per-partition is hardly applicable for general SQL use case as user > operates on arbitrary data sets. But in general we may track size deltas > for partitions on transaction level. If transaction span one or several > partitions, we may hold this data in a single long or Map. If transaction > spans a lot of partitions, we may store this data in array. > > What do you think? > Vova, I think you are suggesting maintaining the size as you go. This does sound like a right approach, assuming that it does not carry noticeable overhead. D.
Re: cache size() calculation for MVCC
Vova, what about maintaining the size as you go? This way you don't have to count multiple versions, no? D. On Wed, Apr 25, 2018, 5:07 PM Vladimir Ozerovwrote: > This is interesting question. Full-scan size may be tremendously slow > operation on large data sets. On the other hand, printing total number of > tuples including old and aborted versions make little to no sense as well. > Looks like we need to choose lesser of two evils. What if we do the > following: > 1) Left default behavior as is - O(1) complexity, but includes invalid > versions > 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will > perform full scan. > > Alternatively we may throw an "UnsupportedOperationException" from this > method - why not? > > Thoughts? > > On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov > > wrote: > > > Hi Igniters, > > > > I need your advice on a task at hand. > > > > Currently cache API size() is a constant time operation, since the > > number of entries is maintained as a separate counter. > > However, for MVCC-enabled cache there can be multiple versions of the > > same entry. > > In order to calculate the size we need to obtain a MVCC snapshot and > > then iterate over data pages filtering invisible versions. > > So, it is impossible to keep the same complexity guarantees. > > > > My current implementation internally switches to "full-scan" approach > > if cache in question is a MVCC-enabled cache. > > It happens unbeknown to users, which may expect lightning-fast > > response as before. > > Perhaps, we might add a new constant to CachePeekMode enumeration that > > is passed to cache size() to make it explicit? > > > > The second concern is that cache size calculation is also included > > into Cache Metrics API and Visor functionality. > > Will it be OK for metrics and things alike to keep returning raw > > unfiltered number of entries? > > Is there any sense in showing raw unfiltered number of entries which > > may vary greatly from invokation to invokation with just simple > > updates running in background? > > > > Please share your thoughts. > > > > Thanks in advance. > > -- > > Sergey > > >
Re: cache size() calculation for MVCC
Guys, The cache size counter is actually a set of per-partition counters. 2018-04-25 12:45 GMT+03:00 Vladimir Ozerov: > Yakov, > > Thread-per-partition is hardly applicable for general SQL use case as user > operates on arbitrary data sets. But in general we may track size deltas for > partitions on transaction level. If transaction span one or several > partitions, we may hold this data in a single long or Map. If transaction > spans a lot of partitions, we may store this data in array. > > What do you think? > > On Wed, Apr 25, 2018 at 12:18 PM, Yakov Zhdanov wrote: >> >> Guys, >> >> How do we update counter right now? >> >> If we move to fair thread-per-partition we can update counter only if we >> add new key and skip if we add or remove a version. Thoughts? >> >> --Yakov >> >> 2018-04-25 12:07 GMT+03:00 Vladimir Ozerov : >> >> > This is interesting question. Full-scan size may be tremendously slow >> > operation on large data sets. On the other hand, printing total number >> > of >> > tuples including old and aborted versions make little to no sense as >> > well. >> > Looks like we need to choose lesser of two evils. What if we do the >> > following: >> > 1) Left default behavior as is - O(1) complexity, but includes invalid >> > versions >> > 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will >> > perform full scan. >> > >> > Alternatively we may throw an "UnsupportedOperationException" from this >> > method - why not? >> > >> > Thoughts? >> > >> > On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov >> > > > > >> > wrote: >> > >> > > Hi Igniters, >> > > >> > > I need your advice on a task at hand. >> > > >> > > Currently cache API size() is a constant time operation, since the >> > > number of entries is maintained as a separate counter. >> > > However, for MVCC-enabled cache there can be multiple versions of the >> > > same entry. >> > > In order to calculate the size we need to obtain a MVCC snapshot and >> > > then iterate over data pages filtering invisible versions. >> > > So, it is impossible to keep the same complexity guarantees. >> > > >> > > My current implementation internally switches to "full-scan" approach >> > > if cache in question is a MVCC-enabled cache. >> > > It happens unbeknown to users, which may expect lightning-fast >> > > response as before. >> > > Perhaps, we might add a new constant to CachePeekMode enumeration that >> > > is passed to cache size() to make it explicit? >> > > >> > > The second concern is that cache size calculation is also included >> > > into Cache Metrics API and Visor functionality. >> > > Will it be OK for metrics and things alike to keep returning raw >> > > unfiltered number of entries? >> > > Is there any sense in showing raw unfiltered number of entries which >> > > may vary greatly from invokation to invokation with just simple >> > > updates running in background? >> > > >> > > Please share your thoughts. >> > > >> > > Thanks in advance. >> > > -- >> > > Sergey >> > > >> > > >
Re: cache size() calculation for MVCC
Yakov, Thread-per-partition is hardly applicable for general SQL use case as user operates on arbitrary data sets. But in general we may track size deltas for partitions on transaction level. If transaction span one or several partitions, we may hold this data in a single long or Map. If transaction spans a lot of partitions, we may store this data in array. What do you think? On Wed, Apr 25, 2018 at 12:18 PM, Yakov Zhdanovwrote: > Guys, > > How do we update counter right now? > > If we move to fair thread-per-partition we can update counter only if we > add new key and skip if we add or remove a version. Thoughts? > > --Yakov > > 2018-04-25 12:07 GMT+03:00 Vladimir Ozerov : > > > This is interesting question. Full-scan size may be tremendously slow > > operation on large data sets. On the other hand, printing total number of > > tuples including old and aborted versions make little to no sense as > well. > > Looks like we need to choose lesser of two evils. What if we do the > > following: > > 1) Left default behavior as is - O(1) complexity, but includes invalid > > versions > > 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will > > perform full scan. > > > > Alternatively we may throw an "UnsupportedOperationException" from this > > method - why not? > > > > Thoughts? > > > > On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov < > zkilling...@gmail.com > > > > > wrote: > > > > > Hi Igniters, > > > > > > I need your advice on a task at hand. > > > > > > Currently cache API size() is a constant time operation, since the > > > number of entries is maintained as a separate counter. > > > However, for MVCC-enabled cache there can be multiple versions of the > > > same entry. > > > In order to calculate the size we need to obtain a MVCC snapshot and > > > then iterate over data pages filtering invisible versions. > > > So, it is impossible to keep the same complexity guarantees. > > > > > > My current implementation internally switches to "full-scan" approach > > > if cache in question is a MVCC-enabled cache. > > > It happens unbeknown to users, which may expect lightning-fast > > > response as before. > > > Perhaps, we might add a new constant to CachePeekMode enumeration that > > > is passed to cache size() to make it explicit? > > > > > > The second concern is that cache size calculation is also included > > > into Cache Metrics API and Visor functionality. > > > Will it be OK for metrics and things alike to keep returning raw > > > unfiltered number of entries? > > > Is there any sense in showing raw unfiltered number of entries which > > > may vary greatly from invokation to invokation with just simple > > > updates running in background? > > > > > > Please share your thoughts. > > > > > > Thanks in advance. > > > -- > > > Sergey > > > > > >
Re: cache size() calculation for MVCC
Guys, How do we update counter right now? If we move to fair thread-per-partition we can update counter only if we add new key and skip if we add or remove a version. Thoughts? --Yakov 2018-04-25 12:07 GMT+03:00 Vladimir Ozerov: > This is interesting question. Full-scan size may be tremendously slow > operation on large data sets. On the other hand, printing total number of > tuples including old and aborted versions make little to no sense as well. > Looks like we need to choose lesser of two evils. What if we do the > following: > 1) Left default behavior as is - O(1) complexity, but includes invalid > versions > 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will > perform full scan. > > Alternatively we may throw an "UnsupportedOperationException" from this > method - why not? > > Thoughts? > > On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikov > > wrote: > > > Hi Igniters, > > > > I need your advice on a task at hand. > > > > Currently cache API size() is a constant time operation, since the > > number of entries is maintained as a separate counter. > > However, for MVCC-enabled cache there can be multiple versions of the > > same entry. > > In order to calculate the size we need to obtain a MVCC snapshot and > > then iterate over data pages filtering invisible versions. > > So, it is impossible to keep the same complexity guarantees. > > > > My current implementation internally switches to "full-scan" approach > > if cache in question is a MVCC-enabled cache. > > It happens unbeknown to users, which may expect lightning-fast > > response as before. > > Perhaps, we might add a new constant to CachePeekMode enumeration that > > is passed to cache size() to make it explicit? > > > > The second concern is that cache size calculation is also included > > into Cache Metrics API and Visor functionality. > > Will it be OK for metrics and things alike to keep returning raw > > unfiltered number of entries? > > Is there any sense in showing raw unfiltered number of entries which > > may vary greatly from invokation to invokation with just simple > > updates running in background? > > > > Please share your thoughts. > > > > Thanks in advance. > > -- > > Sergey > > >
Re: cache size() calculation for MVCC
This is interesting question. Full-scan size may be tremendously slow operation on large data sets. On the other hand, printing total number of tuples including old and aborted versions make little to no sense as well. Looks like we need to choose lesser of two evils. What if we do the following: 1) Left default behavior as is - O(1) complexity, but includes invalid versions 2) As Sergey suggested, add new peek mode "MVCC_ALIVE_ONLY" which will perform full scan. Alternatively we may throw an "UnsupportedOperationException" from this method - why not? Thoughts? On Tue, Apr 24, 2018 at 4:28 PM, Sergey Kalashnikovwrote: > Hi Igniters, > > I need your advice on a task at hand. > > Currently cache API size() is a constant time operation, since the > number of entries is maintained as a separate counter. > However, for MVCC-enabled cache there can be multiple versions of the > same entry. > In order to calculate the size we need to obtain a MVCC snapshot and > then iterate over data pages filtering invisible versions. > So, it is impossible to keep the same complexity guarantees. > > My current implementation internally switches to "full-scan" approach > if cache in question is a MVCC-enabled cache. > It happens unbeknown to users, which may expect lightning-fast > response as before. > Perhaps, we might add a new constant to CachePeekMode enumeration that > is passed to cache size() to make it explicit? > > The second concern is that cache size calculation is also included > into Cache Metrics API and Visor functionality. > Will it be OK for metrics and things alike to keep returning raw > unfiltered number of entries? > Is there any sense in showing raw unfiltered number of entries which > may vary greatly from invokation to invokation with just simple > updates running in background? > > Please share your thoughts. > > Thanks in advance. > -- > Sergey >
cache size() calculation for MVCC
Hi Igniters, I need your advice on a task at hand. Currently cache API size() is a constant time operation, since the number of entries is maintained as a separate counter. However, for MVCC-enabled cache there can be multiple versions of the same entry. In order to calculate the size we need to obtain a MVCC snapshot and then iterate over data pages filtering invisible versions. So, it is impossible to keep the same complexity guarantees. My current implementation internally switches to "full-scan" approach if cache in question is a MVCC-enabled cache. It happens unbeknown to users, which may expect lightning-fast response as before. Perhaps, we might add a new constant to CachePeekMode enumeration that is passed to cache size() to make it explicit? The second concern is that cache size calculation is also included into Cache Metrics API and Visor functionality. Will it be OK for metrics and things alike to keep returning raw unfiltered number of entries? Is there any sense in showing raw unfiltered number of entries which may vary greatly from invokation to invokation with just simple updates running in background? Please share your thoughts. Thanks in advance. -- Sergey