Re: [Qgis-developer] QgsVectorLayerCache

Sandro Mani Mon, 15 Jun 2015 07:57:13 -0700


(Forwarding message to list...)


On 15.06.2015 16:52, Sandro Mani wrote:

Hi Matthias

On 15.06.2015 15:34, Matthias Kuhn wrote:

Hi Sandro

On 06/15/2015 02:42 PM, Sandro Mani wrote:

Hello Matthias and List

I have two questions about the QgsVectorLayerCache, which you
(@Matthias) have implemented [1].

1. First the easy one:

https://github.com/qgis/QGIS/blame/master/src/core/qgsvectorlayercache.cpp#L97


-> I'm not sure what this block of code is supposed to do. As far as I
can see it just performs an empty iteration over all layer features,
but has no effects otherwise. Am I missing something? This block of
code is executed when loading the attribute table. If I comment it, I
can't spot any side effects, except that the loading of the attribute
table is faster.

It's a QgsCachedFeatureWriterIterator which fills the cache when
iterating over it.
It's only being invoked when full caching is requested to avoid
incremental population of the cache with a lot of subsequent requests.
If you have slow round-trips and disable this code the effect should be
noticeable. If it's not, there's something wrong with it.
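
The "empty iteration" makes sense once the iterator itself has a side effect. A minimal sketch of that pattern, in Python for brevity: `cache_writing_iterator` and `fill_cache` are hypothetical names, not the QGIS API, and a feature is reduced to an (id, attributes) pair.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical stand-in for a real feature: (feature id, attributes).
Feature = Tuple[int, dict]


def cache_writing_iterator(source: Iterable[Feature],
                           cache: Dict[int, dict]) -> Iterator[Feature]:
    """Yield features from source, storing each one in cache as a side effect."""
    for fid, attrs in source:
        cache[fid] = attrs
        yield fid, attrs


def fill_cache(source: Iterable[Feature], cache: Dict[int, dict]) -> None:
    """Drain the iterator; the loop body is empty because the caching
    happens inside the iterator itself."""
    for _ in cache_writing_iterator(source, cache):
        pass
```

With a slow source, one pass like this replaces many small requests later on, which is the effect described above.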

Uhm ok, I'll need to investigate this together with my AFS provider
implementation, somehow the result of that code block was that all
features were fetched from the server twice.

2. Secondly, to the vector layer cache in general.
Some background: I've done an initial implementation of an ArcGIS
Feature Service ("AFS") data provider, and similarly to the WFS
provider, the question of intelligent caching arises, to reduce
round-trips with the server. The WFS provider just caches all features
in memory (if the corresponding option is checked), which is
suboptimal for large datasets.
I've hence been thinking about implementing a local disk-based cache
(say in the form of a SpatiaLite DB), which acts as a local feature
cache. The usefulness of this could however go beyond just WFS and
AFS, to include all non-local data sources. So my idea is to implement
something like a QgsVectorDataProviderProxy which

- overrides getFeatures to return a QgsCacheProxyFeatureIterator: this
iterator first checks whether the Feature is cached, and if not, only
then fetches it from the data provider. If the QgsFeatureRequest
includes an extent, entire pages of features could be loaded from the
disk to memory (up to a specified threshold).

- overrides all add/change/delete methods to ensure that the cache
remains consistent.

Actually I think the most elegant approach would be to have
QgsVectorLayer::mDataProvider be an instance of this
QgsVectorDataProviderProxy. If the data source is local, the calls are
simply forwarded to the actual data provider, otherwise, the above
outlined behavior applies.
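
The proxy idea outlined above could be sketched roughly as follows. This is only an illustration in Python, not QGIS code: `InMemoryProvider` and `CachingProviderProxy` are hypothetical names, features are plain dicts keyed by id, and the cache is an in-memory dict rather than a disk-based store.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


class InMemoryProvider:
    """Hypothetical stand-in for a remote provider; counts round-trips."""

    def __init__(self, features: Dict[int, dict]) -> None:
        self._features = dict(features)
        self.fetch_count = 0

    def get_feature(self, fid: int) -> Optional[dict]:
        self.fetch_count += 1
        return self._features.get(fid)

    def change_feature(self, fid: int, attrs: dict) -> None:
        self._features[fid] = attrs

    def delete_feature(self, fid: int) -> None:
        self._features.pop(fid, None)


class CachingProviderProxy:
    """Forwards calls to the wrapped provider, consulting a cache first.

    With use_cache=False (e.g. a local data source) every call is
    simply forwarded.
    """

    def __init__(self, provider: InMemoryProvider, use_cache: bool = True) -> None:
        self._provider = provider
        self._use_cache = use_cache
        self._cache: Dict[int, dict] = {}

    def get_features(self, fids: Iterable[int]) -> Iterator[Tuple[int, dict]]:
        for fid in fids:
            if self._use_cache and fid in self._cache:
                yield fid, self._cache[fid]  # cache hit, no round-trip
                continue
            attrs = self._provider.get_feature(fid)
            if attrs is None:
                continue  # feature does not exist in the source
            if self._use_cache:
                self._cache[fid] = attrs
            yield fid, attrs

    def change_feature(self, fid: int, attrs: dict) -> None:
        self._provider.change_feature(fid, attrs)
        if self._use_cache:
            self._cache[fid] = attrs  # keep the cache consistent

    def delete_feature(self, fid: int) -> None:
        self._provider.delete_feature(fid)
        self._cache.pop(fid, None)
```

The sketch ignores extents, paging and threading, which are discussed further down in the thread.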

So (@Matthias): such an implementation would pretty much overlap with
what you have implemented, but does the work directly at provider
level. What are your thoughts on this? From your experience
implementing [1], do any alarm bells start ringing?

I have thought about this approach as well as it seems to be very nice
to have one shared cache which is able to provide several consumers with
cached data (canvas, attribute table...). Do you think you will be
introducing a size limit?

Speaking of the disk-cache: Yes, I suppose that would make sense,
perhaps as a configurable option in the user preferences. For memory
cache, there clearly would be a size limit.


One risk I see is that different consumers sharing one cache have
different requirements.
For the canvas the requirement is usually to have some spatial index
that keeps track of which regions are cached and if a new request can be
satisfied. It would even be easy/nice to do some look-ahead to pre-load
features or only load part of the canvas if a big region is already
loaded or do tiling.

Right, I'd like to model the cache around an idea of "pages", i.e.
entire spatial regions which can be swapped in and out of memory
depending on the current region of interest.
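
One possible reading of "pages" is a regular grid of square tiles, so that a requested extent maps to a set of page keys. The sketch below assumes exactly that; the grid layout, `pages_for_extent`, `missing_pages` and the `page_size` parameter are illustrative choices, not something fixed in this thread.

```python
import math
from typing import Iterable, List, Set, Tuple

PageKey = Tuple[int, int]  # (column, row) of a square grid cell


def pages_for_extent(xmin: float, ymin: float, xmax: float, ymax: float,
                     page_size: float) -> List[PageKey]:
    """Return the keys of all grid pages that intersect the given extent."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    col_min = math.floor(xmin / page_size)
    col_max = math.floor(xmax / page_size)
    row_min = math.floor(ymin / page_size)
    row_max = math.floor(ymax / page_size)
    return [(col, row)
            for row in range(row_min, row_max + 1)
            for col in range(col_min, col_max + 1)]


def missing_pages(needed: Iterable[PageKey], loaded: Set[PageKey]) -> List[PageKey]:
    """Pages that still have to be fetched to satisfy a request."""
    return [key for key in needed if key not in loaded]
```

A canvas request would then only hit the server (or the disk cache) for the pages returned by `missing_pages`.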


If another consumer then does a second request without a spatial filter
(none or attribute filter instead) it may fetch a lot of features and
pollute your cache with these features. If there's a size limit on the
cache, previously cached features may then be evicted even though they
would still be more important for drawing than the ones fetched for a
different consumer, which may have been requested just once.

Yes I see the problem. First, one would need to investigate how
expensive such cache thrashing is compared to the situation with no
cache at all. Then I suppose the usual ideas are things like having an
access time stamp on each page loaded in memory, and if a page needs to
go, the one last accessed furthest back will get thrown out.
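
The "throw out the page accessed furthest back" rule is least-recently-used (LRU) eviction. A minimal sketch, using the insertion order of an `OrderedDict` in place of explicit time stamps; `LruPageCache` and `max_pages` are hypothetical names.

```python
from collections import OrderedDict
from typing import Hashable, List, Optional


class LruPageCache:
    """Keeps at most max_pages pages; evicts the least recently used one."""

    def __init__(self, max_pages: int) -> None:
        if max_pages < 1:
            raise ValueError("max_pages must be at least 1")
        self._max_pages = max_pages
        self._pages: "OrderedDict[Hashable, list]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[list]:
        if key not in self._pages:
            return None
        self._pages.move_to_end(key)  # mark as most recently used
        return self._pages[key]

    def put(self, key: Hashable, features: list) -> None:
        self._pages[key] = features
        self._pages.move_to_end(key)
        while len(self._pages) > self._max_pages:
            self._pages.popitem(last=False)  # drop the oldest page

    def keys(self) -> List[Hashable]:
        """Page keys ordered from least to most recently used."""
        return list(self._pages)
```

Plain LRU does not by itself address the pollution case raised above: a single large unfiltered request still pushes out every older page.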


You will also have to take care of multithreading since multiple
iterators can run at the same time.

Definitely.


It's probably also necessary to give some thought to how to invalidate
the cache if the source data changes. (A time-based limit, a button to
trigger a reload...).

Perhaps it would generally be a good idea to have a clear user-facing
entry in the layer context menu to re-sync the entire provider data
with the data source.
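
Both invalidation ideas mentioned here (a time-based limit and a manual reload) fit in a few lines. The following is only a sketch: `ExpiringCache`, `max_age_seconds` and `invalidate_all` are hypothetical names, and the clock is injectable purely to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringCache:
    """Feature cache whose entries expire after max_age_seconds."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[int, Tuple[float, dict]] = {}

    def put(self, fid: int, attrs: dict) -> None:
        self._entries[fid] = (self._clock(), attrs)

    def get(self, fid: int) -> Optional[dict]:
        entry = self._entries.get(fid)
        if entry is None:
            return None
        stored_at, attrs = entry
        if self._clock() - stored_at > self._max_age:
            del self._entries[fid]  # too old, force a re-fetch
            return None
        return attrs

    def invalidate_all(self) -> None:
        """What a 're-sync with data source' menu entry could call."""
        self._entries.clear()
```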


If this is implemented, it would surely be nice to have it not only for
AFS but also for other providers. Either way I would leave the choice to
the user if he wants to use it or not.

Sure, this would be a user-configurable option, which ideally would
just decide whether QgsVectorLayer::mDataProvider receives an actual
provider instance or a cache proxy instance.

If there's a request with a subsetOfAttributes set or without geometry,
it's important to know if the request is going to be changed before
sending it to the provider (so the cache contains all the information
but the request may take longer) or if the request is going to be sent,
requesting a reduced amount of information but not going to be cached.
Or if it's going to be cached with reduced information, but then it has
to be ensured later on that a subsequent request does not receive less
information than it requires.

I'd go with fetching the reduced feature from the data provider, and
not caching it, for a start at least. There are clearly more nifty
approaches to be explored later on ;)
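
The "fetch reduced, don't cache" policy could look roughly like this. It is a sketch under simplifying assumptions: `fetch_feature` and `fetch_from_provider` are hypothetical names, features are plain dicts, and geometry is ignored. Only complete features enter the cache, so a later request can never receive less information than it asked for.

```python
from typing import Callable, Dict, Optional, Sequence


def fetch_feature(fid: int,
                  cache: Dict[int, dict],
                  fetch_from_provider: Callable[[int, Optional[Sequence[str]]], dict],
                  subset_of_attributes: Optional[Sequence[str]] = None) -> dict:
    """Serve from the cache if possible; never cache reduced features."""
    cached = cache.get(fid)
    if cached is not None:
        # A complete feature is cached, so any subset can be served from it.
        if subset_of_attributes is None:
            return cached
        return {name: cached[name] for name in subset_of_attributes if name in cached}

    attrs = fetch_from_provider(fid, subset_of_attributes)
    if subset_of_attributes is None:
        cache[fid] = attrs  # only complete features are cached
    return attrs
```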


I hope there are some good inputs in here

Yes definitely, thanks!

Sandro


_______________________________________________
Qgis-developer mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/qgis-developer
