Hi, I gave a QGIS demonstration last week at the local medical school to show how easy it is to get data and process it with QGIS. But I failed with a very simple aggregate expression. QGIS is not able to compute the expression "immediately". It takes one regular coffee to compute it (starting from cold water in the kettle).
The problem is easy to replicate. It uses just one polygon layer (225 countries) and one point layer (about 4000 points with reported COVID cases). I used the Natural Earth countries shapefile and COVID values from a CSV file. The two layers can be retrieved with:

    wget https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip
    wget https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-04-2021.csv

For some countries, like Germany or Italy, the CSV file has one value per state or province, as illustrated in [1]. I want to aggregate all the values for each country.

I created a virtual field in the country layer and used this expression to compute the total active cases for each country:

    aggregate(
        layer:='03-04-2021',
        aggregate:='sum',
        expression:="Active",
        filter:=contains(geometry(@parent), $geometry)
    )

It takes several minutes to evaluate and, if the attribute table is open, it takes the same amount of time again to fill the table. If I try to use the aggregated field in symbology, QGIS evaluates it again, which is a problem.

How can such spatial expressions be written so that they perform better?

My findings:

1) Using contains or intersects makes no difference in performance.
2) Using a shapefile instead of the original CSV does not improve performance.
3) Adding spatial indexes (via the 'Create spatial index' processing tool) to both shapefiles does not improve performance.

Before you ask:

a) There are topological errors in the Natural Earth source. Correcting them, or using only the valid polygons, does not solve the problem. There is no difference in performance.
b) For this specific COVID layer, a simple attribute join could be used. The point here is to understand how this spatial expression can be improved in terms of performance.
c) This can be done in PostgreSQL with:

    select wf.sovereignt, sum(cases."Active")
    from world wf, "03-04-2021" cases
    where st_contains(wf.geom, cases.geom)
    group by wf.sovereignt;

and it is computed "instantaneously". I know that too, but the point is to improve QGIS.

Regards,

Jorge Gustavo

[1] https://nextcloud.geomaster.pt/index.php/s/ZNR87PHBrBJjmmC
[2] https://nextcloud.geomaster.pt/index.php/s/bxyXTfN3J4moKHr

--
Jorge Gustavo Rocha
Departamento de Informática
Universidade do Minho
4710-057 Braga
Gabinete 3.29 (Piso 3)
Tel: +351 253604480
Fax: +351 253604471
Móvel: +351 910333888
skype: nabocudnosor
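
To make the expected result concrete outside QGIS, here is a minimal sketch in plain Python of the same spatial sum that the aggregate expression and the SQL query above describe: for each country polygon, add up the "Active" values of the points it contains. It is only an illustration and not how QGIS or PostGIS implement it. The function names, the toy square "countries" and the sample points are all hypothetical, and polygons are assumed to be single rings without holes. The bounding-box check before the exact test stands in for the kind of cheap pre-filter a spatial index provides.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting containment test; ring is a list of (x, y) vertices, no holes."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    """Axis-aligned bounding box of a ring as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def sum_active_by_country(countries, cases):
    """countries: {name: ring}; cases: iterable of (x, y, active) tuples."""
    boxes = {name: bbox(ring) for name, ring in countries.items()}
    totals = {name: 0 for name in countries}
    for x, y, active in cases:
        for name, ring in countries.items():
            xmin, ymin, xmax, ymax = boxes[name]
            # Cheap rejection first; the exact test runs only for candidates.
            if xmin <= x <= xmax and ymin <= y <= ymax:
                if point_in_polygon(x, y, ring):
                    totals[name] += active
    return totals


# Hypothetical toy data: two square "countries" and four case points.
countries = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
cases = [(1, 1, 5), (9, 9, 7), (25, 5, 3), (50, 50, 100)]

print(sum_active_by_country(countries, cases))  # {'A': 12, 'B': 3}
```

The point at (50, 50) falls in neither square, so its value is not counted, which matches what the st_contains join in the SQL query would return.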
