Hi list, (Linus Torvalds-style harsh truths incoming, read only after coffee/alcohol!)
Having spent an incredibly frustrating day fighting with the limitations of GPKG and the horrible workflow that they mandate, I'd love to start brainstorming on how we can fix this. While previous discussions have related to the GPKG SQLite WAL mess, that has (to the extent of my experience) been resolved in the latest release. So I'd like to focus on what I see as the biggest pain point of GPKG: the FID column.

This is a pain point for numerous reasons:

- The type constraint on the fid column makes it really hard to translate datasets with an existing, non-numeric "fid" column into geopackage. E.g. GML files often have a textual fid string, and attempting to convert these to gpkg results in a string of errors about string values not being usable as fid values, and an empty result layer. The only workaround here is to translate first to an alternative format (such as shp!), delete the fid column, and THEN save as gpkg.

- The fid unique constraint, while understandable, results in a HUGE raft of issues when working with these layers. It's SO easy to get into a situation where you have duplicate fids in an edit buffer, and no way to save these features back to the gpkg. You get a series of 1000s of errors about duplicate fids, and then an ambiguous state where you're completely unsure exactly what's been saved and what's about to be lost. This isn't just attributable to a single tool in QGIS: it's possible to end up with duplicate fids through so many different operations, including really simple stuff like copying and pasting features!

I've fought with this ever since we really started to push GPKG and, frankly, I've given up. I don't think there's any way to fix the current situation and leave fids as they currently behave.

So what I propose is a radical re-think of how GPKG fids are handled and exposed by QGIS (and by GDAL). I propose that we:

1. Demote fids to being only a "semi-permanent" row identifier, with the message being that "sometimes these WILL change and you can't rely on them as a permanent id field for joins and row identification". If users require a permanent unique identifier (i.e. a primary key) on their table then THEY have to make and manage that themselves, just like shapefiles etc.

2. Expose fids as a read-only field. Users can still see them if they want, but they cannot edit them.

3. Make QGIS (or GDAL?) ALWAYS generate a completely new fid whenever a row is changed or added. Throw away the old value, make a new one on EVERY edit/addition.

4. COMPLETELY ignore any existing fid value set for features added to a GPKG layer. I.e. in the case of translating a GML with a text fid field, we completely ignore the incoming GML fid values and instead use the "always generate a new fid" rule.

Yes, these changes will break existing workflows, and possibly break existing tools/scripts. But honestly, in my experience and the experience of my customers, there's a COMPLETE lack of faith and trust in GPKG at this stage. EVERYONE has their horror stories of lost data and mangled datasets. We've got to do something drastic, and we've got to do it sooner rather than later to salvage what little hope does remain for this format.

Thoughts?

Nyall

_______________________________________________
QGIS-Developer mailing list
[email protected]
List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
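
For anyone who wants to poke at the two fid failures described above without QGIS or GDAL in the loop, here is a minimal sketch using only Python's built-in `sqlite3` module. It is not QGIS or GDAL code and not a proposed implementation: the table name `features` and the three helper functions are made up for illustration, and the table is only a stand-in that assumes an integer, auto-incrementing primary key fid column like a GPKG feature table has. It shows the "keep the incoming fid" behaviour failing on duplicate and textual fids, and what rules 3 and 4 (ignore the incoming fid, hand out a new one on every add/edit) could look like at the SQLite level.

```python
import sqlite3

# Stand-in for a GPKG feature table (assumption: a real GeoPackage has extra
# metadata tables, but the fid column is an integer primary key like this).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    "fid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)


def insert_keeping_fid(fid, name):
    """Try to store the incoming fid as-is; return False if SQLite refuses."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO features (fid, name) VALUES (?, ?)", (fid, name)
            )
        return True
    except sqlite3.Error:
        return False


def insert_ignoring_fid(_incoming_fid, name):
    """Sketch of rule 4: discard the incoming fid, let SQLite assign one."""
    with conn:
        cur = conn.execute("INSERT INTO features (name) VALUES (?)", (name,))
    return cur.lastrowid


def update_with_new_fid(old_fid, name):
    """Sketch of rule 3: an edit drops the old row and gets a fresh fid."""
    with conn:
        conn.execute("DELETE FROM features WHERE fid = ?", (old_fid,))
        cur = conn.execute("INSERT INTO features (name) VALUES (?)", (name,))
    return cur.lastrowid


first_ok = insert_keeping_fid(1, "original")            # accepted
duplicate_ok = insert_keeping_fid(1, "pasted copy")     # rejected: fid 1 exists
text_ok = insert_keeping_fid("gml.feature.1", "gml")    # rejected: not an integer

new_fid = insert_ignoring_fid("gml.feature.1", "gml")   # accepted with a new fid
edited_fid = update_with_new_fid(1, "original, edited") # fid 1 is thrown away

print(first_ok, duplicate_ok, text_ok, new_fid, edited_fid)
```

The trade-off is exactly the one in rule 1: after `update_with_new_fid` nothing refers to the old fid any more, so anything joining on fid breaks, and a permanent key would have to live in a separate user-managed column.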
