Slight correction -- when I meant pointless I wasn't suggesting that having a primary key is pointless, just that the idea of a primary key always being an integer is not ideal in many use cases and you would be better with a character key such as is the case when dealing with parcel data.
Sorry if I gave people the wrong idea about that. With the inheritance model you could enforce a unique key say for a parcel in your main table and allow dups in the history. The other benefit that should be stated about the importance of primary keys and candidate keys is they do affect plan strategy and performance. If a key is unique the planner knows it does not have to search any further for more results. I haven't done any benchmarks on this recently but I do recall it making at least in older versions a speed improvement by noting that a key is unique or primary. Thanks, Regina -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Paragon Corporation Sent: Thursday, May 14, 2009 2:24 AM To: 'PostGIS Users Discussion' Subject: RE: [postgis-users] Versioning Ben, If you thinking of unioning -- using the built in inheritance model in PostgreSQL in my opinion is just as good and better. Let me explain why its better 1) If you ever looked at the explain plan of a parent table in an inheritance hierarchy such as would be the case in mystuff_all, you will see that it is doing an implicit UNION ALL. This is where the performance penalty Paul is alluding to comes into play. 2) Unlike a union all which has to use a VIEW -- the parent table looks like a real table to most idiot GIS/DB tools and can even be edited and in fact when you define your serial dummy primary key on the parent table -- the children will inherit the serial (though you will need to specify they are primary on the child tables). So in essence the parent table looks more real than the VIEW you would need to create to emulate it. To most dum ass GIS tools -- it satisfies their need for a pointless key. 3) If you have an app that will always query from your full versioned table, its actually more efficient to use inheritance with constraint exclusion than doing union on the fly. With constraint exclusion you could create a dummy field that says this is a history_status -- and require your main table has this set as 'C' and your history table set as 'H' or use one of the date fields. This allows the planner to skip over the history table if you only want current data and thus escape the penalty of scanning thru a larger number of records. If you stuff both your history and your current in the same table -- you don't have this luxury and the bigger your history -- the slower your current record searches will become. Hope that helps, Regina -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Paul Ramsey Sent: Wednesday, May 13, 2009 11:32 PM To: PostGIS Users Discussion Subject: Re: [postgis-users] Versioning On Wed, May 13, 2009 at 7:11 PM, Ben Madin <[email protected]> wrote: > Thanks Paul, > > This certainly looks like a more efficient system for 'versioning'... > are you suggesting the history table also has a copy of the 'current' record? Yes, that's what I'm suggesting. You do all your work against yourtable, but the triggers maintain a fully versioned copy in yourtable_history > Alternatively, is there likely to be much of a performance hit if I > create a view unioning the 'current' table and the 'history' table to > be able to achieve this for reporting purposes. There's no disadvantage to having your _history table just as current as your working table. (Well, I guess size.) I think Leo's approach avoid that, but probably has the performance penalty associated with putting things back together on the fly. P > cheers > > Ben > > > > > On 14/05/2009, at 5:43 AM, Paul Ramsey wrote: > >> Then you should be able to get by with a pretty simple system. >> For each table you want to retain history for, add a second table, >> table_history that adds two columns (date_added, date_deleted). >> The put some insert/update/delete triggers on your main tables. >> - on insert of row, simultaneously insert a copy into the _history >> table with an appropriate date_added >> - on delete of row update the date_deleted field in the equivalent >> _history table row >> - on update of row, flag the current copy of the row as deleted, and >> add a new copy of the row with the current date as the added date >> >> Now a snapshot of "historical state" as of date_query is as simple as >> querying the history table for "date_added < date_query and >> (date_deleted > date_query or date_deleted is null)" >> >> The thrifty among us will say "hey, I can just have one table with >> date_added and date_deleted columns" and "current state is just all >> records with "date_deleted is null". However, your table will grow >> over time, and eventually things will get slower than you would like. >> Having a "working table" that is just current also removes the need >> for client applications to understand the "history concept", they can >> just work on the table as normal and history is maintained for them >> transparently. >> >> P. >> >> >> 2009/5/13 Juan Pedro Pérez Alcántara <[email protected]>: >>> >>> Sure, you right. I'm refering versioning in it's first meaning, that >>> is, history tracking. >>> >>> Thanks, >>> >>> Juan Pedro Pérez Alcántara >>> >>> On Wed, 2009-05-13 at 12:33 -0700, Paul Ramsey wrote: >>>> >>>> Juan Pedro, >>>> >>>> What do you mean by "versioning"? >>>> >>>> History tracking? (Every state of the data is one version, but >>>> there is only one branch of the data.) Branching and merging? >>>> (People can take a copy of the data (a version) and work on it, >>>> then merge it back into the main database.) >>>> >>>> Versioning is an overloaded word. Describe your actual use case. >>>> >>>> P. >>>> >>>> 2009/5/13 Juan Pedro Pérez Alcántara <[email protected]>: >>>>> >>>>> Hello all, >>>>> >>>>> soon we will face a project which will need a really strong >>> >>> versioning >>>>> >>>>> mechanism, including geometry. I'm sure anybody here have already >>>>> addressed this sort of problem. I wonder if you can drop a line or >>> >>> two >>>>> >>>>> about this issue, for I consider it hard to implement. Sure the >>>>> solutions will vary wildly depending on the scenario, the data, >>>>> the versioning schedule, etc., but I'll be glad to hear about your >>>>> experiences in that field. >>>>> >>>>> Greetings, >>>>> >>>>> Juan Pedro Pérez Alcántara >>>>> >>>>> >>>>> _______________________________________________ >>>>> postgis-users mailing list >>>>> [email protected] >>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users >>>>> >>>> >>> >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> http://postgis.refractions.net/mailman/listinfo/postgis-users >>> >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> http://postgis.refractions.net/mailman/listinfo/postgis-users > > -- > > Ben Madin > REMOTE INFORMATION > > t : +61 8 9192 5455 > f : +61 8 9192 5535 > m : 0448 887 220 > Broome WA 6725 > > [email protected] > > > > Out here, it > pays to know... > > > _______________________________________________ > postgis-users mailing list > [email protected] > http://postgis.refractions.net/mailman/listinfo/postgis-users > _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
