RE: [postgis-users] Versioning

Paragon Corporation Wed, 13 May 2009 23:42:01 -0700

Slight correction -- when I meant pointless I wasn't suggesting that having
a primary key is pointless, just that the idea of a primary key always being
an integer is not ideal in many use cases and you would be better with a
character key such as is the case when dealing with parcel data.

Sorry if I gave people the wrong idea about that. With the inheritance model
you could enforce a unique key say for a parcel in your main table and allow
dups in the history.  The other benefit that should be stated about the
importance of primary keys and candidate keys is they do affect plan
strategy and performance.  If a key is unique the planner knows it does not
have to search any further for more results.  I haven't done any benchmarks
on this recently but I do recall it making at least in older versions a
speed improvement by noting that a key is unique or primary.

Thanks,
Regina

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Paragon
Corporation
Sent: Thursday, May 14, 2009 2:24 AM
To: 'PostGIS Users Discussion'
Subject: RE: [postgis-users] Versioning

Ben,

If you thinking of unioning -- using the built in inheritance model in
PostgreSQL in my opinion is just as good and better.  Let me explain why its
better

1) If you ever looked at the explain plan of a parent table in an
inheritance hierarchy such as would be the case in mystuff_all, you will see
that it is doing an implicit UNION ALL.  This is where the performance
penalty Paul is alluding to comes into play.

2) Unlike a union all which has to use a VIEW -- the parent table looks like
a real table to most idiot GIS/DB tools and can even be edited and in fact
when you define your serial dummy primary key on the parent table -- the
children will inherit the serial (though you will need to specify they are
primary on the child tables).  So in essence the parent table looks more
real than the VIEW you would need to create to emulate it.
To most dum ass GIS tools -- it satisfies their need for a pointless key.

3) If you have an app that will always query from your full versioned table,
its actually more efficient to use inheritance with constraint exclusion
than doing union on the fly. 

With constraint exclusion you could create a dummy field that says this is a
history_status -- and require your main table has this set as 'C' and your
history table set as 'H' or use one of the date fields.  This allows the
planner to skip over the history table if you only want current data and
thus escape the penalty of scanning thru a larger number of records.

If you stuff both your history and your current in the same table -- you
don't have this luxury and the bigger your history -- the slower your
current record searches will become.

Hope that helps,
Regina

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Paul
Ramsey
Sent: Wednesday, May 13, 2009 11:32 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Versioning

On Wed, May 13, 2009 at 7:11 PM, Ben Madin <[email protected]>
wrote:
> Thanks Paul,
>
> This certainly looks like a more efficient system for 'versioning'... 
> are you suggesting the history table also has a copy of the 'current'
record?

Yes, that's what I'm suggesting. You do all your work against yourtable, but
the triggers maintain a fully versioned copy in yourtable_history

> Alternatively, is there likely to be much of a performance hit if I 
> create a view unioning the 'current' table and the 'history' table to 
> be able to achieve this for reporting purposes.

There's no disadvantage to having your _history table just as current as
your working table. (Well, I guess size.) I think Leo's approach avoid that,
but probably has the performance penalty associated with putting things back
together on the fly.

P

> cheers
>
> Ben
>
>
>
>
> On 14/05/2009, at 5:43 AM, Paul Ramsey wrote:
>
>> Then you should be able to get by with a pretty simple system.
>> For each table you want to retain history for, add a second table, 
>> table_history that adds two columns (date_added, date_deleted).
>> The put some insert/update/delete triggers on your main tables.
>> - on insert of row, simultaneously insert a copy into the _history 
>> table with an appropriate date_added
>> - on delete of row update the date_deleted field in the equivalent 
>> _history table row
>> - on update of row, flag the current copy of the row as deleted, and 
>> add a new copy of the row with the current date as the added date
>>
>> Now a snapshot of "historical state" as of date_query is as simple as 
>> querying the history table for "date_added < date_query and 
>> (date_deleted > date_query or date_deleted is null)"
>>
>> The thrifty among us will say "hey, I can just have one table with 
>> date_added and date_deleted columns" and "current state is just all 
>> records with "date_deleted is null". However, your table will grow 
>> over time, and eventually things will get slower than you would like.
>> Having a "working table" that is just current also removes the need 
>> for client applications to understand the "history concept", they can 
>> just work on the table as normal and history is maintained for them 
>> transparently.
>>
>> P.
>>
>>
>> 2009/5/13 Juan Pedro Pérez Alcántara <[email protected]>:
>>>
>>> Sure, you right. I'm refering versioning in it's first meaning, that 
>>> is, history tracking.
>>>
>>> Thanks,
>>>
>>> Juan Pedro Pérez Alcántara
>>>
>>> On Wed, 2009-05-13 at 12:33 -0700, Paul Ramsey wrote:
>>>>
>>>> Juan Pedro,
>>>>
>>>> What do you mean by "versioning"?
>>>>
>>>> History tracking? (Every state of the data is one version, but 
>>>> there is only one branch of the data.) Branching and merging?
>>>> (People can take a copy of the data (a version) and work on it, 
>>>> then merge it back into the main database.)
>>>>
>>>> Versioning is an overloaded word. Describe your actual use case.
>>>>
>>>> P.
>>>>
>>>> 2009/5/13 Juan Pedro Pérez Alcántara <[email protected]>:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> soon we will face a project which will need a really strong
>>>
>>> versioning
>>>>>
>>>>> mechanism, including geometry. I'm sure anybody here have already 
>>>>> addressed this sort of problem. I wonder if you can drop a line or
>>>
>>> two
>>>>>
>>>>> about this issue, for I consider it hard to implement. Sure the 
>>>>> solutions will vary wildly depending on the scenario, the data, 
>>>>> the versioning schedule, etc., but I'll be glad to hear about your 
>>>>> experiences in that field.
>>>>>
>>>>> Greetings,
>>>>>
>>>>> Juan Pedro Pérez Alcántara
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> [email protected]
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> [email protected]
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>> _______________________________________________
>> postgis-users mailing list
>> [email protected]
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> --
>
> Ben Madin
> REMOTE INFORMATION
>
> t : +61 8 9192 5455
> f : +61 8 9192 5535
> m : 0448 887 220
> Broome   WA   6725
>
> [email protected]
>
>
>
>                                                        Out here, it 
> pays to know...
>
>
> _______________________________________________
> postgis-users mailing list
> [email protected]
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

RE: [postgis-users] Versioning

Reply via email to