Re: [HACKERS] temporal support patch

2012-08-28 Thread Jim Nasby

On 8/27/12 12:40 PM, Robert Haas wrote:

On Sat, Aug 25, 2012 at 1:30 PM, David Johnston pol...@yahoo.com wrote:

My internals knowledge is basically zero but it would seem that if you
simply wanted the end-of-transaction result you could just record nothing
during the transaction and then copy whatever values are present at commit
to whatever logging mechanism you need.

Whatever values are present at commit could be a terabyte of data.
Or it could be a kilobyte of changed data within a terabyte database.
You'd need some way to identify which data actually needs to be
copied, since you surely don't want to copy the whole database.  And
even if you can identify it, going back and visiting all those blocks
a second time will be expensive.


This makes me think of the original time travel feature, which worked by
not automatically removing old tuple versions.

I think it would be interesting to look at allowing old tuples to be visible as 
well as doing something different when vacuum comes around.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] temporal support patch

2012-08-27 Thread Robert Haas
On Sat, Aug 25, 2012 at 1:30 PM, David Johnston pol...@yahoo.com wrote:
 My internals knowledge is basically zero but it would seem that if you
 simply wanted the end-of-transaction result you could just record nothing
 during the transaction and then copy whatever values are present at commit
 to whatever logging mechanism you need.

Whatever values are present at commit could be a terabyte of data.
Or it could be a kilobyte of changed data within a terabyte database.
You'd need some way to identify which data actually needs to be
copied, since you surely don't want to copy the whole database.  And
even if you can identify it, going back and visiting all those blocks
a second time will be expensive.

 Since these temporal/audit tables are intended to be maintained by the
 system if you do not ask the users to identify themselves but instead take
 the information directly from the environment, you never have to give an
 "I'm sorry, Dave" response because Dave is never given the chance to
 submit a proposed value.

Well, the point is that I think many people have requirements that are
(1) different from each other and (2) more complicated than the
simplest case we can come up with.  Some people will want to log the
application user (or some other piece of extra data); others won't.
Some people will want to record every change in a transaction; others
won't.  Some people will want to log time stamps; others won't; others
still may want a range per row indicating the time that row version
lived.  Some people will want to delete history before it fills up the
disk; others will want to keep it forever.  Some people will want to
clean up history created by accidental changes; others will want to
make sure that the history is as tamper-proof as possible.  That's
why, of everything that's been said on this topic, I mostly agree with
what Josh Berkus said upthread:

# If you want something in core which will be useful to a lot of our
# users, it needs to be simple and flexible.  Not ornate with lots of
# dependencies. The first version of it should be as simple and minimalist
# as possible.
#
# Personally, I would prefer a tool which just made it simpler to build my
# own triggers, and made it automatic for the history table to track
# changes in the live table.  I think anything we build which controls
# what goes into the history table, etc., will only narrow the user base.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] temporal support patch

2012-08-27 Thread Pavel Stehule

 Well, the point is that I think many people have requirements that are
 (1) different from each other and (2) more complicated than the
 simplest case we can come up with.  Some people will want to log the
 application user (or some other piece of extra data); others won't.
 Some people will want to record every change in a transaction; others
 won't.  Some people will want to log time stamps; others won't; others
 still may want a range per row indicating the time that row version
 lived.  Some people will want to delete history before it fills up the
 disk; others will want to keep it forever.  Some people will want to
 clean up history created by accidental changes; others will want to
 make sure that the history is as tamper-proof as possible.  That's
 why, of everything that's been said on this topic, I mostly agree with
 what Josh Berkus said upthread:

 # If you want something in core which will be useful to a lot of our
 # users, it needs to be simple and flexible.  Not ornate with lots of
 # dependencies. The first version of it should be as simple and minimalist
 # as possible.
 #
 # Personally, I would prefer a tool which just made it simpler to build my
 # own triggers, and made it automatic for the history table to track
 # changes in the live table.  I think anything we build which controls
 # what goes into the history table, etc., will only narrow the user base.


I can't agree - why do we need some simple built-in solution when tools
that can do this are available now? I don't think we have to hurry to
support our own proprietary solution when it isn't difficult to do it
with the tools available today.

Regards

Pavel




Re: [HACKERS] temporal support patch

2012-08-27 Thread Robert Haas
On Mon, Aug 27, 2012 at 1:50 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 I can't agree - why do we need some simple built-in solution when tools
 that can do this are available now? I don't think we have to hurry to
 support our own proprietary solution when it isn't difficult to do it
 with the tools available today.

Who said anything about proprietary solutions?

I would agree that it is POSSIBLE to do this with the tools that are
available now.  I am not sure that I'd agree that it is easy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] temporal support patch

2012-08-25 Thread Robert Haas
On Mon, Aug 20, 2012 at 7:17 PM, David Johnston pol...@yahoo.com wrote:
 Ideally the decision of whether to do so could be a client decision.  Not
 storing intra-transaction changes is easier than storing all changes.

Not really.  If you don't care about suppressing intra-transaction
changes, you can essentially just have a trigger that fires on every
update and adds information to the side table.  If you do care about
suppressing them, you have to do something more complicated.  Or so it
seems to me.
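
For illustration, a minimal version of that simple side-table trigger
might look like this (assuming an existing table sometable; all names
here are hypothetical):

CREATE TABLE sometable_audit (
    audited_at timestamptz NOT NULL DEFAULT clock_timestamp(),
    txid       bigint      NOT NULL DEFAULT txid_current(),
    operation  text        NOT NULL,
    row_data   sometable   -- the affected row, stored as a composite value
);

CREATE FUNCTION sometable_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO sometable_audit (operation, row_data)
        VALUES (TG_OP, OLD);
    ELSE
        INSERT INTO sometable_audit (operation, row_data)
        VALUES (TG_OP, NEW);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sometable_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON sometable
FOR EACH ROW EXECUTE PROCEDURE sometable_audit_fn();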

 I see the user element as having two components:

 Client - what device/channel/user was used to connect to the database
 (PostgreSQL role)
 User - relative to that client, which actual user performed the action
 (application specified)

 A PostgreSQL role would correspond to client whereas the application would
 be allowed to have full control of what User value is stored.

 This gets a little complicated with respect to SET ROLE but gets close to
 the truth.  The idea is that you look at the client to determine the
 namespace over which the user is defined and identified.

 So, a better way to phrase the position is that:

 You cannot allow the application to choose what is stored to identify
 itself (client) - i.e., its credentials identify who it is and those are
 stored without consulting the application

I don't think we can violate the general principle that the database
super-user or table owner can do whatever they want.  If one of those
folks wants to falsify their history, are we really going to tell them
no?  To me that has "I'm sorry, Dave, I can't do that" written all
over it, and I think we'll get about the same reaction that HAL did.
Now, if user A is inserting into user B's table, and is not the
super-user, then, of course, we can and should ensure that no
falsification is possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] temporal support patch

2012-08-25 Thread David Johnston
 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Saturday, August 25, 2012 12:46 PM
 To: David Johnston
 Cc: Jeff Davis; Vlad Arkhipov; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] temporal support patch
 
 On Mon, Aug 20, 2012 at 7:17 PM, David Johnston pol...@yahoo.com
 wrote:
  Ideally the decision of whether to do so could be a client decision.
  Not storing intra-transaction changes is easier than storing all
  changes.
 
 Not really.  If you don't care about suppressing intra-transaction
 changes, you can essentially just have a trigger that fires on every
 update and adds information to the side table.  If you do care about
 suppressing them, you have to do something more complicated.  Or so it
 seems to me.
 

My internals knowledge is basically zero but it would seem that if you
simply wanted the end-of-transaction result you could just record nothing
during the transaction and then copy whatever values are present at commit
to whatever logging mechanism you need.  If you are recording
intra-transaction values you could do so to a temporary storage area and
then, at commit, decide whether the recent value for a given
relation/attribute is going to be retained in the final log or whether you
end up persisting all of the intermediate values as well.
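
As a sketch of that commit-time squash, assuming the intermediate
versions were staged in a temporary table audit_stage keyed by table
name, primary key and a change counter (all names hypothetical), the
final flush could keep only the last version per row:

INSERT INTO audit_log (table_name, pk, row_data)
SELECT DISTINCT ON (table_name, pk)
       table_name, pk, row_data
FROM pg_temp.audit_stage
ORDER BY table_name, pk, change_seq DESC;  -- the last change per row wins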


  You cannot allow the application to choose what is stored to identify
  itself (client) - i.e., its credentials identify who it is and those
  are stored without consulting the application
 
 I don't think we can violate the general principle that the database
 super-user or table owner can do whatever they want.  If one of those
 folks wants to falsify their history, are we really going to tell them
 no?  To me that has "I'm sorry, Dave, I can't do that" written all over
 it, and I think we'll get about the same reaction that HAL did.
 Now, if user A is inserting into user B's table, and is not the
 super-user, then, of course, we can and should ensure that no
 falsification is possible.
 

With respect to the physical log file there is no way for the super-user to
currently falsify (at time of statement execution) the user/role that they
are using.  Even a SET ROLE doesn't change the session user (I forget the
exact mechanics but I'm pretty sure of the general point).  I do not see how
this is that much different.

I agree that it is pointless to even try to maintain true in-database
auditing in the presence of god-like super-users so most of what I envision
relates to limited permissioned users that are forced to rely upon the
standard mechanisms provided by the database.  As a matter of principle
those wanting a secure and auditable environment should not be using
ownership level roles.

Since these temporal/audit tables are intended to be maintained by the
system if you do not ask the users to identify themselves but instead take
the information directly from the environment, you never have to give an
"I'm sorry, Dave" response because Dave is never given the chance to
submit a proposed value.

David J.






Re: [HACKERS] temporal support patch

2012-08-23 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2012-08-21 at 17:07 -0500, Kevin Grittner wrote:
 
 The fact that it has an unknown sequence number or timestamp for
 purposes of ordering visibility of transactions doesn't mean you
 can't show that it completed in an audit log.  In other words, I
 think the needs for a temporal database are significantly
 different from the needs of an auditing system.
 
 ...
  
 I would assume an audit log would have very different needs from
 tracking changes for a temporal database view.  It even seems
 possible that you might want to see what people *looked* at,
 versus just changes.  You might want to see transactions which
 were rolled back, which are irrelevant for a temporal view.  If
 we're talking about an auditing system, we're talking about an
 almost completely different animal from a temporal view of the
 database.
 
 OK, I think I see what you're saying now. Basically, an audit log
 means different things to different people, so I think it confused
 the issue.
 
Probably.  When I think of an audit log, I tend to think of viewing
who did what when, without necessarily caring a lot about
viewing interim visible database states.
 
 But "temporal" is fairly vague, too.
 
Yeah, but in this context I have taken it to mean that someone wants
to run a query such that it sees the database state as of some
previous point in time.  Even with a read-only transaction, if you
want to avoid seeing states of the database which are inconsistent
with business rules enforced through serializable transactions, you
need to deal with some tricky problems.
 
 I am most interested in the topic you brought up about
 serializability and system time (transaction time), because it
 would be a fundamental piece upon which we can build a lot of
 these other things (including what could be called an audit log).
 
[brain dump follows -- remember, you *said* you were interested]
 
If you think it matters for what you are calling an audit log, then
I probably have an incomplete or inaccurate understanding of what
you mean by audit log.  Perhaps you could sketch that out a bit? 
(Or point back to where it was described, if I've missed or
forgotten something that went before.)
 
The reason it's tricky is that while SSI fully complies with the
requirement that the behavior of a set of concurrent serializable
transactions running in a database is consistent with some serial
(one-at-a-time) execution of those transactions, it does not share
certain properties with other types of serializable implementations,
so people may be assuming those additional properties where they
don't actually exist.
 
The two most common alternatives to SSI are S2PL and OCC.  Under
both of these techniques, the apparent order of execution (the order
in which the transactions could have run to produce the same results
as if they were run one-at-a-time) is the commit order.  In S2PL
this is accomplished by having reads block writes until commit time
and writes block everything until commit time.  In OCC this is
accomplished by checking the read set of a transaction at commit
time and rolling back the transaction if there is a single write by
another transaction which conflicts with the predicate locks of the
read set (i.e., there is a single read-write conflict out from the
transaction being committed).
 
SSI dodges the blocking and the high rollback rate, but the
technique has these characteristics which may be surprising:
 
 - The apparent order of execution is not always the commit order. 
If two transactions are concurrent, and T1 reads something which
would look different if it could see the work of T2 (but it *can't*
because the transactions are concurrent), then T1 *appears* to have
executed before T2.  T2 might actually *start* first and *commit*
first, but if there was overlap and the rw-conflict, then T1 ran
first *logically*.  SSI prevents cycles in this ordering, by
canceling a transaction when a possible cycle is detected.
 
 - A read-only transaction can cause an anomaly where there would
otherwise not be one.  This is because a transaction which
appeared to commit after another transaction based on rw-conflicts
may have actually committed first, and would be visible to the
read-only transaction while the work of the earlier transaction
would not show up for it; if no transaction observes that state,
then the problem goes away when the logically earlier transaction
later commits.  If the state is observed, even by a read-only
transaction, then the earlier transaction logically can't have
happened -- so it must be rolled back with a serialization failure.
Within one database, this is tracked and handled by SSI.  My
concern is that the transactions might both commit, then a time
traveler goes back and sees the state that never happened.

One of the features added with SSI was DEFERRABLE transactions.  The
point of this is that when a snapshot is generated, it can often be
determined (either immediately or 

Re: [HACKERS] temporal support patch

2012-08-22 Thread Jeff Davis
On Tue, 2012-08-21 at 17:07 -0500, Kevin Grittner wrote:
 The fact that it has an unknown sequence number or timestamp for
 purposes of ordering visibility of transactions doesn't mean you
 can't show that it completed in an audit log.  In other words, I
 think the needs for a temporal database are significantly different
 from the needs of an auditing system.

...
 
 I would assume an audit log would have very different needs from
 tracking changes for a temporal database view.  It even seems
 possible that you might want to see what people *looked* at, versus
 just changes.  You might want to see transactions which were rolled
 back, which are irrelevant for a temporal view.  If we're talking
 about an auditing system, we're talking about an almost completely
 different animal from a temporal view of the database.

OK, I think I see what you're saying now. Basically, an audit log means
different things to different people, so I think it confused the issue.
But "temporal" is fairly vague, too. It also seems like there might be a
lot of overlap, depending on how we define those terms.

I am most interested in the topic you brought up about serializability
and system time (transaction time), because it would be a fundamental
piece upon which we can build a lot of these other things (including
what could be called an audit log).

Regards,
Jeff Davis





Re: [HACKERS] temporal support patch

2012-08-21 Thread Vlad Arkhipov

On 08/21/2012 01:52 PM, Jeff Davis wrote:

On Mon, 2012-08-20 at 16:32 -0700, Josh Berkus wrote:

Personally, I would prefer a tool which just made it simpler to build my
own triggers, and made it automatic for the history table to track
changes in the live table.  I think anything we build which controls
what goes into the history table, etc., will only narrow the user base.

That sounds like a good way to start. Actually, even before the tool,
how about just some really good examples of triggers for specific kinds
of audit logs, and some ways to run queries on them? I think that might
settle a lot of these details.


Here is an example of the triggers we use in our applications. This is the
test implementation; the production one uses similar triggers written in C.

http://softus.org/?page_id=63

1. There are 3 tables: test contains only current data, test_history
contains only historical data and test_audit contains all data.
2. There must be a field system_time in an audited table for the period
of validity of the row.
3. Optional fields are: txid_modified and user_modified for the txid/user
that inserted or updated the row, and txid_deleted and user_deleted for
the txid/user that deleted the row. There may be other information in the
audit table that was omitted in the example (client IP, host name, etc.)
4. We do not use txid_current() as transaction ID because backup/restore
resets it.
5. User is set by the application (audit.current_user() is just a dummy).
6. There is no exclusion constraint on (primary key, system_time) in the
history table; integrity is maintained by triggers (however the user can
damage the historical data by modifying the test_history table).
7. It's important to understand that when audit triggers are enabled
some modifications can fail because the same row may be concurrently
modified by another transaction whose CURRENT_TIMESTAMP is lower than or
equal to the current one.
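
As a rough sketch, the shape described in points 1 and 2 might look like
this (column names are illustrative, the optional columns from point 3
are omitted, and the 9.2 range types are assumed for the validity period):

CREATE TABLE test (
    id          integer PRIMARY KEY,
    payload     text,
    -- period of validity: from now, open-ended until superseded
    system_time tstzrange NOT NULL DEFAULT tstzrange(CURRENT_TIMESTAMP, NULL)
);

-- closed historical versions; same shape as the live table
CREATE TABLE test_history (LIKE test);

-- "all data": current plus historical rows
CREATE VIEW test_audit AS
SELECT * FROM test
UNION ALL
SELECT * FROM test_history;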


Re: [HACKERS] temporal support patch

2012-08-21 Thread Anssi Kääriäinen
I have written one approach to audit tables, available from 
https://github.com/akaariai/pgsql_shadow_tables


The approach is that every table is backed by a similar audit table + 
some meta information. The tables and triggers to update the audit 
tables are managed by plpgsql procedures.


While the approach isn't likely that interesting in itself, there is one
interesting aspect. Views similar to the original tables are created
automatically in the shadow schema. The views use a session variable for
the wanted snapshot time. The reason is that one can use this to query
the database as of a wanted time:


set search_path = 'shadow_public, public';
set test_session_variable.view_time = 'wanted view timestamp'; -- for 
example '2012-05-06 22:08:00'


And now you can use exactly the same queries you use normally to
retrieve data as of the wanted view timestamp. This is very useful if
you happen to use an ORM.
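
For illustration, such a view might look like this (assuming the shadow
table carries valid_from/valid_until columns; all names here are
illustrative):

CREATE VIEW shadow_public.test AS
SELECT id, payload
FROM shadow_public.test_shadow
WHERE valid_from <= current_setting('test_session_variable.view_time')::timestamptz
  AND (valid_until IS NULL OR
       valid_until > current_setting('test_session_variable.view_time')::timestamptz);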


In addition the known limitations mentioned in the README are likely 
something the temporal support patch needs to tackle.


 - Anssi




Re: [HACKERS] temporal support patch

2012-08-21 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2012-08-20 at 19:32 -0500, Kevin Grittner wrote:
 Josh Berkus j...@agliodbs.com wrote:
  
 This is sounding like a completely runaway spec on what should
 be a simple feature.
  
 I hate to contribute to scope creep (or in this case scope
 screaming down the tracks at full steam), but I've been watching
 this with a queasy feeling about interaction with Serializable
 Snapshot Isolation (SSI).
 
 There are all kinds of challenges here, and I'm glad you're
 thinking about them. I alluded to some problems here:
 

http://archives.postgresql.org/message-id/1345415312.20987.56.camel@jdavis
 
 But those might be a subset of the problems you're talking about.
 
 It sounds like, at a high level, there are two problems:
 
 1. capturing the apparent order of execution in the audit log
 2. assigning meaningful times to the changes that are consistent
 with the apparent order of execution
 
As far as I can see, transactions which execute DML at any
transaction isolation level other than serializable can be
considered to have occurred in commit order.  Transactions which
don't write to the database don't need to be considered as part of
the history, at least in terms of viewing prior state.  Same with
transactions which roll back.  (Now, failed transactions and reads
might be of interest for some audit reports, but that seems to me
like a different issue than a temporal database.)
 
The funny bit is for a serializable transaction (TN) which commits
after writing to the database -- you can't know the apparent order
of execution as long as there are any serializable transactions
active which can't see the work of TN (i.e., the transactions
overlap).  If such a transaction (TX) executes a read which
conflicts with a TN write, TX appears to have executed first, since
it doesn't see the work of TN, so I think the sequence number or
timestamp for TN has to follow that for TX even though TN committed
first.  On the other hand, TX might write something that conflicts
with a TN read, in which case TN will appear to have executed first
and must get a sequence number or timestamp before TX.
If there is a cycle, SSI will cancel one of the transactions
involved, so that can't occur anywhere in the time line.
 
So, if you want to allow serializable temporal queries, the timing
of a read-write serializable transaction can't be locked down until
all overlapping read-write serializable transactions complete; and
the apparent order of execution must be based on read-write
conflicts, which are tracked within SSI.  I think that if we can
generate a list of committed transactions in order based on this
logic, it could feed into replication systems -- hot standby as well
as trigger-based systems.  I think we could generate snapshots which
exclude the transactions for which the order of execution has not
yet been determined, and avoid the delays involved in other possible
solutions.
 
There's a lot of detail missing here in terms of what the API would
be, and how we handle the summarization that can occur within SSI so
that it can continue to function within bounded memory even in
pessimal circumstances, but that's the general outline of my
concerns and suggested solution.
 
-Kevin




Re: [HACKERS] temporal support patch

2012-08-21 Thread David Fetter
On Mon, Aug 20, 2012 at 09:33:45PM -0700, Jeff Davis wrote:
 On Mon, 2012-08-20 at 19:17 -0400, David Johnston wrote:
  Ideally the decision of whether to do so could be a client
  decision.  Not storing intra-transaction changes is easier than
  storing all changes.  At worst you could stage up all changes then
  simply fail to store all intermediate results within a given
  relation.  In that case you gain nothing in execution performance
  but save both storage and interpretative resources.  So the
  question becomes is it worth doing without the ability to store
  intermediate results?  If you were to ponder both, which setup
  would the default be?  If the default is the harder one (all
  statements) to implement then to avoid upgrade issues the syntax
  should specify that it is logging transactions only.
 
 I think the biggest question here is what guarantees can be offered?
 What if the transaction aborts after having written some data, does
 the audit log still get updated?

There are definitely use cases for this, but until we have autonomous
transactions, a totally separate project, I don't think we should
attempt them in the first version.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] temporal support patch

2012-08-21 Thread Gavin Flower

On 22/08/12 02:16, Kevin Grittner wrote:

Jeff Davis pg...@j-davis.com wrote:

On Mon, 2012-08-20 at 19:32 -0500, Kevin Grittner wrote:

Josh Berkus j...@agliodbs.com wrote:
  

This is sounding like a completely runaway spec on what should
be a simple feature.
  
I hate to contribute to scope creep (or in this case scope

screaming down the tracks at full steam), but I've been watching
this with a queasy feeling about interaction with Serializable
Snapshot Isolation (SSI).

There are all kinds of challenges here, and I'm glad you're
thinking about them. I alluded to some problems here:



http://archives.postgresql.org/message-id/1345415312.20987.56.camel@jdavis

But those might be a subset of the problems you're talking about.

It sounds like, at a high level, there are two problems:

1. capturing the apparent order of execution in the audit log
2. assigning meaningful times to the changes that are consistent
with the apparent order of execution
  
As far as I can see, transactions which execute DML at any

transaction isolation level other than serializable can be
considered to have occurred in commit order.  Transactions which
don't write to the database don't need to be considered as part of
the history, at least in terms of viewing prior state.  Same with
transactions which roll back.  (Now, failed transactions and reads
might be of interest for some audit reports, but that seems to me
like a different issue than a temporal database.)
  
The funny bit is for a serializable transaction (TN) which commits

after writing to the database -- you can't know the apparent order
of execution as long as there are any serializable transactions
active which can't see the work of TN (i.e., the transactions
overlap).  If such a transaction (TX) executes a read which
conflicts with a TN write, TX appears to have executed first, since
it doesn't see the work of TN, so I think the sequence number or
timestamp for TN has to follow that for TX even though TN committed
first.  On the other hand, TX might write something that conflicts
with a TN read, in which case TN will appear to have executed first
and must get a sequence number or timestamp before TX.
If there is a cycle, SSI will cancel one of the transactions
involved, so that can't occur anywhere in the time line.
  
So, if you want to allow serializable temporal queries, the timing

of a read-write serializable transaction can't be locked down until
all overlapping read-write serializable transactions complete; and
the apparent order of execution must be based on read-write
conflicts, which are tracked within SSI.  I think that if we can
generate a list of committed transactions in order based on this
logic, it could feed into replication systems -- hot standby as well
as trigger-based systems.  I think we could generate snapshots which
exclude the transactions for which the order of execution has not
yet been determined, and avoid the delays involved in other possible
solutions.
  
There's a lot of detail missing here in terms of what the API would

be, and how we handle the summarization that can occur within SSI so
that it can continue to function within bounded memory even in
pessimal circumstances, but that's the general outline of my
concerns and suggested solution.
  
-Kevin




So if I understand correctly...

If there is a very long running transaction, say 1 hour, then all (or 
just some? - depending) transactions that nominally start and finish 
within that time, can not have definitive start times until the very 
long running transaction finishes, even if they are successfully committed?


So if someone looks at the audit log they might not see all the 
transactions they expect to see.


So, if I had an automatic query A which updated statistics based on
transactions committed over the last 10 minutes, then many (all?) 
transactions starting and successfully completing during the time of the 
very long running transaction will never show up! Here I am envisioning 
a query fired off every ten minutes looking for audit records with 
timestamps within the previous ten minutes. However, if I ran a query B 
looking at audit record numbers within 10 minute intervals for a week,
but kicked off 24 hours after the week finished -- then I would see the 
records I did not see in query A.
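
Concretely, query A might look something like this (the audit_log table
and committed_at column are hypothetical names):

SELECT count(*)
FROM audit_log
WHERE committed_at >= now() - interval '10 minutes';
-- transactions whose ordering timestamp is assigned only after the
-- long-running transaction ends may not yet appear in any window
-- queried in real time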


Hmm... if I am at all right, then probably best to have some suitably 
worded 'government health warning' prominent in the documentation!



Cheers,
Gavin







Re: [HACKERS] temporal support patch

2012-08-21 Thread Alvaro Herrera
Excerpts from Gavin Flower's message of Tue Aug 21 16:51:57 -0400 2012:
 On 22/08/12 02:16, Kevin Grittner wrote:

  So, if you want to allow serializable temporal queries, the timing
  of a read-write serializable transaction can't be locked down until
  all overlapping read-write serializable transactions complete; and
  the apparent order of execution must be based on read-write
  conflicts, which are tracked within SSI.  I think that if we can
  generate a list of committed transactions in order based on this
  logic, it could feed into replication system -- hot standby as well
  as trigger-based systems.  I think we could generate snapshots which
  exclude the transactions for which the order of execution has not
  yet been determined, and avoid the delays involved in other possible
  solutions.

 If there is a very long running transaction, say 1 hour, then all (or 
 just some? - depending) transactions that nominally start and finish 
 within that time, can not have definitive start times until the very 
 long running transaction finishes, even if they are successfully committed?
 
 So if someone looks at the audit log they might not see all the 
 transactions they expect to see.

I think there would need to be a way to also list transactions which are
in progress -- this would include not only live transactions, but also
all those transactions that have actually committed but are not yet
listed as committed because their position in the commit sequence has
not been determined.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] temporal support patch

2012-08-21 Thread Kevin Grittner
Gavin Flower gavinflo...@archidevsys.co.nz wrote:
 
 So if I understand correctly...
 
 If there is a very long running transaction, say 1 hour, then all
 (or just some? - depending) transactions that nominally start and
 finish within that time, can not have definitive start times until
 the very long running transaction finishes, even if they are
 successfully committed?
 
That's not correct.  Any transaction which started can certainly
have a start time. Any transaction which completed can certainly
have a commit or rollback time.  What they *can't* have is a known
position in the apparent order of execution for serializable
transactions, which might be different from the order of start and
commit times.
 
The fact that it has an unknown sequence number or timestamp for
purposes of ordering visibility of transactions doesn't mean you
can't show that it completed in an audit log.  In other words, I
think the needs for a temporal database are significantly different
from the needs of an auditing system.
 
And keep in mind, we are only talking about seeing read-write
serializable transactions which might yet conflict with other
read-write serializable transactions when choosing to look at the
prior state within a temporal serializable transaction.  That's easy
enough to avoid if you want to do so.
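
For example, one way to avoid it is to run the temporal query in a
deferrable read-only transaction, which waits until a snapshot safe from
such conflicts is available:

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
-- ... temporal queries against the prior state here ...
COMMIT;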
 
 So if someone looks at the audit log they might not see all the 
 transactions they expect to see.
 
I would assume an audit log would have very different needs from
tracking changes for a temporal database view.  It even seems
possible that you might want to see what people *looked* at, versus
just changes.  You might want to see transactions which were rolled
back, which are irrelevant for a temporal view.  If we're talking
about an auditing system, we're talking about an almost completely
different animal from a temporal view of the database.
 
 So, if I had an automatic query A which updated statistics based
 on transactions committed over the last 10 minutes, then many
 (all?) transactions starting and successfully completing during
 the time of the very long running transaction will never show up!
 
A statistics capture process like that doesn't seem like a place
where you care about the apparent order of execution of serializable
transactions.
 
 Here I am envisioning a query fired off every ten minutes looking
 for audit records with timestamps within the previous ten minutes.
 
Which timestamp would make sense for that?
 
 However, if I ran a query B looking at audit record numbers with
 in 10 minute intervals for a week, but kicked off 24 hours after
 the week finished -- then I would see the records I did not see in
 query A.
 
 Hmm... if I am at all right, then probably best to have some
 suitably worded 'government health warning' prominent in the
 documentation!
 
We're clearly talking at cross purposes.  I'm discussing what is
needed to be able to see a past state of the database in a
transaction which would only see states of the database which are
consistent with some serial execution of serializable transactions
which modified the database, and you're talking about an audit
table.  If we're actually talking about an audit system, I have a
whole different set of concerns, and I would not be bringing this
one up.
 
The whole point of my concern is that if you have a business rule
enforced by database triggers that rows in a certain table contain
some batch identifier and no rows can be added to a batch after some
database change flags that batch as closed, then *without* what I
suggest, you could view a closed batch and see one set of rows, and
view the batch at a later point in time and magically see rows
appear in violation of the enforced business rules.  I'm talking
about *preventing* surprising appearances of data after the fact. 
You would need a big warning if you *don't* have what I suggest.
 
Now, it is quite possible that one table (or set of tables) could do
double-duty for both temporal queries and auditing, but the fact
that something is not needed there for one purpose doesn't mean it
isn't needed for the other.
 
There are still some fuzzy areas around how things would look with a
*mix* of serializable and other transactions updating the database;
but I think in the long run we will find that people either want to
do all of their modifications through SSI, or none of them.  The
guarantees get pretty weak if you don't know that all transactions
were part of the review of dangerous structures which can cause
anomalies.
 
If anyone is still unclear about serializable transactions in
PostgreSQL and wants to invest the time needed to read 12 pages
(including footnotes and pretty colored performance graphs) to
learn about it -- the paper which is going to be presented at the
VLDB conference next week goes at it from a different angle than I
usually approach it, and it may click with many people where my
discussions have fallen short.
 

Re: [HACKERS] temporal support patch

2012-08-21 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 
 I think there would need to be a way to also list transactions
 which are in progress -- this would include not only live
 transactions, but also all those transactions that have actually
 committed but are not yet listed as committed because their
 position in the commit sequence has not been determined.
 
That might allow the same data to serve both needs with one or two
fewer timestamp (or similar) columns than what my post of a few
minutes ago suggested.
 
-Kevin




Re: [HACKERS] temporal support patch

2012-08-20 Thread Robert Haas
On Sun, Aug 19, 2012 at 6:28 PM, Jeff Davis pg...@j-davis.com wrote:
 The other issue is how to handle multiple changes of the same record
 within the transaction. Should they be stored or not?

 In a typical audit log, I don't see any reason to. The internals of a
 transaction should be implementation details; invisible to the outside,
 right?

I'm not convinced.

 I'm not sure that the database user is the proper thing to be stored in
 the history table. Many applications usually connect to a database using
 some virtual user and have their own users/roles tables to handle
 privileges. There should be some way to substitute the stored user in
 the history table with the application's one. It's also helpful to store
 transaction id that inserted/updated/deleted the record.

 If the system is recording it for audit purposes, then it better be sure
 that it's true. You can't allow the application to pick and choose what
 gets stored there.

That position would render this feature useless for every application
for which I would otherwise have used it.  I think it's just nonsense
to talk about what we can or can't let the user do.  The user is in
charge, and our job is to allow him to do what he wants to do more
easily, not to dictate what he must do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] temporal support patch

2012-08-20 Thread David Johnston
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Monday, August 20, 2012 5:04 PM
 To: Jeff Davis
 Cc: Vlad Arkhipov; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] temporal support patch
 
 On Sun, Aug 19, 2012 at 6:28 PM, Jeff Davis pg...@j-davis.com wrote:
  The other issue is how to handle multiple changes of the same record
  within the transaction. Should they be stored or not?
 
  In a typical audit log, I don't see any reason to. The internals of a
  transaction should be implementation details; invisible to the
  outside, right?
 
 I'm not convinced.

Ideally the decision of whether to do so could be a client decision.  Not
storing intra-transaction changes is easier than storing all changes.  At
worst you could stage up all changes then simply fail to store all
intermediate results within a given relation.  In that case you gain nothing
in execution performance but save both storage and interpretative resources.
So the question becomes is it worth doing without the ability to store
intermediate results?  If you were to ponder both, which setup would the
default be?  If the default is the harder one (all statements) to implement
then to avoid upgrade issues the syntax should specify that it is logging
transactions only.

Random, somewhat related, thought:  I do all my work on a temporary
staging table and then, as my final action, insert the resultant records
onto a separate live table and drop the temporary table.  Further changes to
said record I perform by deleting the original then inserting a new record
(from staging again) with all the values changed.  Obviously this has
limitations with respect to foreign keys and such but it is possible.  What
happens to the audit log if the PK changes and if it does not change?  Any
other implications that need to be addressed, or is it like giving a loaded
gun to someone and trusting them to use it responsibly?

 
  I'm not sure that the database user is the proper thing to be stored
  in the history table. Many applications usually connect to a database
  using some virtual user and have their own users/roles tables to
  handle privileges. There should be some way to substitute the
  stored user in the history table with the application's one. It's
  also helpful to store transaction id that inserted/updated/deleted the
 record.
 
  If the system is recording it for audit purposes, then it better be
  sure that it's true. You can't allow the application to pick and
  choose what gets stored there.
 
 That position would render this feature useless for every application for
 which I would otherwise have used it.  I think it's just nonsense to talk
about
 what we can or can't let the user do.  The user is in charge, and our job
is to
 allow him to do what he wants to do more easily, not to dictate what he
must
 do.
 
 --

I see the user element as having two components:

Client - what device/channel/user was used to connect to the database
(PostgreSQL role)
User - relative to that client, which actual user performed the action
(application specified)

A PostgreSQL role would correspond to client whereas the application would
be allowed to have full control of what User value is stored.

This gets a little complicated with respect to SET ROLE but gets close to
the truth.  The idea is that you look at the client to determine the
namespace over which the user is defined and identified.

So, a better way to phrase the position is that:

You cannot allow the application to choose what is stored to identify
itself (client) - i.e., its credentials identify who it is and those are
stored without consulting the application

At that point you've basically shifted responsibility for the correctness of
the audit log onto that application and away from the database.  However,
you do provide a place for the application to store an identifier that it is
able to resolve to a user if necessary.

This is an arbitrary two-layer hierarchy and while conceptually anything
with two layers may want more I am not sure whether the extra complexity
that would entail would be worth the effort.  Depending on what kinds of
information you allowed to be stored for User it becomes something that
can be modeled when desired and ignored otherwise.

The issue with adding the PostgreSQL role to the database in this way is
that you now can never delete that role or reassign it to another entity.  I
guess with temporal you could do so and basically have the identity-role
relationship defined over specific periods of time...  I can imagine (and
have imagined) a
whole level of indirection and association to be able to reasonably handle
assigning and storing permanent identities while allowing logon credentials
to remain outside of permanent storage.

David J.





Re: [HACKERS] temporal support patch

2012-08-20 Thread Josh Berkus
On 8/20/12 4:17 PM, David Johnston wrote:
 The issue with adding the PostgreSQL role to the database in this way is
 that you now can never delete that role or reassign it to another entity.  I
 guess with temporal you could do so and basically have the identity-role
 relationship defined over specific periods of time...  I can imagine (and
 have imagined) a
 whole level of indirection and association to be able to reasonably handle
 assigning and storing permanent identities while allowing logon credentials
 to remain outside of permanent storage.

This is sounding like a completely runaway spec on what should be a
simple feature.

If you want something in core which will be useful to a lot of our
users, it needs to be simple and flexible.  Not ornate with lots of
dependencies. The first version of it should be as simple and minimalist
as possible.

Personally, I would prefer a tool which just made it simpler to build my
own triggers, and made it automatic for the history table to track
changes in the live table.  I think anything we build which controls
what goes into the history table, etc., will only narrow the user base.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] temporal support patch

2012-08-20 Thread Kevin Grittner
Josh Berkus j...@agliodbs.com wrote:
 
 This is sounding like a completely runaway spec on what should be
 a simple feature.
 
I hate to contribute to scope creep (or in this case scope screaming
down the tracks at full steam), but I've been watching this with a
queasy feeling about interaction with Serializable Snapshot
Isolation (SSI).  Under SSI the apparent order of execution is not
always the transaction commit order, or the transaction start order.
So a temporal database would be vulnerable to seeing anomalies like
this one unless rw-conflicts (as tracked with predicate locks) are
considered:
 
http://wiki.postgresql.org/wiki/SSI#Deposit_Report
 
This raises something I talked vaguely about in Ottawa this year,
although it was pretty much at the hand-waving stage and I don't
know how well I got the idea across.  I've been thinking about the
problems with all the various replication technologies being able to
present data consistent with serializable transactions, and have the
outlines of a technique I think might be more palatable to the
community than those previously discussed.  Basically, it would
involve generating a list of committed XIDs in *apparent order of
execution*, and creating snapshots on the replicas based on that
instead of just the master's transaction commit order.  I've been
trying to work through the details to the point where I can present
a coherent write-up on it.
 
I wouldn't want to hold up a feature like temporal queries on the
basis that it didn't immediately play nice with SSI, but it seems
like it would be a good thing if the view of the past wasn't too
strictly tied to transaction commit sequence; a little bit of
abstraction there might save a lot of pain in tying these features
together.  Maybe something along the lines of a transaction
visibility sequence number, or *maybe* a timestamptz works as long
as that can be fudged to a time after the commit time for
transactions involved in rw-conflicts with concurrent transactions. 
(I'm not sure microsecond resolution works, for other reasons, but
if it does...)  I think either could work.
 
-Kevin




Re: [HACKERS] temporal support patch

2012-08-20 Thread Jeff Davis
On Mon, 2012-08-20 at 17:04 -0400, Robert Haas wrote:
 On Sun, Aug 19, 2012 at 6:28 PM, Jeff Davis pg...@j-davis.com wrote:
  The other issue is how to handle multiple changes of the same record
  within the transaction. Should they be stored or not?
 
  In a typical audit log, I don't see any reason to. The internals of a
  transaction should be implementation details; invisible to the outside,
  right?
 
 I'm not convinced.

As I understand it, we are talking about recording the data changes made
to one table into another table. Auditing of reads or the logging of raw
statements seem like very different kinds of projects to me; but tell me
if you think differently.

So if we are recording data changes, I don't see much point in recording
uncommitted changes. Perhaps my imagination is failing, and someone else
can fill me in on a use case.

I'm also struggling with the semantics: if we record uncommitted
changes, do we record them even if the transaction aborts? If so, what
guarantees do we offer about the change actually being recorded?

  I'm not sure that the database user is the proper thing to be stored in
  the history table. Many applications usually connect to a database using
  some virtual user and have their own users/roles tables to handle
  privileges. There should be some way to substitute the stored user in
  the history table with the application's one. It's also helpful to store
  transaction id that inserted/updated/deleted the record.
 
  If the system is recording it for audit purposes, then it better be sure
  that it's true. You can't allow the application to pick and choose what
  gets stored there.
 
 That position would render this feature useless for every application
 for which I would otherwise have used it.

We could offer a GUC like audit_context or audit_app_context that
takes a text string, and the audit log would record the value stored in
that GUC along with the data changes in question.
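
As a sketch of that idea (the GUC name audit.app_context, the table, and
the trigger body are purely illustrative, not a proposed API):

CREATE TABLE audit_log (
    changed_at  timestamptz NOT NULL DEFAULT clock_timestamp(),
    app_context text,   -- application-supplied; NOT vouched for by the system
    row_data    text
);

CREATE FUNCTION audit_fn() RETURNS trigger AS $$
BEGIN
    -- assumes the application has SET the GUC earlier in the session
    -- (pre-9.2 this would also need custom_variable_classes = 'audit')
    INSERT INTO audit_log (app_context, row_data)
    VALUES (current_setting('audit.app_context'), NEW::text);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- the application identifies itself per session or per transaction:
SET audit.app_context = 'app_user=dave';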

The main thing I object to is an implication that the system is vouching
for some particular fact that is supplied by a user-set GUC. Remember,
there are guaranteed to be application-level problems that allow these
GUCs to get set improperly for all kinds of reasons. We don't want bug
reports along the lines of "security breach! PG allows application_name
to be spoofed in the audit log!".

Also, I'd prefer not use existing GUCs, because there may be all kinds
of other reasons that people set existing GUCs, and we want them to be
able to handle the audit_context one more carefully and have a clear
warning in the documentation.

  I think it's just nonsense
 to talk about what we can or can't let the user do.  The user is in
 charge, and our job is to allow him to do what he wants to do more
 easily, not to dictate what he must do.

Remember that the users who depend on the veracity of the audit log are
users, too. Let's try to serve both classes of user if we can.

Regards,
Jeff Davis







Re: [HACKERS] temporal support patch

2012-08-20 Thread Jeff Davis
On Mon, 2012-08-20 at 19:17 -0400, David Johnston wrote:
 Ideally the decision of whether to do so could be a client decision.  Not
 storing intra-transaction changes is easier than storing all changes.  At
 worst you could stage up all changes then simply fail to store all
 intermediate results within a given relation.  In that case you gain nothing
 in execution performance but save both storage and interpretative resources.
 So the question becomes is it worth doing without the ability to store
 intermediate results?  If you were to ponder both, which setup would the
 default be?  If the default is the harder one (all statements) to implement
 then to avoid upgrade issues the syntax should specify that it is logging
 transactions only.

I think the biggest question here is what guarantees can be offered?
What if the transaction aborts after having written some data, does the
audit log still get updated?

 I see the user element as having two components:

I think this is essentially a good idea, although as I said in my other
email, we should be careful how we label the application-supplied
information in the audit log.

Regards,
Jeff Davis





Re: [HACKERS] temporal support patch

2012-08-20 Thread Jeff Davis
On Mon, 2012-08-20 at 16:32 -0700, Josh Berkus wrote:
 This is sounding like a completely runaway spec on what should be a
 simple feature.

My feeling as well. However, we will eventually want to coalesce around
some best practices and make it easy and robust for typical cases.

 Personally, I would prefer a tool which just made it simpler to build my
 own triggers, and made it automatic for the history table to track
 changes in the live table.  I think anything we build which controls
 what goes into the history table, etc., will only narrow the user base.

That sounds like a good way to start. Actually, even before the tool,
how about just some really good examples of triggers for specific kinds
of audit logs, and some ways to run queries on them? I think that might
settle a lot of these details.

Regards,
Jeff Davis





Re: [HACKERS] temporal support patch

2012-08-20 Thread Jeff Davis
On Mon, 2012-08-20 at 19:32 -0500, Kevin Grittner wrote:
 Josh Berkus j...@agliodbs.com wrote:
  
  This is sounding like a completely runaway spec on what should be
  a simple feature.
  
 I hate to contribute to scope creep (or in this case scope screaming
 down the tracks at full steam), but I've been watching this with a
 queasy feeling about interaction with Serializable Snapshot
 Isolation (SSI).

There are all kinds of challenges here, and I'm glad you're thinking
about them. I alluded to some problems here:

http://archives.postgresql.org/message-id/1345415312.20987.56.camel@jdavis

But those might be a subset of the problems you're talking about.

It sounds like, at a high level, there are two problems:

1. capturing the apparent order of execution in the audit log
2. assigning meaningful times to the changes that are consistent with
the apparent order of execution

Regards,
Jeff Davis





Re: [HACKERS] temporal support patch

2012-08-20 Thread Craig Ringer

On 08/21/2012 12:52 PM, Jeff Davis wrote:

On Mon, 2012-08-20 at 16:32 -0700, Josh Berkus wrote:

This is sounding like a completely runaway spec on what should be a
simple feature.


My feeling as well. However, we will eventually want to coalesce around
some best practices and make it easy and robust for typical cases.


Personally, I would prefer a tool which just made it simpler to build my
own triggers, and made it automatic for the history table to track
changes in the live table.  I think anything we build which controls
what goes into the history table, etc., will only narrow the user base.


FWIW, I've found the ability to exclude columns from my history triggers 
to be important because of:


- optimistic locking fields used by some clients; and
- trigger-maintained summary fields.

Without being able to apply some exclusions there's just too much churn 
in the history of some tables.


Here's what I'm using at the moment:

http://wiki.postgresql.org/wiki/Audit_trigger_91plus

(I know storing both the relation oid and the text-form table and schema 
name is redundant. The text is handy if the table is dropped and 
recreated, though, and the oid is quicker and easier much of the time).


I use both the per-query and per-row forms depending on the granularity 
I need.


--
Craig Ringer


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-08-20 Thread Craig Ringer

On 08/21/2012 12:52 PM, Jeff Davis wrote:

On Mon, 2012-08-20 at 16:32 -0700, Josh Berkus wrote:

This is sounding like a completely runaway spec on what should be a
simple feature.


My feeling as well. However, we will eventually want to coalesce around
some best practices and make it easy and robust for typical cases.


Personally, I would prefer a tool which just made it simpler to build my
own triggers, and made it automatic for the history table to track
changes in the live table.  I think anything we build which controls
what goes into the history table, etc., will only narrow the user base.


That sounds like a good way to start. Actually, even before the tool,
how about just some really good examples of triggers for specific kinds
of audit logs


That reminds me: The single biggest improvement I can see for audit 
triggers would be to provide an _easy_ and _efficient_ way to test 
whether any fields have changed between OLD and NEW *except* for one or 
more ignored fields.


Right now if I have a four-column table and I want to ignore UPDATEs to 
col2 for audit purposes, I have to write:



CREATE TRIGGER tablename_audit_insert_delete
AFTER INSERT OR DELETE ON sometable FOR EACH ROW
EXECUTE PROCEDURE audit_func();

CREATE TRIGGER tablename_audit_update_selective
AFTER UPDATE ON sometable FOR EACH ROW
WHEN (
   OLD.col1 IS DISTINCT FROM NEW.col1 OR
   OLD.col3 IS DISTINCT FROM NEW.col3 OR
   OLD.col4 IS DISTINCT FROM NEW.col4  -- col2 deliberately not compared
)
EXECUTE PROCEDURE audit.if_modified_func();

... which is horrible for all sorts of reasons:

- If a column is added the audit trigger also needs an update to test 
for it, otherwise it'll be ignored;


- It isn't explicit that col2 is ignored; and

- I have to repeat my trigger definitions twice.


An alternative is to create hstores from OLD and NEW, delete the field 
of interest, and compare them. That's pretty slow though, and may 
duplicate work done by the already-expensive audit trigger.
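
(For what it's worth, a sketch of that hstore route, assuming the hstore
extension is installed and col2 is the hypothetical ignored column: inside
the trigger function,

    IF (hstore(OLD.*) - 'col2'::text)
       IS DISTINCT FROM (hstore(NEW.*) - 'col2'::text) THEN
        -- something other than col2 changed; log it
        INSERT INTO sometable_history SELECT now(), NEW.*;  -- hypothetical
    END IF;

which works, but it builds two hstores on every UPDATE on top of the
trigger's own work.)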


What I'm imagining is something like a:

row_equals_ignorecols(OLD, NEW, 'col2')

... which would solve half the problem, and is simple enough I could 
implement it with a little C function.


A way to avoid splitting the trigger function definition and a built-in 
compare rows except columns would be great, though.


--
Craig Ringer



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-08-19 Thread Jeff Davis
On Mon, 2012-06-25 at 17:46 +0900, Vlad Arkhipov wrote:
 It's not sufficient to store only a period of validity for a row. If two 
 transactions started at the same time change the same record, you have a 
 problem with the TSTZRANGE type because it's normalized to an empty interval. 

That's an interesting point.

Let's say you tried setting it to [T1, T2) where T1 is the time of the
last transaction to update it and T2 is the time of the current
transaction. If T2 = T1, then TSTZRANGE will throw an error, not store
the empty interval.

And we don't want to store the empty interval, because it would be a
lie. There could have been some transaction T3 that happened during T2
that saw the value from T1, so saying that there were no times where
that was visible to the system is false. Throwing an error allows you to
retry T2, which should allow a microsecond or so to pass, and the
problem should resolve itself (assuming your clock didn't move
backwards, which is a different issue).

We could also argue about the start versus end times of transactions,
and snapshot acquisition times, because that could cause confusion if
there are long-running transactions. It might be a good reason to store
the modifying transaction ID as well, but then you get into transaction
wraparound problems.
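
(One partial mitigation, as a sketch: record the epoch-qualified 64-bit
value from txid_current() rather than the raw xid, since it is not subject
to wraparound:

    SELECT txid_current();

though that still says nothing about apparent commit order.)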

 The other issue is how to handle multiple changes of the same record 
 within the transaction. Should they be stored or not?

In a typical audit log, I don't see any reason to. The internals of a
transaction should be implementation details; invisible to the outside,
right?

 Also it's necessary to store some kind of operation type that was 
 applied to the record (insert/update/delete). For example, there is a 
 table with one record with validity period [0, ) and value 'A'.
 
 First way
 1. Delete this record in time 1, now there is [0, 1), A in the history 
 table.
 2. Insert a new record in time 1, now there is [0, 1), A in the history 
 table and [1, ), B record in the current data table.
 
 Second way
 1. Update this record in time 1, now there is [0, 1), A in the history 
 table and [1, ), B record in the current data table.
 
 So you have the same data in the tables but the actions that led to this 
 configuration were different and the history has been lost partly.

Right. Those are yet more possible options that people might want for an
audit log.

  * There is other useful information that could be recorded, such as the
  user who inserted/updated/deleted the record.
 I'm not sure that the database user is the proper thing to be stored in 
 the history table. Many applications usually connect to a database using 
 some virtual user and have their own users/roles tables to handle 
 privileges. There should be some way to substitute the stored user in 
 the history table with the application's one. It's also helpful to store 
 the transaction id that inserted/updated/deleted the record.

If the system is recording it for audit purposes, then it better be sure
that it's true. You can't allow the application to pick and choose what
gets stored there.

While it may be true that many applications just all use the same DB
user, if you want an audit log that includes user information you have
to let the DB do some authentication.
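
(As a sketch of what that could look like: if the application switches to
the real user's role for each session, a trigger can record it reliably;
the role name here is hypothetical:

    SET ROLE alice;                     -- hypothetical per-user role
    SELECT current_user, session_user;  -- alice, plus the login role

current_user then reflects the acting user even though the connection was
opened under a shared login role.)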

 It's a great proposal but it seems to be impossible to implement with a 
 triggers-only solution, isn't it? Is there any kind of hook on ALTER 
 TABLE ... in PostgreSQL to update changed-columns bitmaps when the table 
 structure changes?

Column numbers are never reused, so I think it would be stable. But if
you do need to be notified of schema changes, the new event triggers
mechanism may be able to do that.
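
(For reference, a sketch of what that could look like with the event
trigger mechanism as it eventually shipped; the names here are
hypothetical:

    CREATE FUNCTION note_ddl() RETURNS event_trigger AS $$
    BEGIN
        RAISE NOTICE 'DDL executed: %', tg_tag;
    END;
    $$ LANGUAGE plpgsql;

    CREATE EVENT TRIGGER note_ddl ON ddl_command_end
        EXECUTE PROCEDURE note_ddl();

so an audit tool could refresh its changed-columns bitmap whenever it sees
an ALTER TABLE tag.)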

 In SQL2011 there is only one table with all the data, historical and 
 current. So it's not very convenient to specify a WHERE condition on 
 system time everywhere and for all tables in the query. By default only 
 the current data is selected with a query like SELECT * FROM table.

If there is some syntax that offers a convenient shorthand for WHERE,
that's fine with me. Or using two tables, one called foo and one called
foo_history, is also fine. But I don't want the DML syntax to introduce
new mechanisms that aren't available without the fancy syntax (though
new DDL arrangements might be fine).
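
(For example, under the two-table arrangement with a tstzrange column, the
"as of" query is already just a WHERE clause; the names here are
hypothetical:

    SELECT * FROM foo_history
    WHERE sys_period @> TIMESTAMPTZ '2012-06-01 00:00:00+00';

and a GiST index on sys_period makes the containment test indexable.)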

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-08-19 Thread Pavel Stehule

 If there is some syntax that offers a convenient shorthand for WHERE,
 that's fine with me. Or using two tables, one called foo and one called
 foo_history, is also fine. But I don't want the DML syntax to introduce
 new mechanisms that aren't available without the fancy syntax (though
 new DDL arrangements might be fine).


The foo_history convention is already used often - which could be
significant for migration.

Regards

Pavel Stehule

 Regards,
 Jeff Davis



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-25 Thread Vlad Arkhipov

On 05/31/2012 11:52 AM, Jeff Davis wrote:

On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:

Hi all,


as a part of my master's thesis I have created temporal support patch
for PostgreSQL. It enables the creation of special temporal tables
with entries versioning. Modifying operations (UPDATE, DELETE,
TRUNCATE) on these tables don't cause permanent changes to entries,
but create new versions of them. Thus user can easily get to the past
states of the table.


I would be very interested to see this, thank you for working on it.

There are quite a few aspects to a temporal database system, and you are
working on a system-maintained transaction-time historical table, right?
Or are there other aspects to your proposal?

Some general comments:

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.
It's not sufficient to store only a period of validity for a row. If two 
transactions started at the same time change the same record, you have a 
problem with the TSTZRANGE type because it's normalized to an empty interval. 
The other issue is how to handle multiple changes of the same record 
within the transaction. Should they be stored or not?
Also it's necessary to store some kind of operation type that was 
applied to the record (insert/update/delete). For example, there is a 
table with one record with validity period [0, ) and value 'A'.


First way
1. Delete this record in time 1, now there is [0, 1), A in the history 
table.
2. Insert a new record in time 1, now there is [0, 1), A in the history 
table and [1, ), B record in the current data table.


Second way
1. Update this record in time 1, now there is [0, 1), A in the history 
table and [1, ), B record in the current data table.


So you have the same data in the tables but the actions that led to this 
configuration were different and the history has been lost partly.



* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.
I'm not sure that the database user is the proper thing to be stored in 
the history table. Many applications usually connect to a database using 
some virtual user and have their own users/roles tables to handle 
privileges. There should be some way to substitute the stored user in 
the history table with the application's one. It's also helpful to store 
the transaction id that inserted/updated/deleted the record.



* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like show me any time a salary was
changed over the last month (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a columns changed bitmap or something).
It's a great proposal but it seems to be impossible to implement with a 
triggers-only solution, isn't it? Is there any kind of hook on ALTER 
TABLE ... in PostgreSQL to update changed-columns bitmaps when the table 
structure changes?

* In general, I'm not fond of adorning queries with TRANSACTION TIME AS
OF... kinds of things. Those constructs are redundant with a WHERE
clause (on a range type, you'd use the contains operator). If a
standard requires that, maybe it would be OK to allow such things as
syntactic sugar.
In SQL2011 there is only one table with all the data, historical and 
current. So it's not very convenient to specify a WHERE condition on 
system time everywhere and for all tables in the query. By default only 
the current data is selected with a query like SELECT * FROM table.

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).
We have had success with inheritance in our trigger-based 
solution. It's completely transparent to existing applications and 
does not have any impact on performance.
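
(A minimal sketch of that arrangement, with hypothetical names: the
history table is the parent and the current table inherits from it, so
scanning the parent sees history and current rows together:

    CREATE TABLE person_hist (
        name       text,
        sys_period tstzrange NOT NULL
    );
    CREATE TABLE person () INHERITS (person_hist);

    SELECT * FROM ONLY person_hist;  -- history only
    SELECT * FROM person_hist;       -- history plus current
)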


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-18 Thread Vlad Arkhipov

On 06/15/2012 03:59 PM, Jeff Davis wrote:

On Wed, 2012-06-13 at 23:10 +0200, Miroslav Šimulčík wrote:


I have a working patch for PostgreSQL version 9.0.4, but it needs
refactoring before I can submit it, because some parts don't
meet the formatting requirements yet. And yes, the changes are large, so it
will be better to discuss the design first and then deal with the code. Do you
insist on compatibility with standard SQL 2011, as Pavel wrote?


Try to work on solving the problem and identify the use cases. I don't
think the standard will cause a major problem, we should be able to make
the relevant parts of your patch match the standard.

That's one reason to work on it as an extension first: we can get a
better sense of the problem space and various use cases without worrying
about violating any standard. Then, as you need specific backend support
(e.g. special syntax), we can take the standards more seriously.

Regards,
Jeff Davis



What's wrong with SPI/timetravel extension for system versioning?
http://www.postgresql.org/docs/9.1/static/contrib-spi.html

We are heavily using system-versioned and application-time period tables 
in our enterprise products (most of them are bi-temporal). However, our 
implementation is based on triggers and views and is therefore not very 
convenient to use. There are also some locking issues with foreign keys 
to application-time period tables. It would be great if the new temporal 
SQL features were included in the PostgreSQL core with SQL 2011 
syntax support. That is especially important for bi-temporal tables 
because of the complex internal logic of UPDATE/DELETE and the huge SELECT 
queries for such tables.


Re: [HACKERS] temporal support patch

2012-06-18 Thread Jeff Davis
On Mon, 2012-06-18 at 19:34 +0900, Vlad Arkhipov wrote:
 What's wrong with SPI/timetravel extension for system versioning?
 http://www.postgresql.org/docs/9.1/static/contrib-spi.html
 
 We are heavily using system-versioned and application-time period
 tables in our enterprise products (most of them are bi-temporal).
 However, our implementation is based on triggers and views and is
 therefore not very convenient to use. There are also some locking
 issues with foreign keys to application-time period tables. It would be
 great if the new temporal SQL features were included in the
 PostgreSQL core with SQL 2011 syntax support. That is especially
 important for bi-temporal tables because of the complex internal logic of
 UPDATE/DELETE and the huge SELECT queries for such tables.

I've already pointed out some missing features in this thread, but the
big ones in my mind are:

1. It doesn't use 9.2 Range Types, which would help in a lot of ways
(like making the SELECT queries a lot simpler and faster).

2. It's missing a lot of options, like storing the user that modified a
row or the changed columns.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-15 Thread Jeff Davis
On Wed, 2012-06-13 at 23:10 +0200, Miroslav Šimulčík wrote:

 I have a working patch for PostgreSQL version 9.0.4, but it needs
 refactoring before I can submit it, because some parts don't
 meet the formatting requirements yet. And yes, the changes are large, so it
 will be better to discuss the design first and then deal with the code. Do you
 insist on compatibility with standard SQL 2011, as Pavel wrote?
 
Try to work on solving the problem and identify the use cases. I don't
think the standard will cause a major problem, we should be able to make
the relevant parts of your patch match the standard.

That's one reason to work on it as an extension first: we can get a
better sense of the problem space and various use cases without worrying
about violating any standard. Then, as you need specific backend support
(e.g. special syntax), we can take the standards more seriously.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-06-13 Thread Miroslav Šimulčík
It would be no problem to make my solution compatible with SQL 2011, but
the standard is not freely available. Can anybody provide me with this
standard?

2012/5/20 Pavel Stehule pavel.steh...@gmail.com

 Hello

 2012/5/18 Miroslav Šimulčík simulcik.m...@gmail.com:
  Hello.
 
  The SQL 2011 standard wasn't available at the time I started this project,
  so I built my implementation on the older TSQL2 and SQL/Temporal drafts,
  which were the only ones available. None of these were accepted by the
  ANSI/ISO committees, however.
 
  SQL 2011 uses a different syntax, which looks like the one IBM DB2 had
  been using even before the standard was published.
 
  So my implementation differs in syntax, but the features are the same as
  stated in the system-versioned tables part of the slideshow.
 

 I would like to see temporal functionality in pg, but only in SQL 2011
 syntax. Using syntax from deprecated proposals makes no sense. I am not
 sure the history table concept is best from a performance view - it is
 simpler to implement, but you duplicate all indexes - there will
 be a lot of redundant fields in the history table. An important question is
 the difference in cost between some non-trivial query for actual data and
 the same query for historic data.

 Regards

 Pavel Stehule

  Regards
  Miroslav Simulcik
 
 
  2012/5/17 Pavel Stehule pavel.steh...@gmail.com
 
  Hello
 
  what is conformance of your solution with temporal extension in ANSI SQL
  2011
 
 
 
 http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438
 
  Regards
 
  Pavel Stehule
 
  2012/5/16 Miroslav Šimulčík simulcik.m...@gmail.com:
   Hi all,
  
   as a part of my master's thesis I have created temporal support patch
   for
   PostgreSQL. It enables the creation of special temporal tables with
   entries
   versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these
   tables
   don't cause permanent changes to entries, but create new versions of
   them.
   Thus user can easily get to the past states of the table.
  
   Basic information on temporal databases can be found
   on http://en.wikipedia.org/wiki/Temporal_database
  
   In field of temporal databases, there are only proprietary solution
   available. During the analysis I found these:
   - IBM DB2 10 for z/OS
   - Oracle 11g Workspace Manager
   - Teradata Database 13.10
  
   Primary goal of my work was the creation of opensource solution, that
 is
   easy to use and is backward compatible with existing applications,
   so that
   the change of the original tables to temporal ones, does not require
   changes
   to applications that work with them. This patch is built on standard
   SQL/Temporal with some minor modifications inspired by commercial
   temporal
   database systems. Currently it only deals with transaction time
 support.
  
   Here is simple description on how it works:
  
   1. user can create transaction time table using modified CREATE TABLE
   command:
  
   CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;
  
   This command automatically creates all objects required for
   transaction
   time support:
  
               List of relations
   Schema |         Name         |   Type   |  Owner
  --------+----------------------+----------+----------
   public | person               | table    | tester
   public | person__entry_id_seq | sequence | tester
   public | person_hist          | table    | postgres


                            Table public.person
     Column   |            Type             |                 Modifiers
  ------------+-----------------------------+------------------------------------------------------------------------
   name       | character varying(50)       |
   _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
   _sys_start | timestamp without time zone | not null default clock_timestamp()
   _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
  Indexes:
      person__entry_id_idx btree (_entry_id)
      person__sys_start__sys_end_idx btree (_sys_start, _sys_end)


              Table public.person_hist
     Column   |            Type             | Modifiers
  ------------+-----------------------------+-----------
   name       | character varying(50)       |
   _entry_id  | bigint                      | not null
   _sys_start | timestamp without time zone | not null
   _sys_end   | timestamp without time zone | not null
  Indexes:
      person_hist__entry_id_idx btree (_entry_id)
      person_hist__sys_start__sys_end_idx btree (_sys_start, _sys_end)
  
  
  
  
   Table person stores current versions of entries. 3 additional
   columns
   are added:
   _entry_id - id of entry. It groups together 

Re: [HACKERS] temporal support patch

2012-06-13 Thread Miroslav Šimulčík
2012/5/30 Jim Nasby j...@nasby.net

 On 5/18/12 2:06 AM, Miroslav Šimulčík wrote:

 - no data redundancy - in my extension current versions of entries are
 stored only once in original table (in table_log - entries are inserted to
 both original and log table)


 That's not necessarily a benefit... it makes querying for both history
 *and* current data a lot more complex. Table inheritance might be an
 elegant solution to that, but I doubt you could just bolt that on top of
 what you've created.



Yes, querying for history data is more complex, but I focused on preserving
the performance of current queries. That's the reason why I use a separate
table for old versions.

Table inheritance is a very good idea and it would not require much effort
to use in my solution. Currently, when the user queries the whole history of
entries, the table reference in the FROM clause is replaced with a subselect,
which accesses data in both tables. For example, when the user executes the command:

NONSEQUENCED TRANSACTIONTIME SELECT * FROM person;

The actually executed command is:

SELECT * FROM (SELECT * FROM person UNION ALL SELECT * FROM person_hist) as
person

Use of table inheritance can make things simpler and more elegant, but I'm
not sure how it affects performance. Will it yield a performance gain?
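
(One would expect the two to end up with similar plans: with person ()
INHERITS (person_hist), the rewrite above goes away because the planner
expands the parent into an Append over both tables, roughly:

    EXPLAIN SELECT * FROM person_hist;
    -- Append
    --   ->  Seq Scan on person_hist
    --   ->  Seq Scan on person

while queries against person itself are untouched, so current-data
performance should be unaffected.)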


 The timestamp fields need to have timezone info. If you change the timezone
 for a connection you will get inconsistent results without it.

 _sys_end should either be NULLable or if it's going to have a magic value
 that magic value should be Infinity:



Good point. I will use timestamp with time zone and the value Infinity
instead of the max timestamp value.
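
(Roughly, for the existing columns, something like:

    ALTER TABLE person
        ALTER COLUMN _sys_end TYPE timestamptz,
        ALTER COLUMN _sys_end SET DEFAULT 'infinity';

two ALTER COLUMN actions in one statement; the same change would apply to
person_hist, and _sys_start's type would change the same way.)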


Re: [HACKERS] temporal support patch

2012-06-13 Thread Miroslav Šimulčík


 * I'd very much like to see you make use of Range Types from 9.2; in
 particular, TSTZRANGE would be much better than holding two timestamps.
 If a standard requires you to display two timestamps in certain
 situations, perhaps you could use ranges internally and display the
 boundaries as timestamps when needed.



I agree, the new range types will be ideal for this.




 * There is other useful information that could be recorded, such as the
 user who inserted/updated/deleted the record.



Yes, I considered adding user ID and transaction ID columns, because they
can be useful in some cases (for example, to find all changes made by a
transaction). However, they weren't necessary, so I omitted them. They can
be easily added.




 * For some purposes, it's very useful to keep track of the columns that
 changed. For instance, a query like show me any time a salary was
 changed over the last month (or some other rare event) would be very
 slow to run if there was not some explicit annotation on the historical
 records (e.g. a columns changed bitmap or something).



Another useful feature. I can take a look at it.



 * As Jim mentioned, it might make sense to use something resembling
 inheritance so that selecting from the historical table includes the
 current data (but with no upper bound for the range).



See reply to Jim's post.




 * It might make sense to hammer out as many of the details as we can
 with an extension. For instance, exactly what options will be available,
 what data types will be used, what objects will be created, the trigger
 code, etc. Then, it will be more obvious exactly what we need to add
 extra core support for (e.g. if we are going to use some inheritance
 like mechanism), and what we need to add syntax sugar for.


 I recommend that you start posting more detailed designs on
 http://wiki.postgresql.org



In which section of the wiki can I post the detailed design of my solution?



 If you already have code, feel free to submit it for the next commitfest
 ( http://commitfest.postgresql.org ), but this is a relatively large
 project, so it will most likely take several commitfest cycles.



I have a working patch for PostgreSQL version 9.0.4, but it needs refactoring
before I can submit it, because some parts don't
meet the formatting requirements yet. And yes, the changes are large, so it will be
better to discuss the design first and then deal with the code. Do you insist on
compatibility with standard SQL 2011, as Pavel wrote?


Re: [HACKERS] temporal support patch

2012-06-13 Thread Merlin Moncure
On Wed, Jun 13, 2012 at 4:10 PM, Miroslav Šimulčík
simulcik.m...@gmail.com wrote:
 I have a working patch for PostgreSQL version 9.0.4, but it needs refactoring
 before I can submit it, because some parts don't
 meet the formatting requirements yet. And yes, the changes are large, so it will be
 better to discuss the design first and then deal with the code. Do you insist on
 compatibility with standard SQL 2011, as Pavel wrote?

Standards compliance is always going to make things easier in terms of
gaining community acceptance if you're targeting in-core adoption.  At
the very least it will remove one barrier, although you might be in for
a slog for other reasons.  You may not have known this, but Postgres
had a time travel feature waaay back in the day (see:
http://www.postgresql.org/docs/6.3/static/c0503.htm).  It was removed
for performance reasons, and the first thing I'm wondering is how your
stuff performs in various scenarios, among other interesting
things. Also, +1 on the use of range types.  Anyway, thanks for submitting
and good luck!

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-05-30 Thread Jeff Davis
On Wed, 2012-05-16 at 23:14 +0200, Miroslav Šimulčík wrote:
 Hi all,
 
 
 as a part of my master's thesis I have created temporal support patch
 for PostgreSQL. It enables the creation of special temporal tables
 with entries versioning. Modifying operations (UPDATE, DELETE,
 TRUNCATE) on these tables don't cause permanent changes to entries,
 but create new versions of them. Thus user can easily get to the past
 states of the table.
 
I would be very interested to see this, thank you for working on it.

There are quite a few aspects to a temporal database system, and you are
working on a system-maintained transaction-time historical table, right?
Or are there other aspects to your proposal?

Some general comments:

* I'd very much like to see you make use of Range Types from 9.2; in
particular, TSTZRANGE would be much better than holding two timestamps.
If a standard requires you to display two timestamps in certain
situations, perhaps you could use ranges internally and display the
boundaries as timestamps when needed.

* There is other useful information that could be recorded, such as the
user who inserted/updated/deleted the record.

* For some purposes, it's very useful to keep track of the columns that
changed. For instance, a query like show me any time a salary was
changed over the last month (or some other rare event) would be very
slow to run if there was not some explicit annotation on the historical
records (e.g. a columns changed bitmap or something).

* In general, I'm not fond of adorning queries with TRANSACTION TIME AS
OF... kinds of things. Those constructs are redundant with a WHERE
clause (on a range type, you'd use the contains operator). If a
standard requires that, maybe it would be OK to allow such things as
syntactic sugar.

* I do like having special DDL that creates the appropriate objects.
That helps to guide users so they don't have to invent their own
solution with triggers, etc.

* As Jim mentioned, it might make sense to use something resembling
inheritance so that selecting from the historical table includes the
current data (but with no upper bound for the range).

* It might make sense to hammer out as many of the details as we can
with an extension. For instance, exactly what options will be available,
what data types will be used, what objects will be created, the trigger
code, etc. Then, it will be more obvious exactly what we need to add
extra core support for (e.g. if we are going to use some inheritance
like mechanism), and what we need to add syntax sugar for.

I recommend that you start posting more detailed designs on
http://wiki.postgresql.org

If you already have code, feel free to submit it for the next commitfest
( http://commitfest.postgresql.org ), but this is a relatively large
project, so it will most likely take several commitfest cycles.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-05-29 Thread Jim Nasby

On 5/18/12 2:06 AM, Miroslav Šimulčík wrote:

- no data redundancy - in my extension current versions of entries are stored 
only once in original table (in table_log - entries are inserted to both 
original and log table)


That's not necessarily a benefit... it makes querying for both history *and* 
current data a lot more complex. Table inheritance might be an elegant solution 
to that, but I doubt you could just bolt that on top of what you've created.

The timestamp fields need to have timezone info. If you change the timezone for 
a connection you will get inconsistent results without it.

_sys_end should either be NULLable or if it's going to have a magic value that magic 
value should be Infinity:

decibel@workbook.local=# create table t(t timestamptz);
CREATE TABLE
decibel@workbook.local=# insert into t values('infinity');
INSERT 0 1
decibel@workbook.local=# select * from t;
    t
----------
 infinity
(1 row)

decibel@workbook.local=#
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-05-20 Thread Pavel Stehule
Hello

2012/5/18 Miroslav Šimulčík simulcik.m...@gmail.com:
 Hello.

 The SQL 2011 standard wasn't available at the time I started this project, so
 I built my implementation on the older TSQL2 and SQL/Temporal drafts, which
 were the only ones available. None of these were accepted by the ANSI/ISO
 committees, however.

 SQL 2011 uses a different syntax, which looks like the one IBM DB2 had
 been using even before the standard was published.

 So my implementation differs in syntax, but the features are the same as
 stated in the system-versioned tables part of the slideshow.


I would like to see temporal functionality in pg, but only in SQL 2011
syntax. Using syntax from deprecated proposals makes no sense. I am not
sure the history table concept is best from a performance view - it is
simpler to implement, but you duplicate all indexes - there will
be a lot of redundant fields in the history table. An important question is
the difference in cost between some non-trivial query for actual data and
the same query for historic data.

Regards

Pavel Stehule

 Regards
 Miroslav Simulcik


 2012/5/17 Pavel Stehule pavel.steh...@gmail.com

 Hello

 what is conformance of your solution with temporal extension in ANSI SQL
 2011


 http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

 Regards

 Pavel Stehule

 2012/5/16 Miroslav Šimulčík simulcik.m...@gmail.com:
  Hi all,
 
  as a part of my master's thesis I have created temporal support patch
  for
  PostgreSQL. It enables the creation of special temporal tables with
  entries
  versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these
  tables
  don't cause permanent changes to entries, but create new versions of
  them.
  Thus user can easily get to the past states of the table.
 
  Basic information on temporal databases can be found
  on http://en.wikipedia.org/wiki/Temporal_database
 
  In field of temporal databases, there are only proprietary solution
  available. During the analysis I found these:
      - IBM DB2 10 for z/OS
      - Oracle 11g Workspace Manager
      - Teradata Database 13.10
 
  Primary goal of my work was the creation of opensource solution, that is
  easy to use and is backward compatible with existing applications,
  so that
  the change of the original tables to temporal ones, does not require
  changes
  to applications that work with them. This patch is built on standard
  SQL/Temporal with some minor modifications inspired by commercial
  temporal
  database systems. Currently it only deals with transaction time support.
 
  Here is simple description on how it works:
 
  1. user can create transaction time table using modified CREATE TABLE
  command:
 
      CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;
 
      This command automatically creates all objects required for
  transaction
  time support:
 
                List of relations
    Schema |         Name         |   Type   |  Owner
   --------+----------------------+----------+----------
    public | person               | table    | tester
    public | person__entry_id_seq | sequence | tester
    public | person_hist          | table    | postgres


                             Table public.person
      Column   |            Type             |                 Modifiers
   ------------+-----------------------------+------------------------------------------------------------------------
    name       | character varying(50)       |
    _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
    _sys_start | timestamp without time zone | not null default clock_timestamp()
    _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
   Indexes:
       person__entry_id_idx btree (_entry_id)
       person__sys_start__sys_end_idx btree (_sys_start, _sys_end)


               Table public.person_hist
      Column   |            Type             | Modifiers
   ------------+-----------------------------+-----------
    name       | character varying(50)       |
    _entry_id  | bigint                      | not null
    _sys_start | timestamp without time zone | not null
    _sys_end   | timestamp without time zone | not null
   Indexes:
       person_hist__entry_id_idx btree (_entry_id)
       person_hist__sys_start__sys_end_idx btree (_sys_start, _sys_end)
 
 
 
 
    Table person stores current versions of entries. 3 additional columns
    are added:
        _entry_id - id of entry. It groups together different versions of
        entry.
        _sys_start - beginning of the version validity period (version
        creation timestamp).
        _sys_end - end of the version validity period.

    Table person_hist stores historical versions of entries. It has the
    same structure and indexes as the person table, but without any
    constraints and

Re: [HACKERS] temporal support patch

2012-05-18 Thread Miroslav Šimulčík
Hello.

Basically my implementation serves the same purpose as the table_log extension -
entry versioning. It also uses a history table and triggers to store old
versions of entries. After a quick review of the table_log extension, here is
a comparison with my solution:

table_log advantages compared to my solution:
- no columns added to the original table
- more control over created objects

advantages of my solution compared to table_log:
- built-in syntax
- more user friendly - everything is done automatically
- no data redundancy - in my extension, current versions of entries are
stored only once, in the original table (in table_log, entries are inserted
into both the original and the log table)
- no speed degradation for INSERT statements - no triggers called on INSERT
- faster triggers - triggers in my implementation are based on referential
integrity triggers and use prepared statements
- original table alterations are automatically applied to the history table
- easier selection of data valid at a specific time - each row contains start
and end times
- easier restoration to a table - TRANSACTIONTIME AS OF expression SELECT *
INTO restoration_table FROM original_table;

Regards
Miroslav Simulcik

2012/5/17 A.M. age...@themactionfaction.com


 On May 16, 2012, at 5:14 PM, Miroslav Šimulčík wrote:

  Hi all,
 
  as a part of my master's thesis I have created temporal support patch
 for PostgreSQL. It enables the creation of special temporal tables with
 entries versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on
 these tables don't cause permanent changes to entries, but create new
 versions of them. Thus user can easily get to the past states of the table.
 
  Basic information on temporal databases can be found on
 http://en.wikipedia.org/wiki/Temporal_database
 

 Hello!

 I currently use the table_log extension:
 http://pgfoundry.org/projects/tablelog/

 Other than the built-in syntax, how does your implementation differ
 feature-wise?

 Cheers,
 M






Re: [HACKERS] temporal support patch

2012-05-18 Thread Albe Laurenz
Miroslav Šimulcík wrote:
 as a part of my master's thesis I have created temporal support patch for 
 PostgreSQL. It enables the
 creation of special temporal tables with entries versioning. Modifying 
 operations (UPDATE, DELETE,
 TRUNCATE) on these tables don't cause permanent changes to entries, but 
 create new versions of them.
 Thus user can easily get to the past states of the table.

Most of that is something you could handle with triggers (not TRUNCATE of 
course).

What does your proposed feature do that a set of triggers wouldn't?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] temporal support patch

2012-05-18 Thread Miroslav Šimulčík
Hello.

The SQL 2011 standard wasn't available at the time I started this project,
so I built my implementation on the older TSQL2 and SQL/Temporal drafts,
which were the only ones available. None of these were accepted by the
ANSI/ISO committees, however.

SQL 2011 uses a different syntax, which looks like the one IBM DB2
had been using even before the standard was published.

So my implementation differs in syntax, but the features are the same as
stated in the system-versioned tables part of the slideshow.

Regards
Miroslav Simulcik

2012/5/17 Pavel Stehule pavel.steh...@gmail.com

 Hello

 what is conformance of your solution with temporal extension in ANSI SQL
 2011


 http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

 Regards

 Pavel Stehule

 2012/5/16 Miroslav Šimulčík simulcik.m...@gmail.com:
  Hi all,
 
  as a part of my master's thesis I have created temporal support patch for
  PostgreSQL. It enables the creation of special temporal tables with
 entries
  versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these
 tables
  don't cause permanent changes to entries, but create new versions of
 them.
  Thus user can easily get to the past states of the table.
 
  Basic information on temporal databases can be found
  on http://en.wikipedia.org/wiki/Temporal_database
 
  In field of temporal databases, there are only proprietary solution
  available. During the analysis I found these:
  - IBM DB2 10 for z/OS
  - Oracle 11g Workspace Manager
  - Teradata Database 13.10
 
  Primary goal of my work was the creation of opensource solution, that is
  easy to use and is backward compatible with existing applications,
 so that
  the change of the original tables to temporal ones, does not require
 changes
  to applications that work with them. This patch is built on standard
  SQL/Temporal with some minor modifications inspired by commercial
 temporal
  database systems. Currently it only deals with transaction time support.
 
  Here is simple description on how it works:
 
  1. user can create transaction time table using modified CREATE TABLE
  command:
 
  CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;
 
  This command automatically creates all objects required for
 transaction
  time support:
 
              List of relations
  Schema |         Name         |   Type   |  Owner
 --------+----------------------+----------+----------
  public | person               | table    | tester
  public | person__entry_id_seq | sequence | tester
  public | person_hist          | table    | postgres


                           Table public.person
    Column   |            Type             |                 Modifiers
 ------------+-----------------------------+------------------------------------------------------------------------
  name       | character varying(50)       |
  _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
  _sys_start | timestamp without time zone | not null default clock_timestamp()
  _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
 Indexes:
     person__entry_id_idx btree (_entry_id)
     person__sys_start__sys_end_idx btree (_sys_start, _sys_end)


             Table public.person_hist
    Column   |            Type             | Modifiers
 ------------+-----------------------------+-----------
  name       | character varying(50)       |
  _entry_id  | bigint                      | not null
  _sys_start | timestamp without time zone | not null
  _sys_end   | timestamp without time zone | not null
 Indexes:
     person_hist__entry_id_idx btree (_entry_id)
     person_hist__sys_start__sys_end_idx btree (_sys_start, _sys_end)
 
 
 
 
  Table person stores current versions of entries. 3 additional columns
  are added:
  _entry_id - id of entry. It groups together different versions of
  entry.
  _sys_start - beginning of the version validity period (version
  creation timestamp).
  _sys_end - end of the version validity period.
 
  Table person_hist stores historical versions of entries. It has the
 same
  structure and indexes as the person table, but without any constraints
 and
  default values.
 
  2. another way of creating transaction time table is adding transaction
 time
  support to existing standard table using ALTER command.
 
  CREATE TABLE person(name varchar(50));
  ALTER TABLE person ADD TRANSACTIONTIME;
 
  3. INSERT entry
 
  INSERT INTO person VALUES('Jack');
 
  SELECT *, _entry_id, _sys_start, _sys_end FROM person;
 
   name | _entry_id |         _sys_start         |         _sys_end
  ------+-----------+----------------------------+---------------------------
   Jack |         1 | 2012-05-16 22:11:39.856916 | 294276-12-31 23:59:59.99

Re: [HACKERS] temporal support patch

2012-05-18 Thread Miroslav Šimulčík
Hi,

Yes, you are right, storing old versions of an entry in a history table can be
handled with triggers, and my solution also uses triggers to do that.
The advantage of my implementation is that the user doesn't need to create all
the necessary objects (triggers, history table, begin/end columns ...)
manually. He just needs to use the AS TRANSACTIONTIME clause in the CREATE TABLE
command and everything else is done automatically by the backend.

I focused on backward compatibility with existing applications, so that if
versioning is added to an original table, the application can run correctly
without any changes. For example, columns for the beginning and end timestamps
are added to the original table. This requires modifying the star symbol '*'
expansion in SELECT statement processing, because the newly added (internal)
columns have to be implicitly hidden. The same goes for an INSERT command
without a column list.

Simply put, my implementation makes working with temporal tables easier and
minimizes limitations for existing applications. The user can work with a
temporal table just like with any other table and doesn't need to bother
with triggers, columns, history tables, etc. For example, if you want to add
a column to a table with versioning, you can use a simple ALTER TABLE ADD
COLUMN and the column is automatically added to the history table too.
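
(Schematically:

    ALTER TABLE person ADD COLUMN age integer;
    -- the backend mirrors this on person_hist automatically

with no separate maintenance of the history table's definition.)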

Regards
Miroslav Simulcik

2012/5/18 Albe Laurenz laurenz.a...@wien.gv.at

 Miroslav Šimulcík wrote:
  as a part of my master's thesis I have created temporal support patch
 for PostgreSQL. It enables the
  creation of special temporal tables with entries versioning. Modifying
 operations (UPDATE, DELETE,
  TRUNCATE) on these tables don't cause permanent changes to entries, but
 create new versions of them.
  Thus user can easily get to the past states of the table.

 Most of that is something you could handle with triggers (not TRUNCATE of
 course).

 What does your proposed feature do that a set of triggers wouldn't?

 Yours,
 Laurenz Albe



Re: [HACKERS] temporal support patch

2012-05-18 Thread Jaime Casanova
On Wed, May 16, 2012 at 4:14 PM, Miroslav Šimulčík
simulcik.m...@gmail.com wrote:

 This is only a illustration of main functionality. Later I can create a
 document about the design and implementation details, but first I need to
 know if such temporal features as described here, could be added to future
 versions of PostgreSQL, after meeting all the requirements of a new patch.


How does this affect users *not* using this feature? I mean, does it
affect performance? Probably not, but...

If you already have the patch (I understand you do, from what you said),
add it to the first commitfest
(https://commitfest.postgresql.org/action/commitfest_view?id=14), which
will start on June 15. It is difficult to know if something we haven't
seen is acceptable or not (please add some description of the way
you did it).

If the patch is still not ready, then start explaining the design you
are going to use...

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] temporal support patch

2012-05-16 Thread Miroslav Šimulčík
Hi all,

As a part of my master's thesis I have created a temporal support patch for
PostgreSQL. It enables the creation of special temporal tables with entry
versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these tables
don't cause permanent changes to entries, but create new versions of them.
Thus the user can easily get to past states of the table.

Basic information on temporal databases can be found at
http://en.wikipedia.org/wiki/Temporal_database

In the field of temporal databases, there are only proprietary solutions
available. During my analysis I found these:
- IBM DB2 10 for z/OS
- Oracle 11g Workspace Manager
- Teradata Database 13.10

The primary goal of my work was the creation of an open-source solution that
is easy to use and backward compatible with existing applications, so that
changing the original tables to temporal ones does not require
changes to the applications that work with them. This patch is built on
standard SQL/Temporal with some minor modifications inspired by commercial
temporal database systems. Currently it only deals with transaction time
support.

Here is a simple description of how it works:

1. The user can create a transaction time table using the modified CREATE
TABLE command:

CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

This command automatically creates all objects required for transaction
time support:

             List of relations
 Schema |         Name         |   Type   |  Owner
--------+----------------------+----------+----------
 public | person               | table    | tester
 public | person__entry_id_seq | sequence | tester
 public | person_hist          | table    | postgres


                          Table public.person
   Column   |            Type             |                 Modifiers
------------+-----------------------------+------------------------------------------------------------------------
 name       | character varying(50)       |
 _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
 _sys_start | timestamp without time zone | not null default clock_timestamp()
 _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
Indexes:
    person__entry_id_idx btree (_entry_id)
    person__sys_start__sys_end_idx btree (_sys_start, _sys_end)


            Table public.person_hist
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 name       | character varying(50)       |
 _entry_id  | bigint                      | not null
 _sys_start | timestamp without time zone | not null
 _sys_end   | timestamp without time zone | not null
Indexes:
    person_hist__entry_id_idx btree (_entry_id)
    person_hist__sys_start__sys_end_idx btree (_sys_start, _sys_end)




Table person stores current versions of entries. 3 additional columns
are added:
    _entry_id - id of the entry. It groups together different versions of
    the entry.
    _sys_start - beginning of the version validity period (version
    creation timestamp).
    _sys_end - end of the version validity period.

Table person_hist stores historical versions of entries. It has the
same structure and indexes as the person table, but without any constraints
and default values.

2. Another way of creating a transaction time table is adding transaction
time support to an existing standard table using the ALTER command.

CREATE TABLE person(name varchar(50));
ALTER TABLE person ADD TRANSACTIONTIME;

3. INSERT entry

INSERT INTO person VALUES('Jack');

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

 name | _entry_id |         _sys_start         |         _sys_end
------+-----------+----------------------------+---------------------------
 Jack |         1 | 2012-05-16 22:11:39.856916 | 294276-12-31 23:59:59.99

4. UPDATE entry

UPDATE person SET name = 'Tom';

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

 name | _entry_id |         _sys_start         |         _sys_end
------+-----------+----------------------------+---------------------------
 Tom  |         1 | 2012-05-16 22:11:44.736195 | 294276-12-31 23:59:59.99

SELECT * FROM person_hist;

 name | _entry_id |         _sys_start         |         _sys_end
------+-----------+----------------------------+----------------------------
 Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16 22:11:44.736194

5. DELETE entry

DELETE FROM person;

SELECT *, _entry_id, _sys_start, _sys_end FROM person;

 name | _entry_id | _sys_start | _sys_end
------+-----------+------------+----------

SELECT * FROM person_hist;

 name | _entry_id |         _sys_start         |         _sys_end


Re: [HACKERS] temporal support patch

2012-05-16 Thread Pavel Stehule
Hello

What is the conformance of your solution with the temporal extension in ANSI SQL 2011?

http://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule

2012/5/16 Miroslav Šimulčík simulcik.m...@gmail.com:
 Hi all,

 as a part of my master's thesis I have created temporal support patch for
 PostgreSQL. It enables the creation of special temporal tables with entries
 versioning. Modifying operations (UPDATE, DELETE, TRUNCATE) on these tables
 don't cause permanent changes to entries, but create new versions of them.
 Thus user can easily get to the past states of the table.

 Basic information on temporal databases can be found
 on http://en.wikipedia.org/wiki/Temporal_database

 In field of temporal databases, there are only proprietary solution
 available. During the analysis I found these:
     - IBM DB2 10 for z/OS
     - Oracle 11g Workspace Manager
     - Teradata Database 13.10

 Primary goal of my work was the creation of opensource solution, that is
 easy to use and is backward compatible with existing applications, so that
 the change of the original tables to temporal ones, does not require changes
 to applications that work with them. This patch is built on standard
 SQL/Temporal with some minor modifications inspired by commercial temporal
 database systems. Currently it only deals with transaction time support.

 Here is simple description on how it works:

 1. user can create transaction time table using modified CREATE TABLE
 command:

     CREATE TABLE person(name varchar(50)) AS TRANSACTIONTIME;

     This command automatically creates all objects required for transaction
 time support:

                List of relations
    Schema |         Name         |   Type   |  Owner
   --------+----------------------+----------+----------
    public | person               | table    | tester
    public | person__entry_id_seq | sequence | tester
    public | person_hist          | table    | postgres


                             Table public.person
      Column   |            Type             |                 Modifiers
   ------------+-----------------------------+------------------------------------------------------------------------
    name       | character varying(50)       |
    _entry_id  | bigint                      | not null default nextval('person__entry_id_seq'::regclass)
    _sys_start | timestamp without time zone | not null default clock_timestamp()
    _sys_end   | timestamp without time zone | not null default '294276-12-31 23:59:59.99'::timestamp without time zone
   Indexes:
       person__entry_id_idx btree (_entry_id)
       person__sys_start__sys_end_idx btree (_sys_start, _sys_end)


               Table public.person_hist
      Column   |            Type             | Modifiers
   ------------+-----------------------------+-----------
    name       | character varying(50)       |
    _entry_id  | bigint                      | not null
    _sys_start | timestamp without time zone | not null
    _sys_end   | timestamp without time zone | not null
   Indexes:
       person_hist__entry_id_idx btree (_entry_id)
       person_hist__sys_start__sys_end_idx btree (_sys_start, _sys_end)




     Table person stores current versions of entries. 3 additional columns
 are added:
         _entry_id - id of entry. It groups together different versions of
 entry.
         _sys_start - beginning of the version validity period (version
 creation timestamp).
         _sys_end - end of the version validity period.

     Table person_hist stores historical versions of entries. It has the same
 structure and indexes as the person table, but without any constraints and
 default values.

 2. another way of creating transaction time table is adding transaction time
 support to existing standard table using ALTER command.

     CREATE TABLE person(name varchar(50));
     ALTER TABLE person ADD TRANSACTIONTIME;

 3. INSERT entry

     INSERT INTO person VALUES('Jack');

     SELECT *, _entry_id, _sys_start, _sys_end FROM person;

   name | _entry_id |         _sys_start         |         _sys_end
  ------+-----------+----------------------------+---------------------------
   Jack |         1 | 2012-05-16 22:11:39.856916 | 294276-12-31 23:59:59.99

 4. UPDATE entry

     UPDATE person SET name = 'Tom';

     SELECT *, _entry_id, _sys_start, _sys_end FROM person;

   name | _entry_id |         _sys_start         |         _sys_end
  ------+-----------+----------------------------+---------------------------
   Tom  |         1 | 2012-05-16 22:11:44.736195 | 294276-12-31 23:59:59.99

     SELECT * FROM person_hist;

   name | _entry_id |         _sys_start         |         _sys_end
  ------+-----------+----------------------------+----------------------------
   Jack |         1 | 2012-05-16 22:11:39.856916 | 2012-05-16 22:11:44.736194