Ended up running for 28 min, but it did work as expected.

On Thu, Oct 2, 2014 at 10:27 AM, Adam Brusselback <adambrusselb...@gmail.com> wrote:
> Testing that now. Initial results are not looking too performant.
>
> I have one single table which had 234575 updates done to it. I am rolling
> back 13093 of them. It's been running 20 min now, using 100% of a single
> core and almost 0 disk. No idea how long it'll run at this point.
>
> This is on an i5 desktop with 16 gigs of ram and an ssd.
>
> This is a pretty good test though, as it's a real-world use case (even if
> the data was generated with pgbench). We now know that area needs some
> work before it can be used for anything more than a toy database.
>
> Thanks,
> -Adam
>
> On Thu, Oct 2, 2014 at 7:52 AM, Felix Kunde <felix-ku...@gmx.de> wrote:
>
>> Hey there,
>>
>> Thanks again for the fix. I was able to merge it into my repo.
>> Also thanks for benchmarking audit. Very interesting results.
>> I wonder how the recreation of former database states scales when
>> processing many deltas. I haven't done a lot of testing in that direction.
>>
>> I will transfer the code soon to a more public repo on GitHub. As far as
>> I can see, I have to create an organization for that.
>>
>> Cheers,
>> Felix
>>
>> *Sent:* Wednesday, October 1, 2014 at 5:09 PM
>> *From:* "Adam Brusselback" <adambrusselb...@gmail.com>
>> *To:* "Felix Kunde" <felix-ku...@gmx.de>
>> *Cc:* "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
>> *Subject:* Re: [GENERAL] table versioning approach (not auditing)
>>
>> I know we're kinda hijacking this thread, so sorry for that. If you'd
>> like to do that, I'd be more than happy to use it and push any fixes /
>> changes upstream. I don't have much of a preference on the name either,
>> as long as it's something that makes sense.
>>
>> I would consider myself far from an expert though! Either way, more
>> people using a single solution is a good thing.
>>
>> As a side note, I did some benchmarking this morning and wanted to share
>> the results:
>>
>> pgbench -i -s 140 -U postgres pgbench
>>
>> pgbench -c 4 -j 4 -T 600 -U postgres pgbench
>> no auditing tps: 2854
>> NOTE: accounts are audited
>> auditing tps: 1278
>>
>> pgbench -c 2 -j 2 -N -T 300 -U postgres pgbench
>> no auditing tps: 2504
>> NOTE: accounts are audited
>> auditing tps: 822
>>
>> pgbench -c 2 -j 2 -T 300 -U postgres pgbench
>> no auditing tps: 1836
>> NOTE: branches and tellers are audited, accounts are not
>> auditing tps: 505
>>
>> I'd love to see if there are some easy wins to boost the performance.
>>
>> On Wed, Oct 1, 2014 at 5:19 AM, Felix Kunde <felix-ku...@gmx.de> wrote:
>>>
>>> Hey there. Thank you very much for that fix! That's why I'd like to have
>>> joint development and joint testing. It's way more convincing for users
>>> to go for a solution that is tested by some experts than just by a
>>> random developer :)
>>>
>>> I'm open to creating a new project and pushing the code there. I don't
>>> care about the name. Then we might figure out which parts are already
>>> good, which parts could be improved and where to go next. I think
>>> switching to JSONB, for example, will be easy, as it offers the same
>>> functions as JSON afaik.
>>>
>>> Sent: Tuesday, September 30, 2014 at 9:16 PM
>>> From: "Adam Brusselback" <adambrusselb...@gmail.com>
>>> To: "Felix Kunde" <felix-ku...@gmx.de>
>>> Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
>>> Subject: Re: [GENERAL] table versioning approach (not auditing)
>>>
>>> Felix, I'd love to see a single, well-maintained project. For example, I
>>> just found yours and gave it a shot today after seeing this post. I
>>> found a bug when an update command is issued but the old and new values
>>> are all the same: the trigger will blow up. I've got a fix for that, but
>>> if we had one project that more than a handful of people used, stuff
>>> like that would be quashed very quickly.
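One common way to keep a row-level audit trigger from firing on such no-op
updates is a WHEN clause on the trigger definition, so the logging function
only runs when the row actually changed. A minimal sketch, with placeholder
table and function names (not the audit project's real identifiers):

    -- Only fire the audit trigger when the row actually changed
    -- (placeholder names; not the audit project's real identifiers).
    CREATE TRIGGER log_update
      AFTER UPDATE ON my_schema.my_table
      FOR EACH ROW
      WHEN (OLD.* IS DISTINCT FROM NEW.*)
      EXECUTE PROCEDURE my_schema.log_change();

Adam's actual fix may instead live inside the trigger function (e.g. returning
early when OLD and NEW match); the WHEN clause just pushes that check into the
trigger machinery.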
>>> I love the design of it, by the way. Any idea what it will take to move
>>> to JSONB for 9.4?
>>>
>>> On Tue, Sep 30, 2014 at 7:22 AM, Felix Kunde <felix-ku...@gmx.de> wrote:
>>>
>>> Hey,
>>>
>>> yes, I'm adding an additional key to each of my tables. First I wanted
>>> to use the primary key as one column in my audit_log table, but in some
>>> of my tables the PK consists of more than one column. Plus, it's nice to
>>> have one key that is named the same across all tables.
>>>
>>> To get a former state for one row at date x I need to join the latest
>>> delta BEFORE date x with each delta AFTER date x. If I logged complete
>>> rows, this joining part would not be necessary, but as I usually work
>>> with spatial databases that have complex geometries and also image
>>> files, that strategy consumes too much disk space.
>>>
>>> If there are more users following a similar approach, I wonder why we
>>> don't throw all the good ideas together to have one solution that is
>>> tested, maintained and improved by more developers. This would be great.
>>>
>>> Felix
>>>
>>> Sent: Monday, September 29, 2014 at 11:25 PM
>>> From: "Abelard Hoffman" <abelardhoff...@gmail.com>
>>> To: "Felix Kunde" <felix-ku...@gmx.de>
>>> Cc: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
>>> Subject: Re: [GENERAL] table versioning approach (not auditing)
>>>
>>> Thank you Felix, Gavin, and Jonathan for your responses.
>>>
>>> Felix & Jonathan: both of you mention just storing deltas. But if you do
>>> that, how do you associate the delta record with the original row?
>>> Where's the PK stored, if it wasn't part of the delta?
>>>
>>> Felix, thank you very much for the example code. I took a look at your
>>> table schemas. I need to study it more, but it looks like the way you're
>>> handling the PK is that you're adding a separate synthetic key (audit_id)
>>> to each table that's being versioned, and then storing that key along
>>> with the delta.
>>>
>>> So then to find all the versions of a given row, you just need to join
>>> the audit row with the schema_name.table_name.audit_id column. Is that
>>> right? The only potential drawback there is that there's no referential
>>> integrity between audit_log.audit_id and the actual table.
>>>
>>> I do like that approach very much though, in that it eliminates the need
>>> to interrogate the json data in order to perform most queries.
>>>
>>> AH
>>>
>>> On Mon, Sep 29, 2014 at 12:26 AM, Felix Kunde <felix-ku...@gmx.de> wrote:
>>>
>>> Hey,
>>>
>>> I've also tried to implement database versioning using JSON to log
>>> changes in tables. Here it is: https://github.com/fxku/audit
>>>
>>> I've got two versioning tables: one storing information about all
>>> transactions that happened, and one where I put the JSON logs of row
>>> changes of each table. I'm only logging old values and not complete rows.
>>>
>>> Then I've got a function that recreates a database state at a given time
>>> into a separate schema - either as VIEWs, MVIEWs or TABLEs. This database
>>> state could then be indexed in order to work with it. You can also reset
>>> the production state to the recreated past state.
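To make the delta idea concrete: because only old values are logged, the value
a column had at time x is whatever old value the earliest delta after x
recorded for it, falling back to the live table if nothing changed since. A
hypothetical sketch, assuming a jsonb log column; the names audit_log,
old_values, changed_at and product are invented for illustration and are not
the project's actual schema:

    -- Hypothetical sketch only: read one column of one versioned row as it
    -- was at a point in time, from "old values only" deltas.
    SELECT COALESCE(
             (SELECT l.old_values ->> 'price'      -- earliest delta after x
              FROM   audit_log l                   -- that recorded an old
              WHERE  l.audit_id = 42               -- value for this column
                AND  l.changed_at > '2014-09-01'
                AND  l.old_values ? 'price'
              ORDER  BY l.changed_at
              LIMIT  1),
             (SELECT t.price::text                 -- otherwise the column has
              FROM   product t                     -- not changed since x
              WHERE  t.audit_id = 42)
           ) AS price_at_2014_09_01;

Recreating a whole former table means repeating this per column and per row
(or merging deltas newest-to-oldest), which hints at why replaying many deltas
can get expensive.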
>>> Unfortunately I've got no time to work on it further at the moment, plus
>>> I have not done tests with many changes in the database, so I can't say
>>> if the recreation process scales well. One downside I've realised is
>>> that the json_agg function has limits when I've got binary data - it
>>> gets too long. So I'm really looking forward to using JSONB.
>>>
>>> There are more plans in my mind. By having a Transaction_Log table it
>>> should be possible to revert only certain transactions. I'm also
>>> thinking of parallel versioning, e.g. different users all working with
>>> their own version of the database and committing their changes to the
>>> production state. As I've got a unique history ID for each table and
>>> each row, I should be able to map the affected records.
>>>
>>> Have a look and tell me what you think of it.
>>>
>>> Cheers,
>>> Felix
>>>
>>> Sent: Monday, September 29, 2014 at 4:00 AM
>>> From: "Abelard Hoffman" <abelardhoff...@gmail.com>
>>> To: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
>>> Subject: [GENERAL] table versioning approach (not auditing)
>>>
>>> Hi. I need to maintain a record of all changes to certain tables to
>>> assist in viewing history and reverting changes when necessary (customer
>>> service makes an incorrect edit, etc.).
>>>
>>> I have studied these two audit trigger examples:
>>> https://wiki.postgresql.org/wiki/Audit_trigger
>>> https://wiki.postgresql.org/wiki/Audit_trigger_91plus
>>>
>>> I've also read about two other approaches to versioning:
>>> 1. maintain all versions in one table, with a flag to indicate which is
>>> the current version
>>> 2. have a separate versions table for each real table, and insert into
>>> the associated version table whenever an update or insert is done.
>>>
>>> My current implementation is based on the wiki trigger examples, using a
>>> single table and a json column to record the row changes (rather than
>>> hstore). What I like about that, in particular, is that I can have a
>>> "global," chronological view of all versioned changes very easily.
>>>
>>> But there are two types of queries I need to run:
>>> 1. Find all changes made by a specific user
>>> 2. Find all changes related to a specific record
>>>
>>> #1 is simple to do. The versioning table has a user_id column recording
>>> who made the change, so I can query on that.
>>>
>>> #2 is more difficult. I may want to fetch all changes to a group of
>>> tables that are all related by foreign keys (e.g., find all changes to
>>> "user" record 849, along with any changes to their "articles," "photos,"
>>> etc.). All of the data is in the json column, of course, but it seems
>>> like a pain to try and build a query on the json column that can fetch
>>> all those relationships (and if I mess it up, I probably won't generate
>>> any errors, since the json is so free-form).
>>>
>>> So my question is: do you think using the json approach is wrong for
>>> this case? Does it seem better to have separate versioning tables
>>> associated with each real table? Or another approach?
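For what it's worth, query #2 against a single json-based versions table might
look roughly like the sketch below. All names here (versions, table_name,
row_data, changed_at) are invented for illustration rather than taken from the
actual schema, and it shows the fragility described above: a misspelled key or
table name silently matches nothing instead of raising an error.

    -- Rough illustration only (invented schema): changes to "user" 849 plus
    -- changes to child rows that point at that user via a user_id key in the
    -- logged json. A misspelled key silently matches nothing.
    SELECT v.*
    FROM   versions v
    WHERE  (v.table_name = 'user'
            AND v.row_data ->> 'id' = '849')
       OR  (v.table_name IN ('articles', 'photos')
            AND v.row_data ->> 'user_id' = '849')
    ORDER  BY v.changed_at;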
>>>
>>> Thanks
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general